I'm Morgan McGuire (@CasualEffects). I've been working on computer graphics and games for 20 years at great places including NVIDIA, University of Waterloo, Williams College, Brown University, Roblox, Unity, and Activision.

See my home page for a full index of my blog posts, books, research, and projects.

Friday, August 16, 2013

Plausible Environment Lighting in Two Lines of Code

a group post by McGuire, Evangelakos, Wilcox, Donow, and Mara
(Because of intense interest in this post, we archived it as a technical report at http://graphics.cs.williams.edu/papers/EnvMipReport2013/ suitable for scientific citation)


This post describes a trick used in the G3D Innovation Engine 9.00 to produce reasonable real-time environment lighting for glossy surfaces. It specifically adds two lines of code to a pixel shader to reasonably approximate Lambertian and glossy reflection (under a normalized Phong model) of a standard cube map environment. This trick resulted from an afternoon discussion/challenge last week in the Williams Computational Graphics lab between Morgan McGuire (that's me), my summer 2013 research students Dan Evangelakos (Williams), James Wilcox (NVIDIA and University of Washington), and Sam Donow (Williams), and my NVIDIA colleague and former student Michael Mara.

There are many more sophisticated ways of computing environment lighting for real-time rendering (see Real-Time Rendering's survey or a more informal Google search). Unfortunately, those techniques tend to require an offline asset build step, so they can't be directly applied to dynamic light probes, which are particularly interesting to us...although there are some clever things that one can do with spherical harmonics to reduce the cost of that precomputation a lot. Those methods also tend to require a specific setup for the renderer that doesn't work with our default shaders. For research and hobbyist rendering, one often has to work with off-the-shelf or legacy assets. This situation also arises in production more than one might think, especially when working with games that run across a variety of devices.

A series of increasingly shiny teapots, before and after applying this technique.
In the before row, only the perfect-mirror (rightmost) teapot reflects the environment.



While this is a trick that probably won't interest next-generation console developers or the research community, we're posting it for two reasons.  First, it may be useful to a number of mobile game developers, students, and hobbyists, because it is a lot better than having no approximation of environment lighting at all. After all, it improved our images with one hour of math and one minute of coding (about the right ratio in general!), and we can save others the one hour of math. Frankly, this trick may also be useful to more "sophisticated" productions and tools; Maya uses simple environment MIP maps for lighting and doesn't even use an approximation like ours to conserve energy or match glossy highlights.

Second, this is a nice example of how back-of-the-envelope computations and a principled understanding of the physics of rendering can very quickly yield practical improvements in image quality. That's the approach that my (other) coauthors and I advocate in Computer Graphics: Principles and Practice, and the one that I teach and use daily in my own work. The difference between a "bad hack" and a "good hack" (or, "principled approximation," if you prefer) is that the good hack is inherently robust and has known sources of errors because it arises from intentional simplification instead of fickle serendipity.

Background

One reason that classic 3D computer graphics images (which include nearly every video game made until a few years ago) look...computery...is that they underestimate the amount of light on a surface. In the real world, light arrives at a surface from many directions. The brightest light usually comes directly from a light source such as the sun or a bulb. But there is other "fill" or "bounce" light arriving from all directions, such as the light that shines through a window and then reflects off the floor and walls. If there weren't, then areas in "shadow" would be completely black, and thus invisible.  Real-world light sources are also area sources: light arrives not from a single direction but across a wedge of directions around each surface point.

Area lights and strong indirect
reflectors (from Digital Camera World)
In attractive photographs, these effects are amplified to soften illumination and counteract the limited dynamic range of print and computer displays. That's why a studio photographer uses diffusers to create large rectangular light sources and places white reflectors around a model, as advocated in the Digital Camera World home studio articles whose images are shown on the right.


Harsh CG point lighting (from
Computer Graphics: Principles and Practice 3e
sample chapter on rasterization and ray tracing)
Classic computer graphics relies primarily on point light sources that cast harsh illumination from a single direction, plus some ambient terms.  The offline rendering community adopted much more realistic illumination models decades ago, which is why CG images in well-produced films are now indistinguishable from true live-action shots. Those methods tend to have poor computational efficiency and offer few guarantees even on their long run times, so they are not directly applicable to real-time rendering. However, in the last ten years there has been a lot of research on comparable effects for real-time rendering. This is why the best-looking new games (e.g., Battlefield 4 and The Last of Us) now have some form of attractive lighting on dynamic objects.

The simplest technique for lighting a dynamic object is environment lighting. In this case, a cube map or other parameterization of the sphere stores the amount of light arriving from each direction, which is just a picture of the world around the object. Ideally, this cube map would be updated every frame to reflect changes in that lighting environment.  However, it works surprisingly well even if the cube map is static, if a single cube map is used throughout the entire scene despite the view changing, or even if the cube map doesn't look much like the actual scene. The application of maps of indirect light for mirror reflection (environment mapping) was introduced by Jim Blinn and Martin Newell in 1976.
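For readers newer to the technique, mirror environment mapping amounts to a single cube map fetch along the reflected view direction. Here is a minimal GLSL sketch; the variable names are illustrative, chosen to match the code later in this post, and directionToViewer is assumed to point from the surface toward the eye:

uniform samplerCube environmentMap;

// Mirror environment lighting: reflect the direction to the viewer about
// the surface normal and fetch the cube map along the result
vec3 mirrorEnvironmentLight(vec3 worldSpaceNormal, vec3 directionToViewer) {
    vec3 worldSpaceReflectionVector = reflect(-directionToViewer, worldSpaceNormal);
    return textureCube(environmentMap, worldSpaceReflectionVector).rgb;
}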

The first use of environment reflection in a feature
film: Flight of the Navigator (1986)
Modern environment lighting was introduced by Gene Miller and Robert Hoffman in 1984 (see brief history of environment lighting). Shading is the convolution of a surface's reflection function with the lighting environment.  The reflection function is driven by surface chemistry and surface roughness. For example, smooth glass produces a sharp highlight; sand-blasted glass produces a broad, dull highlight; and [dry] clay produces a broad, dull highlight no matter how smooth it is. Feel a high-gloss painted surface (like a window frame) and a so-called flat one (like a bedroom wall): the high-gloss surface is smoother. See http://www.insomniacgames.com/paper-exercise-an-efficient-representation-for-irradiance-environment-maps/ for a classic research paper on irradiance mapping and a modern game developer's presentation of it.

Skyshop's preconvolved environment maps
The convolution produces a convenient duality: a rough, matte surface under a point light looks about the same as a smooth, shiny surface under very diffuse lighting. This means that one can apply mirror environment mapping to a blurred cube map to approximate the appearance of a glossy surface. Computing that blur is a little tricky because we have to blur using the surface's reflectance function, not just any blur, and because there are some problems with not blurring over the horizon (i.e., tangent plane) of the surface. See Real-Time Rendering, 3rd edition for a full discussion. Some tools like AMD CubemapGen and Skyshop will compute this blur offline for various levels of roughness and store the results in the MIP levels of a cube map (think: array of increasingly blurry spheres).
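For intuition, here is a brute-force sketch of the convolution that such tools evaluate offline for each output texel (illustrative GLSL, not any tool's actual implementation; sampleDirections is a hypothetical precomputed set of directions distributed uniformly over the sphere):

const int NUM_SAMPLES = 256;
uniform vec3        sampleDirections[NUM_SAMPLES];
uniform samplerCube environmentMap;

// Convolve the environment with a power-cosine lobe about reflectionVector
vec3 convolveEnvironment(vec3 reflectionVector, float glossyExponent) {
    vec3  sum       = vec3(0.0);
    float weightSum = 0.0;
    for (int i = 0; i < NUM_SAMPLES; ++i) {
        // Power-cosine weight; directions behind the lobe receive zero weight
        float w = pow(max(dot(sampleDirections[i], reflectionVector), 0.0), glossyExponent);
        sum       += w * textureCube(environmentMap, sampleDirections[i]).rgb;
        weightSum += w;
    }
    return sum / weightSum;
}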

Key Idea

Our goal is to approximate the convolution of an environment map with a cosine-power lobe parameterized on exponent $s$, using a conventional cube map MIP chain.  For the Lambertian term, we'll estimate shading with a cube map sample about the normal from the lowest-resolution MIP level (approximating cosine to the first power); for the glossy term, we'll sample about the reflection vector at a MIP level dictated by the glossy exponent. We just need a formula for the MIP level to sample as a function of the glossy exponent. That's what the rest of this post is about.

Limitations

We ignore the fact that the tangent plane masks the reflection at glancing angles, a common problem even in the good convolution techniques. Since the conventional MIP chain uses simple $2\times 2$ averaging at each level and only within faces, and we then use (seamless) bilinear cubemap sampling, our final samples will represent convolution with a pyramid-density distorted quad instead of a power-cosine hemisphere. This is a source of error. It will be smallest near the centers of faces and at high exponents (shiny surfaces), and worst at low exponents and near face corners.  If the scene on average is about as bright in all directions (the lowest frequencies are nearly constant) despite lots of high-frequency detail (such as in Sponza), then the approximation will work well.  If the scene has highly varying illumination as a function of angle (such as in the Cornell Box), then everything will look slightly shinier than it should, because the lowest MIP levels do not actually blur across cube map faces. These approximation errors could be reduced by taking multiple texture samples from the cube map...but if we cared about it that much, we would probably use a better method than sampling MIP maps.  So we're going to accept these limitations and ask what the best is that we can reasonably do with two texture fetches (Lambertian + glossy) from a conventional MIP map.

Derivation

A uniformly weighted spherical cap of angle $\phi$ from center to edge subtends a solid angle of

$$2\pi \int_0^{\phi} \sin \theta~d\theta = 2\pi(1 - \cos \phi).$$
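As a sanity check, this expression recovers the familiar special cases:

$$\phi = \pi/2: ~ 2\pi(1 - \cos(\pi/2)) = 2\pi \textrm{ (hemisphere)}; \qquad \phi = \pi: ~ 2\pi(1 - \cos\pi) = 4\pi \textrm{ (full sphere)}.$$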

This is just the integral of 1 over the cap; the sin term arises from the spherical parameterization in which it is easiest to take the integral. A power-cosine ($\cos^s \theta$) weighted hemisphere, such as the one arising in the Blinn-Phong factor $(\hat{\omega}_\mathrm{h} \cdot \hat{\omega}_\mathrm{i})^s$, subtends a weighted solid angle with measure

$$\Gamma = 2\pi \int_0^{\pi/2} \cos^s \theta \sin \theta ~ d\theta $$

(recall that we're ignoring the tangent plane clipping by not clamping that Blinn-Phong term against zero).  Evaluating this integral shows us that in a completely uniform scene, the glossy reflection will have magnitude proportional to the solid angle

$$\Gamma = \frac{2\pi}{s + 1}.$$
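The evaluation is a one-line substitution, $u = \cos\theta$, $du = -\sin\theta~d\theta$:

$$2\pi \int_0^{\pi/2} \cos^s \theta \sin \theta ~ d\theta = 2\pi \int_0^1 u^s ~ du = \frac{2\pi}{s+1}.$$

Note that $s = 0$ recovers the $2\pi$ steradians of the uniformly weighted hemisphere above.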

A cube map contains six $w \times w$ (where $w$ is "width"; the faces are square) faces and covers the whole sphere of $4\pi$ steradians. MIP level $m$ contains $6 (w 2^{-m})^2$ pixels. A cube map texel at level $m$ thus subtends

$$T = \frac{4\pi}{6 (w ~ 2^{-m})^2} $$

steradians on average, with the ones at the corners covering less and the ones at the centers of faces covering more. We just need to know what MIP-level $m$ approximates integrating the same solid angle as the cosine exponent $s$, so we let $\Gamma = T$ and solve for $m$.

$$\frac{2\pi}{s + 1} =\frac{4\pi}{6(w ~ 2^{-m})^2}$$
$$\frac{2^{2m}}{3w^2} = \frac{1}{s + 1}$$
$$2^{2m} = \frac{3w^2}{s + 1}$$
$$2m = \log\left(\frac{3w^2}{s+1}\right)$$
$$2m = \log(3w^2) + \log\left(\frac{1}{s+1}\right)$$
$$m = \frac{1}{2}\log(3w^2) + \frac{1}{2}\log\left(\frac{1}{s+1}\right)$$
$$m = \log(w\sqrt{3}) + \frac{1}{2}\log \left(\frac{1}{s + 1}\right)$$
$$m = \log(w\sqrt{3}) - \frac{1}{2}\log(s + 1)$$

All logarithms are base 2 in this derivation. The $\log(w\sqrt{3})$ term depends solely on the cube map's MIP 0 resolution and can be precomputed, so we need only evaluate one logarithm and one fused multiply-add instruction per pixel. The logarithm is obviously the more expensive part. The cost of one logarithm in our pixel shader is insignificant compared to the other shading math, but one could store $\log(s + 1)$ in addition to $s$ in the roughness texture map if it were a major concern.
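To make the formula concrete, consider a $w = 256$ cube map (MIP indices 0 through 8):

$$s = 2048: \quad m \approx \log_2(256\sqrt{3}) - \frac{1}{2}\log_2(2049) \approx 8.79 - 5.50 = 3.29$$
$$s = 1: \quad m \approx 8.79 - 0.50 = 8.29$$

A very shiny surface thus samples a sharp, high-resolution level, while the Lambertian case ($s = 1$) lands just past the $1 \times 1$ level and is clamped to the lowest-resolution MIP.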

We're now ready for the two lines of code:


// Index of the lowest-resolution (1x1) MIP level; used for the Lambertian term
float maxMIPLevel = log2(environmentMapWidth);

float MIPlevel = 
     log2(environmentMapWidth * sqrt(3.0)) - 
     0.5 * log2(glossyExponent + 1.0);

#if (__VERSION__ >= 400) || defined(GL_ARB_texture_query_lod)
   // Never sample a sharper MIP level than the one the hardware would
   // choose anyway, to avoid aliasing on shiny surfaces seen at a distance
   MIPlevel = max(MIPlevel, textureQueryLod(environmentMap, worldSpaceReflectionVector).y);
#endif

gl_FragColor.rgb = 
    lambertianCoefficient * textureCubeLod(environmentMap, 
                                           worldSpaceNormal, 
                                           maxMIPLevel).rgb +
    glossyCoefficient     * textureCubeLod(environmentMap, 
                                           worldSpaceReflectionVector, 
                                           MIPlevel).rgb;
How one computes the Lambertian and glossy coefficients depends on the preferred shading model. We use the energy-conserving normalized Blinn-Phong with a Fresnel term, as described in Real-Time Rendering and the Graphics Codex. Note that this environment illumination superimposes on the direct illumination, and should produce highlights from light sources in the environment map that are similar to those of direct light sources, as shown in the teapot image at the top of this post. We also use ambient obscurance to modulate the glossy environment reflection for low exponents, since the entire hemisphere should not be integrated in those cases. Our full material model and GLSL shader will ship in version 9.00 of the G3D Innovation Engine, set for release at the end of August.
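For readers who want a concrete starting point, here is one plausible way to compute the two coefficients. This is a sketch, not G3D's exact material model; schlickFresnel, glossyReflectance ($F_0$), and lambertianReflectance are names we introduce here for illustration:

// Schlick's approximation to the Fresnel term; F0 is the reflectance
// at normal incidence
vec3 schlickFresnel(vec3 F0, float cosTheta) {
    return F0 + (vec3(1.0) - F0) * pow(1.0 - cosTheta, 5.0);
}

void computeCoefficients
   (vec3 worldSpaceNormal,      vec3 directionToViewer,
    vec3 glossyReflectance,     vec3 lambertianReflectance,
    out vec3 glossyCoefficient, out vec3 lambertianCoefficient) {
    float cosTheta = max(dot(worldSpaceNormal, directionToViewer), 0.0);
    glossyCoefficient = schlickFresnel(glossyReflectance, cosTheta);
    // Energy conservation: whatever the glossy lobe reflects is
    // unavailable to the Lambertian term
    lambertianCoefficient = lambertianReflectance * (vec3(1.0) - glossyCoefficient);
}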


Artifacts from our approximations are most visible on completely smooth geometry such as these spheres reflecting the Uffizi. The cube face MIP boundaries become somewhat apparent on the mid-range glossy values at the center bottom of the grid.


Artifacts from our approximations are least visible on complex, textured geometry, such as this glossy bump-mapped crate reflecting a relatively homogeneous alien planet skybox.



Morgan McGuire is a professor of Computer Science at Williams College, visiting professor at NVIDIA Research, and a professional game developer. He is the author of The Graphics Codex, an essential reference for computer graphics that runs on iPhone, iPad, and iPod Touch.