Per-Object Motion Blur

Originally posted on 24/09/2012

A while back I published a tutorial describing a screen space technique for approximating motion blur in realtime. The effect was simplistic; it took into account the movement of a camera through the scene, but not the movement of individual objects in the scene. Here I'm going to describe a technique which addresses both types of motion. But let's begin with a brief recap:

A Brief Recap

Motion pictures are made up of a series of still images displayed in quick succession. Each image is captured by briefly opening a shutter to expose a piece of film/electronic sensor. If an object in the scene (or the camera itself) moves during this exposure, the result is blurred along the direction of motion, hence motion blur.

The previous tutorial dealt only with motion blur caused by camera movement, which is very simple and cheap to achieve, but ultimately less realistic than 'full' motion blur.

For full motion blur, the approach I'll describe here goes like this: render the velocity at every pixel to a velocity buffer, then subsequently use this to apply a post process directional blur at each pixel to the rendered scene. This isn't the only approach, but it's one of the simplest to implement and has been used effectively in a number of games.

Velocity Buffer

In order to calculate the velocity of a point moving through space we need at least two pieces of information:
  • where is the point right now (a)?
  • where was the point t seconds ago (b)?

Technically the velocity is (a - b) / t; however, for our purposes we don't need to use t, at least not when writing to the velocity buffer.

Since we'll be applying the blur as a post process in image space, we may as well calculate our velocities in image space. This means that our positions (a and b) should undergo the model-view-projection transformation, perspective divide and then a scale/bias. The result can be used to generate texture coordinates directly, as we'll see.

To actually generate the velocity buffer we render the geometry, transforming every vertex by both the current model-view-projection matrix as well as the previous model-view-projection matrix. In the vertex shader we do the following:

   uniform mat4 uModelViewProjectionMat;
   uniform mat4 uPrevModelViewProjectionMat;

   smooth out vec4 vPosition;
   smooth out vec4 vPrevPosition;

   void main(void) {
      vPosition = uModelViewProjectionMat * gl_Vertex;
      vPrevPosition = uPrevModelViewProjectionMat * gl_Vertex;

      gl_Position = vPosition;
   }
And in the fragment shader:
   smooth in vec4 vPosition;
   smooth in vec4 vPrevPosition;

   out vec2 oVelocity;

   void main(void) {
      vec2 a = (vPosition.xy / vPosition.w) * 0.5 + 0.5;
      vec2 b = (vPrevPosition.xy / vPrevPosition.w) * 0.5 + 0.5;
      oVelocity = a - b;
   }
You may be wondering why we can't calculate the velocity in the vertex shader and pick up an interpolated velocity in the fragment shader. The reason is that, because of the perspective divide, the velocity is nonlinear. This is a problem if polygons are clipped; the resulting interpolated velocity is incorrect for any given pixel:

For now, I'm assuming you've got a floating point texture handy to store the velocity result (e.g. GL_RG16F). I'll discuss velocity buffer formats and the associated precision implications later.

So at this stage we have a per-pixel, image space velocity incorporating both camera and object motion.


Now we have a snapshot of the per-pixel motion in the scene, as well as the rendered image that we're going to blur. If you're rendering HDR, the blur should (ideally) be done prior to tone mapping. Here are the beginnings of the blur shader:
   uniform sampler2D uTexInput; // texture we're blurring
   uniform sampler2D uTexVelocity; // velocity buffer
   uniform float uVelocityScale;

   out vec4 oResult;

   void main(void) {
      vec2 texelSize = 1.0 / vec2(textureSize(uTexInput, 0));
      vec2 screenTexCoords = gl_FragCoord.xy * texelSize;

      vec2 velocity = texture(uTexVelocity, screenTexCoords).rg;
      velocity *= uVelocityScale;

   // blur code will go here...
Pretty straightforward so far. Notice that I generate the texture coordinates inside the fragment shader; you could use a varying instead, it makes no difference. We will, however, be needing texelSize later on.

What's uVelocityScale? It's used to address the following problem: if the framerate is very high, velocity will be very small as the amount of motion between frames will be low. Correspondingly, if the framerate is very low the motion between frames will be high and velocity will be much larger. This ties the blur size to the framerate, which is technically correct if you equate framerate with shutter speed, but is undesirable for realtime rendering where the framerate can vary. To fix it we need to cancel out the framerate:
   uVelocityScale = currentFps / targetFps;
Dividing by a 'target' framerate (shutter speed) seems to me to be an intuitive way of controlling how the motion blur looks; a high target framerate (high shutter speed) will result in less blur, a low target framerate (low shutter speed) will result in more blur, much like a real camera.
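The behaviour of this scale factor can be sketched in Python (the function name is mine, mirroring the uVelocityScale uniform above):

```python
def velocity_scale(current_fps, target_fps):
    # Cancel the actual framerate, then apply the 'shutter speed' implied
    # by the target framerate: blur length stays constant as fps varies.
    return current_fps / target_fps

# At 120fps the per-frame motion is half what it is at 60fps, so the
# stored velocity gets doubled; at 30fps it gets halved:
print(velocity_scale(120.0, 60.0))  # 2.0
print(velocity_scale(30.0, 60.0))   # 0.5
```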

The next step is to work out how many samples we're going to take for the blur. Rather than use a fixed number of samples, we can improve performance by adapting the number of samples according to the velocity:
   float speed = length(velocity / texelSize);
   nSamples = clamp(int(speed), 1, MAX_SAMPLES);
By dividing velocity by texelSize we can get the speed in texels. This needs to be clamped: we want to take at least 1 sample but no more than MAX_SAMPLES.
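As a sanity check, here's the adaptive sample count in Python (a sketch; the texel size and MAX_SAMPLES values are illustrative):

```python
def sample_count(velocity, texel_size, max_samples):
    # speed in texels: length of the velocity vector measured in texels
    vx = velocity[0] / texel_size[0]
    vy = velocity[1] / texel_size[1]
    speed = (vx * vx + vy * vy) ** 0.5
    # at least 1 sample but no more than max_samples
    return max(1, min(int(speed), max_samples))

# a slow-moving pixel still takes 1 sample; a fast-moving one is capped:
print(sample_count((0.001, 0.0), (1 / 1280, 1 / 720), 32))  # 1
print(sample_count((0.1, 0.0), (1 / 1280, 1 / 720), 32))    # 32
```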

Now for the actual blur itself:
   oResult = texture(uTexInput, screenTexCoords);
   for (int i = 1; i < nSamples; ++i) {
      vec2 offset = velocity * (float(i) / float(nSamples - 1) - 0.5);
      oResult += texture(uTexInput, screenTexCoords + offset);
   }
   oResult /= float(nSamples);
Note that the sampling is centred around the current texture coordinate. This is in order to reduce the appearance of artefacts caused by discontinuities in the velocity map:
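The centring comes from the offset expression, which spans [-0.5, +0.5] of the velocity vector as i runs over [0, nSamples - 1]; a quick Python tabulation:

```python
def blur_offsets(velocity, n_samples):
    # i/(n-1) - 0.5 distributes the taps symmetrically about the
    # current texel, from -0.5 to +0.5 of the velocity vector.
    return [velocity * (i / (n_samples - 1) - 0.5) for i in range(n_samples)]

print(blur_offsets(1.0, 5))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
```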

That's it! This is about as basic as it gets for this type of post process motion blur. It works, but it's far from perfect.

Far From Perfect

I'm going to spend the remainder of the tutorial talking about some issues along with potential solutions, as well as some of the limitations of this class of techniques.


Silhouettes

The velocity map contains discontinuities which correspond to the silhouettes of the rendered geometry. These silhouettes transfer directly to the final result and are most noticeable when things are moving fast (i.e. when there's lots of blur).

One solution as outlined here is to do away with the velocity map and instead render all of the geometry a second time, stretching the geometry along the direction of motion in order to dilate each object's silhouette for rendering the blur.

Another approach is to perform dilation on the velocity buffer, either in a separate processing step or on the fly when performing the blur. This paper outlines such an approach.

Background Bleeding

Another problem occurs when a fast moving object is behind a slow moving or stationary object. Colour from the foreground object bleeds into the background:

A possible solution is to use the depth buffer, if available, to weight samples based on their relative depth. The weights need to be tweaked such that valid samples are not excluded.

Format & Precision

For the sake of simplicity I assumed a floating point texture for the velocity buffer, however the reality may be different, particularly for a deferred renderer where you might have to squeeze the velocity into as few as two bytes. Using an unsigned normalized texture format, writing to and reading from the velocity buffer requires a scale/bias:
// writing:
   oVelocity = (a - b) * 0.5 + 0.5;

// reading:
   vec2 velocity = texture(uTexVelocity, screenTexCoords).rg * 2.0 - 1.0;
Using such a low precision velocity buffer causes some artifacts, most noticeably excess blur when the velocity is very small or zero.

The solution to this is to use the pow() function to control how precision in the velocity buffer is distributed. We want to increase precision for small velocities at the cost of worse precision for high velocities.

Writing/reading the velocity buffer now looks like this:
// writing:
   oVelocity = (a - b) * 0.5 + 0.5;
   oVelocity = pow(oVelocity, 3.0);

// reading:
   vec2 velocity = texture(uTexVelocity, screenTexCoords).rg;
   velocity = pow(velocity, 1.0 / 3.0);
   velocity = velocity * 2.0 - 1.0;
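It's easy to convince yourself that the read path inverts the write path; a Python stand-in for the GLSL above (the exponent 3 matches the shader):

```python
def encode(v):
    # scale/bias into [0, 1], then pow() to redistribute precision
    return (v * 0.5 + 0.5) ** 3.0

def decode(stored):
    # invert: pow() first, then undo the scale/bias
    return stored ** (1.0 / 3.0) * 2.0 - 1.0

# round-trip check across the representable velocity range:
for v in (-0.5, 0.0, 0.25, 0.9):
    assert abs(decode(encode(v)) - v) < 1e-9
```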


Transparency

Transparency presents similar difficulties with this technique as with deferred rendering: since the velocity buffer only contains information for the nearest pixels, we can't correctly apply a post process blur when pixels at different depths all contribute to the result. In practice this causes 'background' pixels (whatever is visible through the transparent surface) to be blurred (or not blurred) incorrectly.

The simplest solution to this is to prevent transparent objects from writing to the velocity buffer. Whether this improves the result depends largely on the number of transparent objects in the scene.

Another idea might be to use blending when writing to the velocity buffer for transparent objects, using the transparent material's opacity to control the contribution to the velocity buffer. Theoretically this could produce an acceptable compromise although in practice it may not be possible depending on how the velocity buffer is set up.

A correct, but much more expensive approach would be to render and blur each transparent object separately and then recombine with the original image.


Conclusion

It's fairly cheap, it's very simple and it looks pretty good in a broad range of situations. Once you've successfully implemented this, however, I'd recommend stepping up to a more sophisticated approach as described here.

I've provided a demo implementation.


Shadow Map Allocation

Originally posted on 15/08/2011

When I first implemented shadow mapping I began by allocating a shadow map texture to each shadow casting light. As the number of shadow casting lights grew, however, I realized that this wasn't an adequate solution. Allocating a single shadow map per-light is a bad idea for three main reasons:
  1. Shadow maps have a definite memory cost, so in order to keep the texture memory requirement constant as more shadow casting lights are added, shadow map size would need to be reduced proportionally. This has an ultimate negative impact on shadow quality.
  2. Rendering a shadow map can be skipped if the associated light volume doesn't intersect the view frustum, therefore any texture memory allocated for the shadow maps which aren't rendered is wasted.
  3. Shadow lights whose influence on the final image is small (i.e. lights covering a smaller area or lights which are far away) require fewer shadow map texels to produce the same quality of shadow; rendering a fixed-size shadow map can therefore be both a waste of texture space and rendering time.
Issue #1 can simply be solved by allocating a fixed number of shadow maps up front, and using these as a shadow map 'pool', or by allocating a single shadow map texture and rendering to/reading from portions of it as if they were separate textures.

Issues #2 and #3 are related in that they affect the amount of shadow map space that's actually required on a per-frame basis. Shadow maps which don't need to be rendered don't require any space (obviously), shadow maps which do need to be rendered require different amounts of shadow map space, depending on how they influence the final image.

This all points the way to a solution in which the available shadow map space can be allocated from a shadow map 'pool' per-frame and per-light, based on a couple of criteria:
  1. how much space is actually available
  2. how much space each light requires to get good (or good enough) quality results
The first criterion is simple enough; I divide a single shadow texture up into a number of fixed-size subsections, like this:

So for a 2048² texture this gives me 2×1024², 6×512² and 8×256² individual shadow maps, for a maximum of 16 shadow casting lights. These are indexed in order according to their relative size, as shown in the diagram. Even though there is a hard limit on the number of shadow maps, the simplicity of this scheme makes it attractive.
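One way such a subdivision might be built in code (a Python sketch; the offsets are one possible packing, not necessarily the exact layout of my diagram):

```python
def shadow_atlas_regions(atlas_size=2048):
    # Split a 2048x2048 atlas into 2x1024, 6x512 and 8x256 square
    # sub-regions, ordered largest-first so the index doubles as a
    # priority. Each entry is (x_offset, y_offset, size).
    regions = []
    # two 1024x1024 maps across the top half
    for i in range(2):
        regions.append((i * 1024, 0, 1024))
    # six 512x512 maps
    for i in range(4):
        regions.append((i * 512, 1024, 512))
    for i in range(2):
        regions.append((i * 512, 1536, 512))
    # eight 256x256 maps fill the remaining 1024x512 corner
    for j in range(2):
        for i in range(4):
            regions.append((1024 + i * 256, 1536 + j * 256, 256))
    return regions

regions = shadow_atlas_regions()
# 16 regions which exactly tile the atlas:
assert len(regions) == 16
assert sum(s * s for (_, _, s) in regions) == 2048 * 2048
```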

The second criterion is a little more complex; for each light there needs to be a way of judging its 'importance' relative to the other shadow casting lights so that an appropriate shadow map can be assigned from the pool. This 'importance' metric needs to incorporate the radius and distance of a given light volume: angular diameter is perfect for this. The actual calculation of angular diameter is done using trig: angular_diameter = 2 * arcsin(radius / distance).
In practice the actual angular diameter isn't needed, since all we want to know is whether or not the angular diameter of one light's volume is bigger or smaller than another's, so we can use a cheaper trig-less formula: importance = radius / distance.
Once every frame, we calculate this 'importance' value for each visible light, then they are sorted into importance order and assigned a shadow map from the pool. The most important lights get the biggest, the least important get the smallest. Here's the whole process in action:
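The per-frame assignment then amounts to a sort. A minimal Python sketch using the trig-less radius/distance importance (the light fields and names are hypothetical):

```python
def assign_shadow_maps(lights, pool_size=16):
    # importance ~ angular size: radius / distance (trig-free metric).
    # Sort visible lights by importance; the most important light gets
    # region 0 (the largest), and so on down the pool.
    ranked = sorted(lights, key=lambda l: l['radius'] / l['distance'],
                    reverse=True)
    return {l['name']: i for i, l in enumerate(ranked[:pool_size])}

lights = [
    {'name': 'near_small', 'radius': 1.0, 'distance': 2.0},    # importance 0.5
    {'name': 'far_big', 'radius': 10.0, 'distance': 100.0},    # importance 0.1
]
print(assign_shadow_maps(lights))  # {'near_small': 0, 'far_big': 1}
```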

This technique works best if the lights are spread apart, otherwise the discrepancies in shadow quality become more obvious and 'popping' (as individual lights skip between shadow map resolutions) becomes more noticeable. The worst case is to have lots of nearby lights of similar size being allocated different shadow map resolutions; it can be very easy to spot which light is getting assigned the bigger shadow map.

Another drawback is when rendering from multiple POVs (e.g. for split-screen multiplayer). Since the importance metric is POV-dependent, the shadow maps may be valid for one view and not for another. You could use a separate shadow map pool per-view, or re-render all of the shadow maps prior to rendering each view.

On the plus side this technique makes it very easy to add lots of shadow casting lights to a scene without too badly denting the available texture resources. It also helps to maximize performance, since rendering time and texture space get spent in the places they're needed most. By using portions of a single shadow map, scaling the quality becomes as simple as using a larger or smaller texture.

An additional idea would be to dynamically tessellate the main shadow map at runtime, based on the number of shadow lights and their importance. This may result in more popping, however, as the frequency of resolution changes for each light could be as high as once per frame.

The importance metric can also be used to determine how to filter a shadow map more efficiently (e.g. whether to spend time doing multisampled/stochastic lookups).

Update (26/06/2012)

I've been asked a couple of times about how to go about using a single shadow texture in the way I've indicated here, so I thought I'd patch this blog post with the requested info. It's pretty simple and can be used any time you want to render to/render from a texture sub-region.
  1. The first step is writing to the texture; bind it to the framebuffer and set the viewport to render into the texture sub-region.
  2. When accessing the sub-region, scale/bias the texture coordinates as follows: scale = region size / texture size, bias = region offset / texture size.
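The scale/bias in step 2 is trivial; a Python sketch (the function and argument names are mine):

```python
def subregion_uv(uv, region_offset, region_size, texture_size):
    # scale = region size / texture size, bias = region offset / texture size
    scale = region_size / texture_size
    bias_x = region_offset[0] / texture_size
    bias_y = region_offset[1] / texture_size
    return (uv[0] * scale + bias_x, uv[1] * scale + bias_y)

# a lookup at the centre of a 512x512 region at offset (1024, 0) within a
# 2048x2048 atlas:
print(subregion_uv((0.5, 0.5), (1024, 0), 512, 2048))  # (0.625, 0.125)
```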
The downside is that hardware texture filtering can cause texels to 'bleed' into the sub-regions if you're not careful. Edge-handling (wrap, repeat, etc.) needs to be performed manually in the shader. This isn't too much of a problem with shadow maps.

I recently had another idea (which I've not played around with yet - let me know if you try this out) to spread the cost of shadow map rendering across multiple frames. This could be achieved by incorporating each shadow map's age (or frames since rendered) into the metric, such that importance = radius / distance * (age + 1). Age gets incremented every frame until the shadow map gets rendered, in which case it gets reset to 0 (or to 1, in which case you can remove the '+1' from the importance calculation).

In theory this will work because, as the shadow map gets older, it gets more 'important' that rendering occurs. Whether the linear combination above will work well enough in practice is something to be tested; it may be that age needs to become the dominant term more quickly.
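Sketching the age-weighted metric (Python; the numbers are illustrative):

```python
def importance(radius, distance, age):
    # As the shadow map ages its importance grows linearly, so stale
    # maps eventually win a slot and get re-rendered.
    return radius / distance * (age + 1)

# a freshly rendered distant light vs. the same light three frames stale:
print(importance(5.0, 50.0, 0))  # 0.1
print(importance(5.0, 50.0, 3))  # 0.4
```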

Integrating this temporal method with the above spatial method is made tricky by the fact that, in the temporal approach, shadow maps need to persist. Even if a shadow map wasn't updated in this frame we still need it for rendering (if the light is visible), so we can't allow it to be overwritten with another shadow map. Allowing a shadow map to be 'locked' may appear to solve this issue, however the circumstances under which a shadow map can be 'unlocked' aren't really clear: you can safely overwrite a shadow map if it isn't needed this frame. But what if it's needed next frame?


Gamma Correction Overview

Originally posted on 20/04/2012

Rendering in linear space is good because it is simple; lighting contributions sum, material reflectance values multiply. Everything in the linear world is simple and intuitive - inhabiting the linear world is a sure way of preventing your brain from squirting out through your nose.

If the output of our display monitors was linear then this would be the end of the story. Alas, this is not the case...

Non-linear Outputs

The graph below shows how the output intensity of a typical monitor looks (the orange line) compared to linear intensity (the blue line). A monitor's response curve sags away from linear such that a pixel with a linear intensity of 0.5 appears about one fifth as bright as a pixel with a linear intensity of 1 (not half as bright, as we might have expected).

The result of this sag is that any uncorrected linear image sent to the display will appear much darker than it should.
The solution is to 'pre-correct' the output intensity immediately prior to displaying it. Ideally for this we need some information about the response curve of the monitor in use. To this end we could provide a calibration option which allows users to select a gamma correction exponent that 'looks right' for their monitor. Or we could take the easy route and just assume an exponent of 2.2 (which is good enough for the majority of cases). However we choose the exponent, to pre-correct the output we simply raise to the power of 1/exponent (the green line on the graph below).
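The cancellation is easy to verify numerically; a Python check, assuming a pure power-law display model with exponent 2.2 as above:

```python
def gamma_correct(linear, exponent=2.2):
    # pre-correct linear intensity for display: raise to 1/exponent
    return linear ** (1.0 / exponent)

def display_response(signal, exponent=2.2):
    # model of the monitor's sagging response curve
    return signal ** exponent

# pre-correction exactly cancels the display's response:
for v in (0.0, 0.25, 0.5, 1.0):
    assert abs(display_response(gamma_correct(v)) - v) < 1e-9

# uncorrected, a linear 0.5 comes out at roughly a fifth of full intensity:
print(round(display_response(0.5), 3))  # 0.218
```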

This effectively cancels out the display's response curve to maintain a linear relationship between different intensities in the output. Problem solved. Well, not quite...

Non-linear Inputs

It is highly likely that some of the inputs to our linear rendering will be textures and that those textures will have been created from non-linear photographs and/or manipulated to look 'right' on a non-linear monitor. Hence these input textures are themselves non-linear; they are innately 'pre-corrected' for the display which was used to create them. This actually turns out to be a good thing (especially if we're using an 8 bit-per-channel format) as it increases precision at lower intensities to which the human eye is more sensitive.

We can't use these non-linear textures directly as inputs to a linear shading function (e.g. lighting) - the results would simply be incorrect. Instead we need to linearize texels as they are fetched using the same method as above. This can be done manually in a shader or have the graphics driver do it automagically for us by using an sRGB format texture.

End of story? Not quite...


For a deferred renderer there is a pitfall which programmers should be aware of. If we linearize a non-linear input texture, then store the linear result in a g-buffer prior to the lighting stage we will lose all of the low-intensity precision benefits of having non-linear data in the first place. The result of this is just horrible - take a look at the low-intensity ends of the gradients in the left image below:
Clearly we need to delay the gamma correction of input textures right up until we need them to be linear. In practice this means writing non-linear texels to the g-buffer, then gamma correcting the g-buffer as it is read at the lighting stage. As before, the driver can do the work for us if we use an sRGB format for the appropriate g-buffer targets, or we can correct them manually.

What do I mean by 'appropriate'?

To Be (Linear), Or Not To Be (Linear)?

Which parts of the g-buffer require this treatment? It depends on the g-buffer organisation, but in general I'd say that any colour information (diffuse albedo/specular colour) should be treated as non-linear; it was probably prepared (pre-corrected) to 'look right' on a non-linear display. Any geometric or other non-colour information (normals/material properties) should be treated as linear; they don't encode 'intensity' as colour textures do.

Think of this post as a sort of quick-reference card; for more in-depth information take a look at the following resources:

"The Importance of Being Linear" Larry Gritz/Eugene d'Eon, GPU Gems 3

"Uncharted 2: HDR Lighting" John Hable's must-read GDC presentation

Wikipedia's gamma correction entry (and donate some money to Wikipedia while you're at it)


"Good Enough" Volumetrics for Spotlights

Originally posted on 06/01/2012

Volumetric effects are one of the perennially tricky problems in realtime graphics. They effectively simulate the scattering of light through particles suspended in the air. Since these effects can enhance both the realism and aesthetic appearance of a rendered scene, it would be nice to have a method which can produce "good enough" results cheaply and simply. As the title implies, "good enough" is the main criterion here; we're not looking for absolute photorealism, just something that's passable which adds to the aesthetic or the mood of a scene without costing the Earth to render.

I'll be describing a volumetric effect for spot lights, although the same ideas will apply to other types of lights with different volume geometries.


The volume affected by a spotlight is a cone, so that's what we'll use as the basis for the technique.
How you generate the cone is up to you, but it must have per-vertex normals (they'll make life easier later on), no duplicated vertices except at the cone's tip and no base. I've found that having plenty of height segments is good for the normal interpolation and well worth the extra triangles.

The basic idea is to render this cone in an additive blending pass with no face culling (we want to see the inside and outside of the cone together), with depth writes disabled but the depth test enabled. As the screenshot below shows, on its own this looks pretty terrible:


To begin to improve things we need to at least attenuate the effect along the length of the cone. This can be done per-fragment as a simple function of the distance from the cone's tip (d) and some maximum distance (dmax):
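The original formula here was an image; one plausible reading of it is a simple linear falloff, sketched in Python (the exact falloff used may differ):

```python
def tip_attenuation(d, d_max):
    # Full intensity at the cone's tip, fading linearly to zero at d_max
    # (assumed form; clamped so fragments beyond d_max contribute nothing).
    return max(0.0, 1.0 - d / d_max)

print(tip_attenuation(0.0, 10.0))   # 1.0
print(tip_attenuation(5.0, 10.0))   # 0.5
print(tip_attenuation(12.0, 10.0))  # 0.0
```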

Already things are looking a lot better:

Soft Edges

The edges of the cone need to be softened somehow, and that's where the vertex normals come in. We can use the dot product of the view space normal (cnorm) with the view vector (the normalised fragment position, cpos) as a metric describing how near to the edge of the cone the current fragment is.

Normalising the fragment position gives us a vector from the eye to the point on the cone (cpos) with which we're dealing. We take the absolute value of the result because the back faces of the cone will be pointing away but still need to contribute to the final result in the same way as the front faces. For added control over the edge attenuation it's useful to be able to raise the result to the power n.
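Putting the edge metric into code (a Python stand-in for the GLSL; cpos/cnorm as defined above, n is the control exponent):

```python
def edge_attenuation(cpos, cnorm, n=2.0):
    # |dot(normalize(cpos), cnorm)|^n: near the silhouette the view-space
    # normal is perpendicular to the view ray, so the dot product (and
    # therefore the contribution) falls to zero. n controls the softness.
    length = (cpos[0] ** 2 + cpos[1] ** 2 + cpos[2] ** 2) ** 0.5
    view = tuple(c / length for c in cpos)
    d = abs(sum(v * nrm for v, nrm in zip(view, cnorm)))
    return d ** n

# a fragment whose normal faces the viewer contributes fully; one whose
# normal is perpendicular to the view ray (a silhouette edge) vanishes:
print(edge_attenuation((0.0, 0.0, -2.0), (0.0, 0.0, 1.0)))  # 1.0
print(edge_attenuation((1.0, 0.0, 0.0), (0.0, 0.0, 1.0)))   # 0.0
```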

Using per-vertex normals like this is simple, but requires that the cone geometry be set up such that there won't be any 'seams' in the normal data, hence my previous note about not having any duplicate vertices except at the cone's tip.

One issue with this method is that when inside the cone looking up towards the tip the normals will tend to be perpendicular to the view direction, resulting in a blank spot. This can be remedied by applying a separate glow sprite at the light source position.

Soft Intersections

As you can see in the previous screenshot there is a problem where the cone geometry intersects with other geometry in the scene, including the floor. Remedying this requires access to the depth buffer from within the shader. As the cone's fragments get closer to fragments already in the buffer (i.e. as the difference between the depth buffer value and the cone fragment's depth approaches 0) we want the result to 'fade out':

The result should be clamped in [0, 1]. The radius can be set to make the edges softer or harder, depending on the desired effect and the scale of the intersecting geometry compared with the cone's size.
This does produce a slightly unusual fogging effect around the cone's boundary, but to my eye it meets the "good enough" criteria.
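A plausible form of the fade (the formula here was originally an image, so treat this Python sketch as an assumption):

```python
def intersection_fade(scene_depth, frag_depth, radius):
    # Fade the cone fragment out as it approaches geometry already in
    # the depth buffer; 'radius' controls how soft the transition is.
    return min(max((scene_depth - frag_depth) / radius, 0.0), 1.0)

print(intersection_fade(10.0, 10.0, 0.5))  # 0.0 (touching: fully faded)
print(intersection_fade(10.0, 9.5, 0.5))   # 1.0 (a full radius away)
```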

Another issue is that the cone geometry can intersect with the camera's near clipping plane. This results in the effect 'popping' as the camera moves across the cone boundary. We can solve this in exactly the same way as for geometry intersections; as the cone fragment's depth approaches the near plane we fade out the result.

That's it!


Motion Blur Tutorial

Originally posted on 21/04/2011

What is motion blur?

Motion pictures are made up of a series of still images displayed in quick succession. These images are captured by briefly opening a shutter to expose a piece of film/electronic sensor to light (via a lens system), then closing the shutter and advancing the film/saving the data. Motion blur occurs when an object in the scene (or the camera itself) moves while the shutter is open during the exposure, causing the resulting image to streak along the direction of motion. It is an artifact which the image-viewing populace has grown so used to that its absence is conspicuous; adding it to a simulated image enhances the realism to a large degree.

Later we'll look at a screen space technique for simulating motion blur caused only by movement of the camera. Approaches to object motion blur are a little more complicated and worth a separate tutorial. First, though, let's examine a 'perfect' (full camera and object motion blur) solution which is very simple but not really efficient enough for realtime use.

Perfect solution

This is a naive approach which has the benefit of producing completely realistic full motion blur, incorporating both the camera movement and movement of the objects in the scene relative to the camera. The technique works like this: for each frame, render the scene multiple times at different temporal offsets, then blend together the results:

This technique is actually described in the red book (chapter 10). Unfortunately it requires that the basic framerate must be samples * framerate, which is either impossible or impractical for most realtime applications. And don't think about just using the previous samples frames - this will give you trippy trails (and nausea) but definitely not motion blur. So how do we go about doing it quick n' cheap?

Screen space to the rescue!

The idea is simple: each rendered pixel represents a point in the scene at the current frame. If we know where it was in the previous frame, we can apply a blur along a vector between the two points in screen space. This vector represents the size and direction of the motion of that point between the previous frame and the current one, hence we can use it to approximate the motion of a point during the intervening time, directly analogous to a single exposure in the real world.

The crux of this method is calculating a previous screen space position for each pixel. Since we're only going to implement motion blur caused by motion of the camera, this is very simple: each frame, store the camera's model-view-projection matrix so that in the next frame we'll have access to it. Since this is all done on the CPU the details will vary; I'll just assume that you can supply the following to the fragment shader: the previous model-view-projection matrix and the inverse of the current model-view matrix.

Computing the blur vector

In order to compute the blur vector we take the following steps within our fragment shader:
  1. Get the pixel's current view space position. There are a number of equally good methods for extracting this from an existing depth buffer, see Matt Pettineo's blog for a good overview. In the example shader I use a per-pixel ray to the far plane, multiplied by a per-pixel linear depth.
  2. From this, compute the pixel's current world space position using the inverse of the current model-view matrix.
  3. From this, compute the pixel's previous normalized device coordinates using the previous model-view-projection matrix and a perspective divide.
  4. Scale and bias the result to get texture coordinates.
  5. Our blur vector is the current pixel's texture coordinates minus the coordinates we just calculated.
The eagle-eyed reader may have already spotted that this can be optimized, but for now we'll do it long-hand for the purposes of clarity. Here's the fragment program:
   uniform sampler2D uTexLinearDepth;

   uniform mat4 uInverseModelViewMat; // inverse model->view
   uniform mat4 uPrevModelViewProj; // previous model->view->projection

   noperspective in vec2 vTexcoord;
   noperspective in vec3 vViewRay; // for extracting current world space position

   void main() {
   // get current world space position:
      vec3 current = vViewRay * texture(uTexLinearDepth, vTexcoord).r;
      current = (uInverseModelViewMat * vec4(current, 1.0)).xyz;
   // get previous screen space position:
      vec4 previous = uPrevModelViewProj * vec4(current, 1.0);
      previous /= previous.w;
      previous.xy = previous.xy * 0.5 + 0.5;

      vec2 blurVec = previous.xy - vTexcoord;

Using the blur vector

So what do we do with this blur vector? We might try stepping for n samples along the vector, starting at previous.xy and ending at vTexcoord. However this produces ugly discontinuities in the effect:

To fix this we can center the blur vector on vTexcoord, thereby blurring across these velocity boundaries:
Here's the rest of the fragment program (uTexInput is the texture we're blurring):
// perform blur:
   vec4 result = texture(uTexInput, vTexcoord);
   for (int i = 1; i < nSamples; ++i) {
   // get offset in range [-0.5, 0.5]:
      vec2 offset = blurVec * (float(i) / float(nSamples - 1) - 0.5);
   // sample & add to result:
      result += texture(uTexInput, vTexcoord + offset);
   }
   result /= float(nSamples);

A sly problem

There is a potential issue around framerate: if it is very high our blur will be barely visible as the amount of motion between frames will be small, hence blurVec will be short. If the framerate is very low our blur will be exaggerated, as the amount of motion between frames will be high, hence blurVec will be long.

While this is physically realistic (higher fps = shorter exposure, lower fps = longer exposure) it might not be aesthetically desirable. This is especially true for variable-framerate games which need to maintain playability as the framerate drops without the entire image becoming a smear. At the other end of the problem, for displays with high refresh rates (or vsync disabled) the blur lengths end up being so short that the result will be pretty much unnoticeable. What we want in these situations is for each frame to look as though it was rendered at a particular framerate (which we'll call the 'target framerate') regardless of the actual framerate.

The solution is to scale blurVec according to the current actual fps: if the framerate goes up we increase the blur length; if it goes down we decrease it. When I say "goes up" or "goes down" I mean "changes relative to the target framerate." This scale factor is easily calculated:

   mblurScale = currentFps / targetFps

So if our target fps is 60 but the actual fps is 30, we halve our blur length. Remember that this is not physically realistic - we're fiddling the result in order to compensate for a variable framerate.
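As a sketch (plain Python; the function name is hypothetical):

```python
def mblur_scale(current_fps, target_fps=60.0):
    """Factor applied to blurVec so each frame looks as though it was
    rendered at the target framerate, whatever the actual framerate."""
    return current_fps / target_fps
```

At the target framerate the blur is unchanged; at 30fps the (physically longer) blur vector is halved, and at 120fps the (physically shorter) one is doubled.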


Optimisation

The simplest way to improve the performance of this method is to reduce the number of blur samples. I've found it looks okay down to about 8 samples, below which 'banding' artifacts start to become apparent.

As I hinted before, computing the blur vector can be streamlined. Notice that, in the first part of the fragment shader, we did two matrix multiplications:
// get current world space position:
   vec3 current = vViewRay * texture(uTexLinearDepth, vTexcoord).r;
   current = (uInverseModelViewMat * vec4(current, 1.0)).xyz;
// get previous screen space position:
   vec4 previous = uPrevModelViewProj * vec4(current, 1.0);
   previous /= previous.w;
   previous.xy = previous.xy * 0.5 + 0.5;
These can be combined into a single transformation by constructing a current-to-previous matrix:

mat4 currentToPrevious = uPrevModelViewProj * uInverseModelViewMat;

If we do this on the CPU we only have to do a single matrix multiplication per fragment in the shader. Also, this reduces the amount of data we upload to the GPU (always a good thing). The relevant part of the fragment program now looks like this:
   vec3 current = vViewRay * texture(uTexLinearDepth, vTexcoord).r;
   vec4 previous = uCurrentToPreviousMat * vec4(current, 1.0);
   previous /= previous.w;
   previous.xy = previous.xy * 0.5 + 0.5;
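It's easy to convince yourself that the precomputed matrix is equivalent. A quick NumPy sketch (the matrices here are random stand-ins for the real uniforms):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the two uniform matrices:
prev_model_view_proj = rng.standard_normal((4, 4))
inverse_model_view   = rng.standard_normal((4, 4))

# Combined once on the CPU, uploaded as a single uniform:
current_to_previous = prev_model_view_proj @ inverse_model_view

p = np.array([1.0, 2.0, 3.0, 1.0])  # some view space point (x, y, z, w)

one_multiply   = current_to_previous @ p
two_multiplies = prev_model_view_proj @ (inverse_model_view @ p)
```

Matrix multiplication is associative, so the per-fragment result is identical; we've just moved one of the multiplies off the GPU.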


Conclusion

Even this limited form of motion blur makes a big improvement to the appearance of a rendered scene; moving around looks generally smoother and more realistic. At lower framerates (~30fps) the effect produces a filmic appearance, hiding some of the temporal aliasing that makes realtime rendering (and stop-motion animation) 'look fake'.

If that wasn't enough, head over to the object motion blur tutorial, otherwise have some links:

"Stupid OpenGL Shader Tricks" Simon Green, NVIDIA

"Motion Blur as a Post Processing Effect" Gilberto Rosado, GPU Gems 3



SSAO Tutorial

Originally posted on 05/01/2011


Ambient occlusion is an approximation of the amount by which a point on a surface is occluded by the surrounding geometry, which affects the accessibility of that point by incoming light. In effect, ambient occlusion techniques allow the simulation of proximity shadows - the soft shadows that you see in the corners of rooms and the narrow spaces between objects. Ambient occlusion is often subtle, but will dramatically improve the visual realism of a computer-generated scene:
The basic idea is to compute an occlusion factor for each point on a surface and incorporate this into the lighting model, usually by modulating the ambient term such that more occlusion = less light, less occlusion = more light. Computing the occlusion factor can be expensive; offline renderers typically do it by casting a large number of rays in a normal-oriented hemisphere to sample the occluding geometry around a point. In general this isn't practical for realtime rendering.

To achieve interactive frame rates, computing the occlusion factor needs to be optimized as far as possible. One option is to pre-calculate it, but this limits how dynamic a scene can be (the lights can move around, but the geometry can't).

Way back in 2007, Crytek implemented a realtime solution for Crysis, which quickly became the yardstick for game graphics. The idea is simple: use per-fragment depth information as an approximation of the scene geometry and calculate the occlusion factor in screen space. This means that the whole process can be done on the GPU, is 100% dynamic and completely independent of scene complexity. Here we'll take a quick look at how the Crysis method works, then look at some enhancements.

Crysis Method

Rather than cast rays in a hemisphere, Crysis samples the depth buffer at points derived from samples in a sphere:

This works in the following way:
  • project each sample point into screen space to get the coordinates into the depth buffer
  • sample the depth buffer
  • if the sample position is behind the sampled depth (i.e. inside geometry), it contributes to the occlusion factor
Clearly the quality of the result is directly proportional to the number of samples, which needs to be minimized in order to achieve decent performance. Reducing the number of samples, however, produces ugly 'banding' artifacts in the result. This problem is remedied by randomly rotating the sample kernel at each pixel, trading banding for high frequency noise which can be removed by blurring the result.
The Crysis method produces occlusion factors with a particular 'look' - because the sample kernel is a sphere, flat walls end up looking grey because ~50% of the samples end up being inside the surrounding geometry. Concave corners darken as expected, but convex ones appear lighter since fewer samples fall inside geometry. Although these artifacts are visually acceptable, they produce a stylistic effect which strays somewhat from photorealism.
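The ~50% figure for flat walls is easy to verify with a quick Monte Carlo sketch (plain Python; the wall is modelled as the plane z = 0 with solid geometry filling z < 0, and samples are drawn uniformly from a unit sphere by rejection sampling):

```python
import random

random.seed(1)

n = 100000       # accepted sphere samples
inside = 0       # samples landing inside the geometry
accepted = 0
while accepted < n:
    x = random.uniform(-1.0, 1.0)
    y = random.uniform(-1.0, 1.0)
    z = random.uniform(-1.0, 1.0)
    if x * x + y * y + z * z > 1.0:
        continue  # outside the unit sphere; reject
    accepted += 1
    if z < 0.0:
        inside += 1

fraction = inside / n  # close to 0.5 by symmetry
```

By symmetry half the sphere lies behind the wall, which is exactly why flat surfaces come out roughly half-occluded (grey) with a spherical kernel.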

Normal-oriented Hemisphere

Rather than sample a spherical kernel at each pixel, we can sample within a hemisphere, oriented along the surface normal at that pixel. This improves the look of the effect with the penalty of requiring per-fragment normal data. For a deferred renderer, however, this is probably already available, so the cost is minimal (especially when compared with the improved quality of the result).

Generating the Sample Kernel

The first step is to generate the sample kernel itself. The requirements are that
  • sample positions fall within the unit hemisphere
  • sample positions are more densely clustered towards the origin. This effectively attenuates the occlusion contribution according to distance from the kernel centre - samples closer to a point occlude it more than samples further away
Generating the hemisphere is easy:
for (int i = 0; i < kernelSize; ++i) {
   kernel[i] = vec3(
      random(-1.0f, 1.0f),
      random(-1.0f, 1.0f),
      random(0.0f, 1.0f)
   );
   kernel[i] = normalize(kernel[i]);
}
This creates sample points on the surface of a hemisphere oriented along the z axis. The choice of orientation is arbitrary - it will only affect the way we reorient the kernel in the shader. The next step is to scale each of the sample positions to distribute them within the hemisphere. This is most simply done as:
   kernel[i] *= random(0.0f, 1.0f);
which will produce an evenly distributed set of points. What we actually want is for the distance from the origin to fall off as we generate more points, according to a curve like this:

We can use an accelerating interpolation function to achieve this:
   float scale = float(i) / float(kernelSize);
   scale = lerp(0.1f, 1.0f, scale * scale);
   kernel[i] *= scale;
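Putting the kernel generation together, here's a CPU-side sketch in plain Python (kernel_size = 16 and the tuple representation are assumptions; the snippets above are the authoritative version):

```python
import math
import random

random.seed(0)
kernel_size = 16

def lerp(a, b, t):
    return a + t * (b - a)

kernel = []
for i in range(kernel_size):
    # random point in the z >= 0 half-cube, pushed onto the unit hemisphere:
    v = (random.uniform(-1.0, 1.0),
         random.uniform(-1.0, 1.0),
         random.uniform(0.0, 1.0))
    length = math.sqrt(sum(c * c for c in v))
    v = tuple(c / length for c in v)
    # accelerating falloff clusters samples towards the origin:
    scale = lerp(0.1, 1.0, (i / kernel_size) ** 2)
    kernel.append(tuple(c * scale for c in v))
```

Every sample stays inside the unit hemisphere, and earlier samples sit much closer to the origin than later ones.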

Generating the Noise Texture

Next we need to generate a set of random values used to rotate the sample kernel, which will effectively increase the sample count and minimize the 'banding' artefacts mentioned previously.
for (int i = 0; i < noiseSize; ++i) {
   noise[i] = vec3(
      random(-1.0f, 1.0f),
      random(-1.0f, 1.0f),
      0.0f
   );
}
Note that the z component is zero; since our kernel is oriented along the z-axis, we want the random rotation to occur around that axis.

These random values are stored in a texture and tiled over the screen. The tiling of the texture causes the orientation of the kernel to be repeated and introduces regularity into the result. By keeping the texture size small we can make this regularity occur at a high frequency, which can then be removed with a blur step that preserves the low-frequency detail of the image. Using a 4x4 texture and blur kernel produces excellent results at minimal cost. This is the same approach as used in Crysis.
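Generating the noise values on the CPU can be sketched in plain Python (noise_size for a 4x4 texture is an assumption; in practice these vectors would be uploaded as a small RGB float texture set to repeat):

```python
import random

random.seed(0)
noise_size = 4 * 4  # a 4x4 texture, tiled over the screen

noise = []
for _ in range(noise_size):
    # rotation happens around the z axis only, so z stays zero:
    noise.append((random.uniform(-1.0, 1.0),
                  random.uniform(-1.0, 1.0),
                  0.0))
```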

The SSAO Shader

With all the prep work done, we come to the meat of the implementation: the shader itself. There are actually two passes: calculating the occlusion factor, then blurring the result.

Calculating the occlusion factor requires first obtaining the fragment's view space position and normal:
   vec3 origin = vViewRay * texture(uTexLinearDepth, vTexcoord).r;
I reconstruct the view space position by combining the fragment's linear depth with the interpolated vViewRay. See Matt Pettineo's blog for a discussion of other methods for reconstructing position from depth. The important thing is that origin ends up being the fragment's view space position.
Retrieving the fragment's normal is a little more straightforward; the scale/bias and normalization steps are necessary unless you're using some high precision format to store the normals:
   vec3 normal = texture(uTexNormals, vTexcoord).xyz * 2.0 - 1.0;
   normal = normalize(normal);
Next we need to construct a change-of-basis matrix to reorient our sample kernel along the origin's normal. We can cunningly incorporate the random rotation here, as well:
   vec3 rvec = texture(uTexRandom, vTexcoord * uNoiseScale).xyz * 2.0 - 1.0;
   vec3 tangent = normalize(rvec - normal * dot(rvec, normal));
   vec3 bitangent = cross(normal, tangent);
   mat3 tbn = mat3(tangent, bitangent, normal);
The first line retrieves a random vector rvec from our noise texture. uNoiseScale is a vec2 which scales vTexcoord to tile the noise texture. So if our render target is 1024x768 and our noise texture is 4x4, uNoiseScale would be (1024 / 4, 768 / 4). (This can just be calculated once when initialising the noise texture and passed in as a uniform).

The next three lines use the Gram-Schmidt process to compute an orthogonal basis, incorporating our random rotation vector rvec.

The last line constructs the transformation matrix from our tangent, bitangent and normal vectors. The normal vector fills the z component of our matrix because that is the axis along which the base kernel is oriented.
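The reorientation can be checked CPU-side. This is a sketch in Python with NumPy (make_tbn is a hypothetical helper mirroring the shader lines above):

```python
import numpy as np

def make_tbn(normal, rvec):
    """Build a basis whose z axis is the surface normal; rvec sets the
    random rotation of the tangent about that normal (Gram-Schmidt)."""
    n = normal / np.linalg.norm(normal)
    t = rvec - n * np.dot(rvec, n)  # strip the component along the normal
    t /= np.linalg.norm(t)
    b = np.cross(n, t)
    return np.column_stack((t, b, n))  # columns: tangent, bitangent, normal

tbn = make_tbn(np.array([0.0, 0.0, 1.0]), np.array([0.3, 0.7, 0.2]))
```

The resulting matrix is orthonormal and maps the kernel's z axis onto the surface normal, which is exactly what the shader needs.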

Next we loop through the sample kernel (passed in as an array of vec3, uSampleKernel), sample the depth buffer and accumulate the occlusion factor:
float occlusion = 0.0;
for (int i = 0; i < uSampleKernelSize; ++i) {
// get sample position:
   vec3 sample = tbn * uSampleKernel[i];
   sample = sample * uRadius + origin;
// project sample position:
   vec4 offset = vec4(sample, 1.0);
   offset = uProjectionMat * offset;
   offset.xy /= offset.w;
   offset.xy = offset.xy * 0.5 + 0.5;
// get sample depth:
   float sampleDepth = texture(uTexLinearDepth, offset.xy).r;
// range check & accumulate:
   float rangeCheck = abs(origin.z - sampleDepth) < uRadius ? 1.0 : 0.0;
   occlusion += (sampleDepth <= sample.z ? 1.0 : 0.0) * rangeCheck;
}
Getting the view space sample position is simple: we multiply by our orientation matrix tbn, scale the sample by uRadius (a nice artist-adjustable factor, passed in as a uniform), then add the fragment's view space position origin.
We now need to project sample (which is in view space) back into screen space to get the texture coordinates with which we sample the depth buffer. This step follows the usual process - multiply by the current projection matrix (uProjectionMat), perform w-divide then scale and bias to get our texture coordinate: offset.xy.

Next we read sampleDepth out of the depth buffer (uTexLinearDepth). If this is in front of the sample position, the sample is 'inside' geometry and contributes to occlusion. If sampleDepth is behind the sample position, the sample doesn't contribute to the occlusion factor. Introducing a rangeCheck helps to prevent erroneous occlusion between large depth discontinuities:

As you can see, rangeCheck works by zeroing any contribution from outside the sampling radius.

The final step is to normalize the occlusion factor and invert it, in order to produce a value that can be used to directly scale the light contribution.
 occlusion = 1.0 - (occlusion / uSampleKernelSize);
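The accumulation and normalisation can be expressed as a small CPU-side sketch (plain Python; occlusion_factor and its toy inputs are hypothetical, mirroring the shader's comparisons):

```python
def occlusion_factor(sample_depths, sample_zs, origin_z, radius):
    """Mirror of the shader loop: a sample contributes if the stored
    depth is in front of it (sampleDepth <= sample.z) and lies within
    uRadius of the fragment being shaded; the sum is then normalised
    and inverted so it can directly scale the ambient term."""
    occlusion = 0.0
    for sample_depth, sample_z in zip(sample_depths, sample_zs):
        range_check = 1.0 if abs(origin_z - sample_depth) < radius else 0.0
        occlusion += (1.0 if sample_depth <= sample_z else 0.0) * range_check
    return 1.0 - occlusion / len(sample_zs)
```

With two of four samples occluded the factor is 0.5; with no occluders inside the radius it is 1.0 (fully lit).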

The Blur Shader

The blur shader is very simple: all we want to do is average a 4x4 rectangle around each pixel to remove the 4x4 noise pattern:
uniform sampler2D uTexInput;

uniform int uBlurSize = 4; // use size of noise texture

noperspective in vec2 vTexcoord; // input from vertex shader

out float fResult;

void main() {
   vec2 texelSize = 1.0 / vec2(textureSize(uTexInput, 0));
   float result = 0.0;
   vec2 hlim = vec2(float(-uBlurSize) * 0.5 + 0.5);
   for (int i = 0; i < uBlurSize; ++i) {
      for (int j = 0; j < uBlurSize; ++j) {
         vec2 offset = (hlim + vec2(float(i), float(j))) * texelSize;
         result += texture(uTexInput, vTexcoord + offset).r;
      }
   }
   fResult = result / float(uBlurSize * uBlurSize);
}
The only thing to note in this shader is texelSize, which allows us to accurately sample texel centres based on the resolution of the AO render target.
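The same 4x4 box average can be sketched CPU-side with NumPy (integer texel offsets stand in for the shader's half-texel hlim offsets, and wrap-around edge handling is an assumption for brevity):

```python
import numpy as np

def blur4x4(img):
    """Average a 4x4 neighbourhood around each pixel of a 2D array."""
    size = 4
    out = np.zeros_like(img)
    for i in range(size):
        for j in range(size):
            # shift so each pixel sees its (i - 2, j - 2) neighbour:
            out += np.roll(img, (-(i - size // 2), -(j - size // 2)),
                           axis=(0, 1))
    return out / (size * size)

blurred = blur4x4(np.full((8, 8), 3.0))
```

A constant input comes out unchanged, and a 4x4-periodic noise pattern averages to a constant, which is exactly the property we rely on to remove the tiled noise.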


Conclusion

The normal-oriented hemisphere method produces a more realistic-looking result than the basic Crysis method, without much extra cost, especially when implemented as part of a deferred renderer where the extra per-fragment data is readily available. It's pretty scalable, too - the main performance bottleneck is the size of the sample kernel, so you can either go for fewer samples or render to a lower resolution AO target.

A demo implementation is available here.

The Wikipedia article on SSAO has a good set of external links and references for information on other techniques for achieving real time ambient occlusion.