Background
Ambient occlusion is an approximation of the amount by which a point on a surface is occluded by the surrounding geometry, which affects the accessibility of that point by incoming light. In effect, ambient occlusion techniques allow the simulation of proximity shadows - the soft shadows that you see in the corners of rooms and the narrow spaces between objects. Ambient occlusion is often subtle, but will dramatically improve the visual realism of a computer-generated scene.
The basic idea is to compute an occlusion factor for each point on a surface and incorporate this into the lighting model, usually by modulating the ambient term so that more occlusion means less light and less occlusion means more light. Computing the occlusion factor can be expensive; offline renderers typically do it by casting a large number of rays in a normal-oriented hemisphere to sample the occluding geometry around a point. In general this isn't practical for realtime rendering.
To achieve interactive frame rates, computing the occlusion factor needs to be optimized as far as possible. One option is to pre-calculate it, but this limits how dynamic a scene can be (the lights can move around, but the geometry can't).
Way back in 2007, Crytek implemented a realtime solution for Crysis, which quickly became the yardstick for game graphics. The idea is simple: use per-fragment depth information as an approximation of the scene geometry and calculate the occlusion factor in screen space. This means that the whole process can be done on the GPU, is 100% dynamic and completely independent of scene complexity. Here we'll take a quick look at how the Crysis method works, then look at some enhancements.
Crysis Method
Rather than cast rays in a hemisphere, Crysis samples the depth buffer at points derived from samples in a sphere. This works in the following way:
- project each sample point into screen space to get the coordinates into the depth buffer
- sample the depth buffer
- if the sample position is behind the sampled depth (i.e. inside geometry), it contributes to the occlusion factor
The Crysis method produces occlusion factors with a particular 'look' - because the sample kernel is a sphere, flat walls end up looking grey because ~50% of the samples end up being inside the surrounding geometry. Concave corners darken as expected, but convex ones appear lighter since fewer samples fall inside geometry. Although these artifacts are visually acceptable, they produce a stylistic effect which strays somewhat from photorealism.
Normal-oriented Hemisphere
Rather than sample a spherical kernel at each pixel, we can sample within a hemisphere, oriented along the surface normal at that pixel. This improves the look of the effect with the penalty of requiring per-fragment normal data. For a deferred renderer, however, this is probably already available, so the cost is minimal (especially when compared with the improved quality of the result).
Generating the Sample Kernel
The first step is to generate the sample kernel itself. The requirements are that
- sample positions fall within the unit hemisphere
- sample positions are more densely clustered towards the origin. This effectively attenuates the occlusion contribution according to distance from the kernel centre - samples closer to a point occlude it more than samples further away
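for (int i = 0; i < kernelSize; ++i) {
    kernel[i] = vec3(
        random(-1.0f, 1.0f),
        random(-1.0f, 1.0f),
        random(0.0f, 1.0f)
    );
    kernel[i].normalize();
}
Normalizing places each sample on the surface of a hemisphere oriented along the z axis. The samples then need to be scaled so that they are distributed within the hemisphere. The simplest way is a random scale:
kernel[i] *= random(0.0f, 1.0f);
What we actually want is for the samples to cluster towards the origin. We can use an accelerating interpolation function to achieve this:
float scale = float(i) / float(kernelSize);
scale = lerp(0.1f, 1.0f, scale * scale);
kernel[i] *= scale;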
Generating the Noise Texture
Next we need to generate a set of random values used to rotate the sample kernel, which will effectively increase the sample count and minimize the 'banding' artefacts mentioned previously.
for (int i = 0; i < noiseSize; ++i) {
    noise[i] = vec3(
        random(-1.0f, 1.0f),
        random(-1.0f, 1.0f),
        0.0f
    );
    noise[i].normalize();
}
These random values are stored in a texture and tiled over the screen. The tiling of the texture causes the orientation of the kernel to be repeated and introduces regularity into the result. By keeping the texture size small we can make this regularity occur at a high frequency, which can then be removed with a blur step that preserves the low-frequency detail of the image. Using a 4x4 texture and blur kernel produces excellent results at minimal cost. This is the same approach as used in Crysis.
The SSAO Shader
With all the prep work done, we come to the meat of the implementation: the shader itself. There are actually two passes: calculating the occlusion factor, then blurring the result.
Calculating the occlusion factor requires first obtaining the fragment's view space position and normal:
vec3 origin = vViewRay * texture(uTexLinearDepth, vTexcoord).r;
Here the view space position is reconstructed by scaling the view ray vViewRay by the fragment's linear depth. See Matt Pettineo's blog for a discussion of other methods for reconstructing position from depth. The important thing is that origin ends up being the fragment's view space position.
Retrieving the fragment's normal is a little more straightforward; the scale/bias and normalization steps are necessary unless you're using some high precision format to store the normals:
vec3 normal = texture(uTexNormals, vTexcoord).xyz * 2.0 - 1.0;
normal = normalize(normal);
Next we build a change-of-basis matrix that orients the sample kernel along the fragment's normal:
vec3 rvec = texture(uTexRandom, vTexcoord * uNoiseScale).xyz * 2.0 - 1.0;
vec3 tangent = normalize(rvec - normal * dot(rvec, normal));
vec3 bitangent = cross(normal, tangent);
mat3 tbn = mat3(tangent, bitangent, normal);
The first line retrieves a random vector rvec from our noise texture. uNoiseScale is a vec2 which scales vTexcoord to tile the noise texture. So if our render target is 1024x768 and our noise texture is 4x4, uNoiseScale would be (1024 / 4, 768 / 4). (This can just be calculated once when initialising the noise texture and passed in as a uniform.)
The next two lines use the Gram-Schmidt process to compute an orthogonal basis, incorporating our random rotation vector rvec.
The last line constructs the transformation matrix from our tangent, bitangent and normal vectors. The normal vector fills the z component of our matrix because that is the axis along which the base kernel is oriented.
Next we loop through the sample kernel (passed in as an array of vec3, uSampleKernel), sample the depth buffer and accumulate the occlusion factor:
float occlusion = 0.0;
for (int i = 0; i < uSampleKernelSize; ++i) {
    // get sample position:
    vec3 sample = tbn * uSampleKernel[i];
    sample = sample * uRadius + origin;
    // project sample position:
    vec4 offset = vec4(sample, 1.0);
    offset = uProjectionMat * offset;
    offset.xy /= offset.w;
    offset.xy = offset.xy * 0.5 + 0.5;
    // get sample depth:
    float sampleDepth = texture(uTexLinearDepth, offset.xy).r;
    // range check & accumulate:
    float rangeCheck = abs(origin.z - sampleDepth) < uRadius ? 1.0 : 0.0;
    occlusion += (sampleDepth <= sample.z ? 1.0 : 0.0) * rangeCheck;
}
To get the sample position we first transform the kernel sample by tbn, then scale it by uRadius (a nice artist-adjustable factor, passed in as a uniform), then add the fragment's view space position origin.
We now need to project sample (which is in view space) back into screen space to get the texture coordinates with which we sample the depth buffer. This step follows the usual process - multiply by the current projection matrix (uProjectionMat), perform the w-divide, then scale and bias to get our texture coordinate: offset.xy.
Next we read sampleDepth out of the depth buffer (uTexLinearDepth). If this is in front of the sample position, the sample is 'inside' geometry and contributes to occlusion. If sampleDepth is behind the sample position, the sample doesn't contribute to the occlusion factor. Introducing a rangeCheck helps to prevent erroneous occlusion between large depth discontinuities. rangeCheck works by zeroing any contribution from outside the sampling radius.
The final step is to normalize the occlusion factor and invert it, in order to produce a value that can be used to directly scale the light contribution.
occlusion = 1.0 - (occlusion / uSampleKernelSize);
The Blur Shader
The blur shader is very simple: all we want to do is average a 4x4 rectangle around each pixel to remove the 4x4 noise pattern:
uniform sampler2D uTexInput;
uniform vec2 uTexelSize; // x = 1/res x, y = 1/res y
noperspective in vec2 vTexcoord; // input from vertex shader
out float oResult;
void main() {
    float result = 0.0;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            // offsets run from -2 to +1 texels so the window sits around the pixel
            vec2 offset = vec2(uTexelSize.x * float(j - 2), uTexelSize.y * float(i - 2));
            result += texture(uTexInput, vTexcoord + offset).r;
        }
    }
    oResult = result / 16.0;
}
The only thing of note in the blur shader is uTexelSize, which allows us to accurately sample texel centres based on the resolution of the AO render target.
Conclusion
The normal-oriented hemisphere method produces a more realistic-looking result than the basic Crysis method, without much extra cost, especially when implemented as part of a deferred renderer where the extra per-fragment data is readily available. It's pretty scalable, too - the main performance bottleneck is the size of the sample kernel, so you can either go for fewer samples or have a lower resolution AO target.
A demo implementation is available here.
The Wikipedia article on SSAO has a good set of external links and references for information on other techniques for achieving real time ambient occlusion.







Unless I'm mistaken, doesn't this line give you values from -3.0 to 1.0? And since you don't normalize the vector before calculating the tangent, are you sure that the results are accurate?
Oh man... what I wrote got all screwed up. I meant to refer to this line:
vec3 rvec = texture(uTexRandom, vTexcoord * uNoiseScale).xyz * 2.0 - 1.0;
uTexRandom is in an unsigned, normalized format, so the result of calling texture() is in [0,1], hence the scale/bias will move rvec to the range [-1,1].
rvec doesn't need to be normalized, tangent will still be perpendicular to normal, which is all that's required (tangent itself is normalized before being used to compute bitangent).
Hope that clears things up.
Hello John, I'm a big fan of yours :) I wish to create effects like the ones you have, but I lack the knowledge. There aren't many true beginner tutorials on the web and I struggle with learning graphics, so I wanted to ask how you learned these things and where you recommend I look to be able to write code like this for myself. I've always wanted to implement my own SSAO or Global Illumination in Unity3D (the game engine I prefer), which uses Cg, but it's fairly similar to HLSL.
Having general programming skills makes a good starting point, preferably in a procedural language like C (on which the shading languages are largely based).
Maths is essential of course, specifically linear algebra (vectors, matrices, etc.). It's not necessary to be a great mathematician (heaven knows I'm not), but having a practical understanding of what vectors and matrices represent and how they are used and manipulated is necessary. "Essential Mathematics for Games and Interactive Applications" by James Van Verth and Lars Bishop is the best tutorial/reference I've seen and taught me a lot of what I know. Also if you search "3d maths tutorial" there is a wealth of resources on the web to get started.
For graphics programming I think it's useful to start small; it's very important to understand the graphics pipeline, how geometry goes from being an abstract set of points in memory to being pixels in a framebuffer. There is a great resource for getting started at www.arcsynthesis.org/gltut/
The following thread on gamedev contains some more recommendations which may be useful:
http://www.gamedev.net/topic/621102-things-every-graphics-programmer-should-know/
Wow, thanks for all this useful info! Sorry for the late reply. I have been doing lots of research and this shall help me greatly. I'm already in Calculus learning more and we learned the true meaning of the dot product, turning vectors into a number. It's all starting to make sense. Thank you very much once again!
Thank you so much for this tutorial. It's a pretty hard effect to implement; I used your tutorial to add SSAO to my own Python molecular viewer https://github.com/chemlab/chemlab. What I've obtained so far is this: http://troll.ws/image/d9ab2364 (I have to add the blur step). It looks right to me but there are many mistakes that I could have made.
I would have never been able to implement such a cool looking effect without your tutorial and your code. I sincerely thank you.
I've added blur and have a very small problem.
In this picture I've rendered a set of procedurally-generated sphere imposters with 128 samples. The problem I'm having is that around each sphere there is a thin halo of non-occlusion:
http://troll.ws/image/f57a6a71
Do you have any idea/suggestion about what causes this issue and how to solve this problem?
The shaders I'm using are in this directory: https://github.com/chemlab/chemlab/tree/master/chemlab/graphics/postprocessing/shaders
This is the main issue with indiscriminately blurring the AO result. Areas of occlusion/non-occlusion will tend to 'leak' - most noticeably where there are sharp discontinuities in the depth buffer.
The solution is to use a more complex blur which samples the depth buffer and only blurs pixels which are at a similar depth.
Another option might be to simply dilate the AO result slightly (after applying the blur). I'm not sure how well this will work, though.
Hi,
Thanks for the great tutorial! I have a question for you:
What do I need in order to compute the vViewRay vector used for the unprojection?
The article you linked to says that vViewRay is a vector pointing towards the far-clipping plane - how would I obtain it? Would this be done in the vertex shader of the SSAO fullscreen quad pass or via some other means? Maybe you can share your method of obtaining it :)?
You can compute the view ray from the normalized device coordinates of the fragment in question and the field of view angle and aspect ratio of the camera, like this:
float thfov = tan(fov / 2.0); // can do this on the CPU
viewray = vec3(
    ndc.x * thfov * aspect,
    ndc.y * thfov,
    1.0
);
You can do this either in the vertex shader (and interpolate the view ray), or directly in the fragment shader (compute ndc as texcoords * 2.0 - 1.0).
Matt actually has another, more in-depth blog post on this topic, which may help clarify things better.
Thanks for the quick reply! I did calculate the view-space position of the fragment from depth by one of the methods provided by Matt. One more thing - could you take a look at the result and clarify whether it's supposed to look like that?
So basically, if I calculate origin and run the shader without any additional computations it looks like this on a torus-like shape
You can see the shape in the image and you can see the screen is divided into areas of different color.
Now if I were to zoom out this is how my screen looks like
So basically after un-projection the screen is divided into 4 areas. I remember reading somewhere that it's supposed to look like that, but I'd like to cross-reference it with someone who has achieved a successful result :). Did your vViewRay * texture(uTexLinearDepth, vTexcoord).r produce similar results for your scene ?
The first one looks fine, but the second is flipped on the x and y axis, which is probably because your camera's fov > 180 degrees.
Heh, I think it got flipped because I've zoomed out too much, because it looks correct while zooming out and only flips after a certain point.
After some tweaking I was able to achieve this result. It looks like the occlusion is being computed, at least partially. However there's a lot of noise. Also, the shape gets darker and darker the further I move away from it (not zoom out), as in it works like an inverted depth result. Maybe you have an idea as to what can be causing these effects?
What is the size of your scene, compared to the radius of the sample kernel? They should be comparable - it looks like the sample kernel radius is too large. The noise is expected, although your noise texture looks larger than 4x4. The variation along the depth of the scene is probably due to incorrect depth reconstruction: the depth should be linear.
Hm, my scene size is 1366 x 768. What would be a good radius for such a scene? By comparable do you mean close?
The noise texture I've used was larger - it was 64x64. I've switched to 4x4. Upon switching to 4x4 noise and setting kernel radius to 1067 I got this result.
I think the depth that I use is linear - it's obtained via the method described here.
With new parameters, the result doesn't act like depth - it does not get brighter/darker upon moving away or towards it. However, it still looks off.
I think there still is a problem with my vViewRay vector - by aspect ratio in its computation, do you mean the relation 16 / 9 or something else?
Also, in the lookup of rvec you mention a noise scale parameter - that would be vec2(1366 / 4, 768 / 4) in this case right?
Sorry, by 'scene size' I meant the size of the objects in your scene (the torus), not the resolution. So if the torus size is ~1 unit, the sample kernel radius should be ~1. Aspect ratio should be the x resolution / y resolution, so 1366/768 in your case.
Thanks for clearing things up!
After some more playing around I've been able to obtain this, which looks more like SSAO than the previous results. I was able to get this by changing occlusion += (sampleDepth <= sample.z ? 1.0 : 0.0) * rangeCheck; into occlusion += (sampleDepth <= offset.z ? 1.0 : 0.0) * rangeCheck;.
If I understand correctly, sampleDepth is the screen-space depth and sample.z is the view-space depth, so if I compare them directly does it produce a valid result? In previous attempts it didn't seem so. Or did my code change from sample.z to origin.z just produce a false nicer-looking illusion?
In my result I get a more tolerable level of noise compared to previous attempts and the depth isn't varying anymore. However, at the middle of the screen region there's a visible change in noise pattern - sort of like a white tear - do you know why this occurs?
Also, the result that I got is highly view-dependent - whenever I move my camera with the mouse the occlusion result jumps around with it - should this be the case?
The rangeCheck calculation in the tutorial is correct; you want to compare the depth of the current sample fragment with the depth of the kernel sample position in view space.
The result you're getting still looks very wrong. It could be that the kernel radius is too large, or possibly that the normals are incorrect. It's hard to tell from the screenshot; a scene with a grid of objects on a big flat plane might be a better test.
There's also a sample implementation on the demos page which might be helpful.
Hm, I've taken a look at your demo and it seems I've used a non-linear depth in my shaders. I save my depth as described here, but it appears to lose its linearity once projected. However, if I try to linearize the depth by multiplying it with the inverse projection matrix and try to output the linearized depth to the screen I get a black screen - all zero values.
Maybe something's wrong with my projection matrix? Should the projection matrix be the same as the one I use in the first pass to output the normals and depth?
The projection matrix you use should be the same perspective projection as when rendering the scene.
"it appears to lose its linearity once projected"
Are you sure? Are you writing the depth to a floating point texture? Is the depth you're outputting positive or negative?
I'm outputting the depth and normals in an RGBA texture - RGB for normals and A for depth. The depth that I put into this texture is multiplied by -1, otherwise it won't show up at all (I'm guessing that's what you mean by asking whether it's negative?).
The texture type is GL_UNSIGNED_BYTE. The projection matrix indeed is the same one I've used for rendering the scene. So, I'm guessing there's something wrong with the way I output the depth to the texture or the texture itself.