| Path Tracer Output | Denoised |
|---|---|
| ![]() | ![]() |
University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3
- Richard Chen
- Tested on: Windows 11, i7-10875H @ 2.3GHz 16GB, RTX 2060 MAX-Q 6GB (PC)
Path tracing is slow and computationally expensive, and additional passes provide diminishing returns in image quality. This motivates the use of smoothing filters: render with fewer passes, then filter the remaining noise out of the resulting image, speeding up overall execution.
A naïve smoother would simply blur neighboring pixels together, but that approach ignores the scene's geometry and loses clarity as the borders between different shapes become homogenized.
In this project, I use GPU-side geometry buffers (G-buffers) to store scene-relevant data per pixel in order to implement a smoothing filter that better preserves the boundaries between different geometries. This approach is based on the paper "Edge-Avoiding A-Trous Wavelet Transform for fast Global Illumination Filtering" by Dammertz, Sewtz, Hanika, and Lensch, with CUDA used in place of GLSL fragment shaders for the computations.
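To give a flavor of the CUDA port, here is a minimal sketch of the edge-stopping weight between a pixel and one of its filter neighbors; the `GBufferPixel` struct and the `cPhi`/`nPhi`/`pPhi` parameter names are illustrative, not the project's actual identifiers:

```cuda
#include <glm/glm.hpp>

// Hypothetical per-pixel G-buffer entry.
struct GBufferPixel {
    glm::vec3 position; // first-bounce world position
    glm::vec3 normal;   // first-bounce surface normal
};

// Edge-stopping weight in the paper's exp(-||delta||^2 / phi) form.
// Small weights suppress contributions from across an edge.
__device__ float edgeStoppingWeight(
    glm::vec3 color, glm::vec3 neighborColor,
    GBufferPixel g, GBufferPixel ng,
    float cPhi, float nPhi, float pPhi)
{
    glm::vec3 d = color - neighborColor;
    float cw = fminf(expf(-glm::dot(d, d) / cPhi), 1.0f);
    d = g.normal - ng.normal;
    float nw = fminf(expf(-glm::dot(d, d) / nPhi), 1.0f);
    d = g.position - ng.position;
    float pw = fminf(expf(-glm::dot(d, d) / pPhi), 1.0f);
    return cw * nw * pw;
}
```

Each A-Trous pass then accumulates `neighborColor * weight * kernel[i]` over a sparse 5×5 neighborhood and normalizes by the summed weights; the paper's reference GLSL listing is reproduced at the end of this README.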
- Edge-aware A-Trous smoothing
- G-buffer visualization
- Materials: The filter works best with diffuse materials. A glass or metal sphere can have neighboring pixels that map to very different parts of the scene while still having similar first-bounce positions and normals, so a G-buffer of first-bounce position/normal/color does not give the A-Trous filter enough information to avoid reflected edges. Even in the denoised image, the metal ball's reflections are blurred: to the filter they belong to the same surface and are therefore fair game for smoothing.
- Lighting: If too much of the image is dark, the filter will darken the image overall. It is best to have a well-illuminated scene, whether that means bigger lights or more render passes before filtering.
- Simpler scenes and those with large diffuse areas suit a wide filter and large color weights, but more intricate scenes need more localized smoothing, especially if the scene is intentionally high frequency in the signal-processing sense.
| Image | Samples (spp) | Filter Size | Color Weight | Normal Weight | Position Weight |
|---|---|---|---|---|---|
| ![]() | 10 | N/A | N/A | N/A | N/A |
| ![]() | 10 | 3 | 8.866 | 1.34 | 0.206 |
| ![]() | 10 | 20 | 8.866 | 1.34 | 0.206 |
| ![]() | 10 | 100 | 9.639 | 0.567 | 1.082 |
| ![]() | 100 | N/A | N/A | N/A | N/A |
| ![]() | 100 | 100 | 10 | 10 | 10 |
- For some reason, even with every weight set to the maximum at 100 spp, the filter's smoothing is barely noticeable (the side walls are smoother and the shadows are less pronounced)
The filter operates on the rendered image output, essentially post-processing the existing image, except that it is paired with the G-buffer of relevant per-pixel data. This means filtering speed should not depend on the geometry that makes up the scene: for a given resolution, regardless of the scene's polygon count or the number of passes taken, the smoothing step should take roughly the same amount of time.
Filter size vs. time appears to scale linearly, which is unexpected: the number of A-Trous iterations should grow with the log of the filter size, and each iteration reads the same number of pixels, so denoise time should scale logarithmically as well.
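As a sanity check on that expectation, here is a back-of-the-envelope iteration count, assuming a 5×5 kernel whose step width doubles every pass (the usual A-Trous scheme); the function name is illustrative:

```cuda
// Hypothetical helper: iterations needed for a 5x5 kernel with per-pass
// step doubling to cover `filterSize` pixels. At step s the footprint is
// 4*s + 1 pixels, so doubling the step maps footprint f -> 2*f - 1.
__host__ __device__ int atrousIterations(int filterSize) {
    int iterations = 1;
    int footprint = 5; // step = 1
    while (footprint < filterSize) {
        footprint = 2 * footprint - 1;
        ++iterations;
    }
    return iterations;
}
// e.g. filterSize = 100: footprints 5, 9, 17, 33, 65, 129 -> 6 iterations,
// each reading the same 25 texels per pixel.
```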

As expected, the time to denoise a scene is decoupled from the geometry of the scene itself: Ceiling and Cornell are simple scenes, while the Wahoo and cow meshes contain many triangles.

When the resolution per side doubles, the total number of pixels quadruples; the chart shows this O(n^2) relationship between side length and denoise time.

Denoising reduces the number of iterations needed to reach an acceptably smooth result, but how large a reduction is a complicated question. With simple geometry like the Cornell ceiling-light scene, 10 iterations is all that is needed, especially if every surface is diffuse. The filter had trouble smoothing the 100 spp Mario scene, yet smoothed even glass and metal at 10 spp. It takes much tweaking to find a good filter configuration, and there is no one-size-fits-all solution. For more complex scenes, especially those involving meshes and textures, there is sometimes no large difference at all. For the Ebon Hawk at 10 spp, the filter blended everything into a rather ugly color that still had spots of noise, while at 50 spp I saw the same effect as with the Wahoo at 100 spp: barely any discernible difference, with only the background streaks changing slightly.
University of Pennsylvania, CIS 565: GPU Programming and Architecture, Project 3
- Richard Chen
- Tested on: Windows 11, i7-10875H @ 2.3GHz 16GB, RTX 2060 MAX-Q 6GB (PC)
Path tracing is a rendering technique where light rays are shot out from the "camera" into the scene. Whenever a ray meets a surface, we track how it gets attenuated and scattered. This allows for more accurate rendering at the cost of vast amounts of computation. Fortunately, since photons do not (ignoring relativity) interact with each other, the work is highly parallelizable, making it a perfect fit for a GPU.
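As a structural sketch (the type and kernel names here are placeholders, not the project's real ones), each bounce runs one thread per surviving path, attenuating its throughput and scattering a new ray:

```cuda
#include <glm/glm.hpp>

struct Ray { glm::vec3 origin, direction; };

struct PathSegment {
    Ray ray;
    glm::vec3 throughput;  // product of surface attenuations so far
    int remainingBounces;
};

// One thread per path: attenuate by the hit surface's color and scatter.
// Intersection data is assumed to have been computed by a previous kernel.
__global__ void shadeAndScatter(int numPaths, PathSegment* paths,
                                const glm::vec3* hitAlbedos) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPaths || paths[i].remainingBounces <= 0) return;

    paths[i].throughput *= hitAlbedos[i]; // attenuate
    // ...pick a new ray.direction from the material's BSDF (see below)...
    paths[i].remainingBounces--;
}
```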
| Cornell Box Inspired | Very Fast Shiny Cow |
|---|---|
| ![]() | ![]() |
- Diffuse surfaces
Since most surfaces are not microscopically smooth, incoming light can leave in any direction.
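A common way to realize this is cosine-weighted hemisphere sampling, similar in spirit to what the base code provides; this sketch assumes glm and two uniform random numbers in [0, 1):

```cuda
#include <glm/glm.hpp>

#define PI 3.14159265358979f

// Cosine-weighted sample of the hemisphere around `normal`.
__device__ glm::vec3 cosineSampleHemisphere(glm::vec3 normal, float u1, float u2) {
    float up = sqrtf(u1);           // cos(theta): favors directions near the normal
    float over = sqrtf(1.0f - u1);  // sin(theta)
    float around = u2 * 2.0f * PI;

    // Build an arbitrary tangent frame around the normal.
    glm::vec3 axis = (fabsf(normal.x) < 0.9f) ? glm::vec3(1, 0, 0)
                                              : glm::vec3(0, 1, 0);
    glm::vec3 tangent = glm::normalize(glm::cross(normal, axis));
    glm::vec3 bitangent = glm::cross(normal, tangent);

    return up * normal
         + cosf(around) * over * tangent
         + sinf(around) * over * bitangent;
}
```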

- Specular reflection
Smooth surfaces reflect light neatly about the surface normal, like a mirror does.

- Dielectrics with Schlick's Approximation and Snell's Law
Light moves at different speeds through different mediums, which can cause it to refract and/or reflect. In these examples, glass and air are used, with indices of refraction of 1.5 and 1 respectively. The further the incoming light is from the surface normal, the more likely it is to reflect.
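A minimal sketch of that reflect-or-refract decision, assuming glm and a uniform random number `u` (function names are illustrative):

```cuda
#include <glm/glm.hpp>

// Schlick's approximation of the reflection probability.
// eta = n1 / n2, e.g. 1.0 / 1.5 when entering glass from air.
__device__ float schlickReflectance(float cosTheta, float eta) {
    float r0 = (1.0f - eta) / (1.0f + eta);
    r0 = r0 * r0;
    return r0 + (1.0f - r0) * powf(1.0f - cosTheta, 5.0f);
}

// Reflect with probability R; otherwise refract by Snell's law.
// glm::refract returns the zero vector on total internal reflection,
// in which case the ray must reflect as well.
__device__ glm::vec3 scatterDielectric(glm::vec3 dir, glm::vec3 n,
                                       float eta, float u) {
    float cosTheta = fminf(glm::dot(-dir, n), 1.0f);
    glm::vec3 refracted = glm::refract(dir, n, eta);
    bool totalInternal = glm::dot(refracted, refracted) < 1e-12f;
    if (totalInternal || u < schlickReflectance(cosTheta, eta)) {
        return glm::reflect(dir, n);
    }
    return refracted;
}
```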

- Anti Aliasing via Stochastic Sampling
As opposed to classical anti-aliasing, which super-samples the image and is thus very computationally expensive, stochastic sampling wiggles the outgoing ray directions slightly within each pixel. This reduces the jagged artifacts from aliasing at the cost of more noise, without shooting extra rays per pixel. Notice how the left edge of the sphere is not nearly as jagged in the anti-aliased version.
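A minimal sketch of the jitter, using thrust's device-side RNG (the seeding scheme here is illustrative; as the bug table below shows, bad seeding produces visible artifacts):

```cuda
#include <glm/glm.hpp>
#include <thrust/random.h>

// Sample a random sub-pixel position instead of the pixel center.
// Each iteration jitters differently, so samples average out the edges.
__device__ glm::vec2 jitteredPixelSample(int x, int y, int iter, int index) {
    thrust::default_random_engine rng(iter * 9973 + index); // illustrative seed
    thrust::uniform_real_distribution<float> u01(0.0f, 1.0f);
    return glm::vec2(x + u01(rng), y + u01(rng));
}
```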

- Depth of Field/Defocus Blur
Despite modelling rays as shooting out from an infinitesimal point, real cameras have a lens through which light passes, and the laws of physics also prevent light from being focused infinitely sharply. In a camera, this means objects farther from the focal plane appear blurrier. In ray tracing, the ray origins are wiggled in a manner consistent with approximating a lens.
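A thin-lens sketch of that wiggle, assuming glm, the camera's right/up vectors, and illustrative `lensRadius`/`focalDistance` parameters:

```cuda
#include <glm/glm.hpp>

#define PI 3.14159265358979f

// Jitter the ray origin across a lens disk and re-aim at the focal point,
// so only geometry near the focal plane stays sharp.
__device__ void applyDepthOfField(glm::vec3& origin, glm::vec3& direction,
                                  glm::vec3 camRight, glm::vec3 camUp,
                                  float lensRadius, float focalDistance,
                                  float u1, float u2) {
    // The point this ray should keep in focus.
    glm::vec3 focalPoint = origin + focalDistance * direction;

    // Uniform polar sample of the lens disk.
    float r = lensRadius * sqrtf(u1);
    float theta = 2.0f * PI * u2;
    origin += r * cosf(theta) * camRight + r * sinf(theta) * camUp;

    direction = glm::normalize(focalPoint - origin);
}
```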

- Obj Mesh Loading
While cubes and spheres are a great starting point, one of the great joys in life is rendering Mario T-posing. Many 3D models are available on the internet, most of them triangle meshes. I used tinyObj to load models in the Wavefront OBJ file format.
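The loading path looks roughly like this (exact signatures vary slightly between tinyObj versions, and the default triangulation is assumed; the flattened-positions output is a stand-in for whatever the project actually stores):

```cpp
#define TINYOBJLOADER_IMPLEMENTATION
#include "tiny_obj_loader.h"
#include <glm/glm.hpp>
#include <string>
#include <vector>

bool loadMesh(const std::string& path, std::vector<glm::vec3>& positions) {
    tinyobj::attrib_t attrib;
    std::vector<tinyobj::shape_t> shapes;
    std::vector<tinyobj::material_t> materials;
    std::string warn, err;

    if (!tinyobj::LoadObj(&attrib, &shapes, &materials, &warn, &err, path.c_str()))
        return false;

    // Flatten every face's vertices into a triangle soup.
    for (const auto& shape : shapes) {
        for (const tinyobj::index_t& idx : shape.mesh.indices) {
            positions.emplace_back(attrib.vertices[3 * idx.vertex_index + 0],
                                   attrib.vertices[3 * idx.vertex_index + 1],
                                   attrib.vertices[3 * idx.vertex_index + 2]);
        }
    }
    return true;
}
```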

- Textures from files
While it is theoretically possible to specify material properties for each shape in a scene, this becomes untenable with thousands of shapes, let alone millions. Instead, it is common to use textures: images whose colors encode useful data. Rather than giving every vertex all of its data, a mesh can associate vertices with texture coordinates and look up the corresponding data only when needed. I focused on textures encoding base color, tangent-space normals, ambient occlusion/roughness/metallicity, and emissivity. I also set the background in a few renders to a texture rather than just having it fade to black, lest they be way too dark.
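A minimal device-side lookup, assuming textures are copied to the GPU as flat RGB8 arrays (the layout and wrap mode are assumptions, not the project's actual storage):

```cuda
#include <glm/glm.hpp>

// Nearest-neighbor fetch from a flat RGB8 texture, with wrapping UVs.
__device__ glm::vec3 sampleTexture(const unsigned char* texels,
                                   int width, int height, glm::vec2 uv) {
    uv = glm::fract(uv); // wrap into [0, 1)
    int x = min(int(uv.x * width), width - 1);
    int y = min(int(uv.y * height), height - 1);
    int i = 3 * (y * width + x);
    return glm::vec3(texels[i], texels[i + 1], texels[i + 2]) / 255.0f;
}
```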

- Normal Mapping Texture Adaptation
The normal vector at a location allows for computing reflections, refractions, and more, since it determines the angle of incidence. (Technically it is a covector, but there is an agreed convention by which the winding order of a triangle's vertices determines its planar normal.) At its most basic, each triangle contains enough information to compute its own normal. However, meshes composed of polygons are often used to model smooth objects, so it is common to store a normal at each vertex and, for a point inside a triangle, interpolate between the vertex normals.

Imagine a brick wall. The mortar crevices could be modelled by adding who knows how many new triangles. Alternatively, by baking the surface normals into a texture, they can be sampled as needed without weighing the system down with extra geometry. Bump maps and height maps accomplish something similar, but normal maps come in two varieties: object space and tangent space. Object-space maps let one directly map the RGB components to the normal's xyz values. Tangent-space normal maps involve a change of perspective such that the interpolated normal points straight up; this requires extra computation but is generally preferred for its flexibility.

The change-of-basis matrix TBN requires its namesake tangent, bitangent, and normal, of which the normal is just the triangle's planar normal. The other two can be computed fairly easily from the vertices' UV/texture coordinates. To save on computation, I precompute them when loading a mesh rather than recomputing them every time the normal map is sampled.
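A sketch of both halves of that, assuming glm (this is the standard derivation, solving e1 = du1·T + dv1·B and e2 = du2·T + dv2·B for T and B):

```cuda
#include <glm/glm.hpp>

// Precomputed at mesh-load time, once per triangle.
void computeTangentBitangent(const glm::vec3 p[3], const glm::vec2 uv[3],
                             glm::vec3& tangent, glm::vec3& bitangent) {
    glm::vec3 e1 = p[1] - p[0], e2 = p[2] - p[0];
    glm::vec2 d1 = uv[1] - uv[0], d2 = uv[2] - uv[0];
    float f = 1.0f / (d1.x * d2.y - d2.x * d1.y);
    tangent   = f * ( d2.y * e1 - d1.y * e2);
    bitangent = f * (-d2.x * e1 + d1.x * e2);
}

// At shading time: remap the texel from [0,1] to [-1,1], then change basis
// from tangent space to world space with the TBN matrix.
__device__ glm::vec3 applyNormalMap(glm::vec3 texel, glm::vec3 T,
                                    glm::vec3 B, glm::vec3 N) {
    glm::vec3 n = texel * 2.0f - 1.0f;
    return glm::normalize(glm::mat3(T, B, N) * n);
}
```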

- Physically Based Rendering Texture Adaptation
| Just the base color | With more effects |
|---|---|
| ![]() | ![]() |
Nowadays, many people use metallic/roughness and albedo instead of diffuse/specular. I found a mesh (and its accompanying textures) that used this format, so I had to figure out how to adapt to it. Because dielectrics and conductors behave so differently, metallicity is treated almost as a boolean value, with gradations encoding how to attenuate the metallic behavior. Physically based rendering uses more physics to enable more realistic rendering: light is split into refractive and reflective components, and metals absorb the refractive component whilst dielectrics scatter both, producing both a specular and a diffuse portion. Roughness also has a varying effect depending on metallicity. Lastly, there is an ambient-occlusion map describing how an area might be darker than expected; this seems more tailored to rasterization, since in path tracing, more-occluded areas naturally bounce fewer rays back toward light sources. The theory goes much deeper (whole textbooks have been written about it), but just scratching the surface let me translate the textures and make the model look cool.
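A very rough sketch of how that split can drive scattering; real PBR shading is far more involved, and every name here is illustrative:

```cuda
#include <glm/glm.hpp>

// Treat `metallic` as the probability of a metal-style bounce: a tinted
// reflection perturbed by roughness. The dielectric branch scatters
// diffusely. This sidesteps proper Fresnel/microfacet terms entirely.
__device__ glm::vec3 scatterMetallicRoughness(
    glm::vec3 albedo, float metallic, float roughness,
    glm::vec3 reflectDir, glm::vec3 diffuseDir,
    float u, glm::vec3& outDir)
{
    if (u < metallic) {
        // Rougher metals blur their reflections toward the diffuse direction.
        outDir = glm::normalize(glm::mix(reflectDir, diffuseDir, roughness));
        return albedo; // metals tint their reflection
    }
    outDir = diffuseDir;
    return albedo;
}
```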
As a Star Wars fan, my thoughts naturally drifted toward making the ship look like it was in hyperspace. This was also motivated by the fact that I could not think of how to simulate ambient lighting with path tracing, and was afraid I would need to sprinkle many small luminescent orbs around the scene just to be able to see anything. Once I found a cool background, the next concern was how to map the rectangular texture onto the viewable sphere. The symmetry of the hyperspace effect, which looks like a tunnel, made this easy; more complex backgrounds would require more investigation. As it stands, for the unit sphere x^2 + y^2 + z^2 = 1, z is constrained by the x and y values, enabling us to map the view-direction vector to UV coordinates. When z = 1, x = 0 and y = 0, which maps to UVs of (0.5, 0.5). By extension, this scoops a unit circle out of the texture that points actually map to, with (x, y, z) and (x, y, -z) mapping to the same UV coordinates, ensuring a smooth transition. Then I rotated the mesh to align with the tunnel direction and voilà, it looks like it is cruising through hyperspace.
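The mapping itself is tiny; assuming a normalized view direction:

```cuda
#include <glm/glm.hpp>

// Map a unit view direction to background UVs as described above:
// z is dropped, so (x, y, z) and (x, y, -z) share a texel and the
// front/back seam stays smooth. Relies on the texture's radial symmetry.
__device__ glm::vec2 backgroundUV(glm::vec3 dir) {
    return glm::vec2(0.5f + 0.5f * dir.x, 0.5f + 0.5f * dir.y);
}
```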
The next issue was that the sky was too bright and too blue, since I was also using that bright color for global illumination. So I interpolated the texture color between black and itself based on its brightness, as I wanted the bright streaks to remain but the space between them to be darker. Then I did it again, since it worked well the first time.
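In code form, with brightness taken as the channel average (an assumption; any luminance formula works):

```cuda
#include <glm/glm.hpp>

// Lerp each texel toward black by its own brightness, twice, so bright
// streaks survive while the space between them goes dark.
__device__ glm::vec3 darkenBackground(glm::vec3 c) {
    float b = (c.r + c.g + c.b) / 3.0f;      // assumes c in [0, 1]
    c = glm::mix(glm::vec3(0.0f), c, b);     // first pass
    b = (c.r + c.g + c.b) / 3.0f;
    return glm::mix(glm::vec3(0.0f), c, b);  // second pass
}
```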
- The scene used was a Cornell box with different colored walls, many spheres and cubes that were shiny or glass, and Mario T-Posing menacingly in the back with an emissive texture
- Caching the First Intersection
- The rays start out at a known location and shoot into a screen pixel and then into the scene so it makes sense to precompute the data on the first intersection for future computations
- Antialiasing and depth of field are cool effects, but they randomize the first rays and invalidate the cache, making this optimization worthless when they are enabled
- Sorting Rays by Material
- When there are multiple materials, processing the rays in arbitrary order introduces branch divergence, since different materials interact with rays differently. Instead, sort the rays by material so that there is less divergence
- Meshes, which are expensive to check, count as a singular material so this optimization is situationally helpful if the scene in question has many different kinds of materials
- Compact Dead Threads
- If a ray terminates early, remove it from the pool so that there are fewer rays to check
- Especially when the geometry of the scene is very open, this optimization is very beneficial
- Mesh Intersection Test via AABB
- Rather than checking collisions against every triangle, associate each mesh with the min and max bounds of an axis-aligned bounding box (AABB) and only test its triangles if the ray intersects the bounds (a slab-test sketch follows this list)
- Especially if a mesh fits tightly inside its box, this optimization is very helpful. But if the mesh is irregularly shaped enough that the AABB encompasses most of the scene anyway, it is less useful.
- The Ebon Hawk model has more than 69,000 triangles and fits relatively neatly into its AABB so this is extremely useful
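A sketch of the slab test referenced above, assuming glm and a precomputed reciprocal ray direction:

```cuda
#include <glm/glm.hpp>

// Slab test: returns true if the ray hits the box, in which case the
// mesh's triangles are worth testing. invDir = 1 / ray direction
// (infinities from zero components fall out of the min/max correctly).
__device__ bool intersectAABB(glm::vec3 origin, glm::vec3 invDir,
                              glm::vec3 bmin, glm::vec3 bmax) {
    glm::vec3 t0 = (bmin - origin) * invDir;
    glm::vec3 t1 = (bmax - origin) * invDir;
    glm::vec3 tNear = glm::min(t0, t1);
    glm::vec3 tFar  = glm::max(t0, t1);
    float tEnter = fmaxf(fmaxf(tNear.x, tNear.y), tNear.z);
    float tExit  = fminf(fminf(tFar.x,  tFar.y),  tFar.z);
    return tEnter <= tExit && tExit >= 0.0f;
}
```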
| Texture | Normal | Depth |
|---|---|---|
| ![]() | ![]() | ![]() |
| Bug | Cause |
|---|---|
| ![]() | Bad RNG seeding |
| ![]() | Checking the explicit error condition `t == -1` but forgetting to also eliminate the general bad case `t < 0` |
| ![]() | Triangle intersection was wrong |
- Heavily optimize the performance with a special focus on reducing branch divergence
- Refactor the code to be more structured and less haphazard
- Change the option toggles from `#define` macros to booleans, so that toggling them does not require a lengthy recompilation
- Dive deeper into PBR to make everything look cooler, like making the coppery parts shinier in a realistic way rather than just sidestepping the material behaviors
- Learn about the Disney BSDF and the GGX equation
- How to interpolate normals from a tangent space normal map
- Refactor with separate position, normal, uv, index, etc. buffers rather than cramming everything into triangle
- Use glTF instead of OBJ for mesh loading
- The MikkTSpace tangent-space algorithm
- Support for multiple mesh importing
- Special thanks to lemonaden for creating a free, high quality mesh of the Ebon Hawk https://sketchfab.com/3d-models/ebon-hawk-7f7cd2b43ed64a4ba628b1bb5398d838
- Ray Tracing in One Weekend
- IQ's list of intersector examples and the Scenes & Ray Intersection slides from UCSD's CSE168 course by Steve Rotenberg that helped me understand how Möller–Trumbore worked
- UT Austin's CS384 slides on normal mapping tangent that explained the theory on how to convert from tangent space normals to object space and https://stackoverflow.com/questions/5255806/how-to-calculate-tangent-and-binormal for explaining the calculations in a way that did not seem like abuse of matrix notation
- https://wallpaperaccess.com/star-wars-hyperspace for the cool hyperspace wallpaper
- Adobe's articles on the PBR Metallic/Roughness workflow that explained the theory behind it
- reddit user u/cowpowered for tips on performing normal interpolation when working with normal maps and tbns
For reference, the GLSL fragment shader from the Dammertz et al. paper, which the CUDA implementation replaces:

```glsl
uniform sampler2D colorMap, normalMap, posMap;
uniform float c_phi, n_phi, p_phi, stepwidth;
uniform float kernel[25];
uniform vec2 offset[25];

void main(void)
{
    vec4 sum = vec4(0.0);
    vec2 step = vec2(1. / 512., 1. / 512.); // resolution
    vec4 cval = texture2D(colorMap, gl_TexCoord[0].st);
    vec4 nval = texture2D(normalMap, gl_TexCoord[0].st);
    vec4 pval = texture2D(posMap, gl_TexCoord[0].st);
    float cum_w = 0.0;
    for (int i = 0; i < 25; i++)
    {
        vec2 uv = gl_TexCoord[0].st + offset[i] * step * stepwidth;

        vec4 ctmp = texture2D(colorMap, uv);
        vec4 t = cval - ctmp;
        float dist2 = dot(t, t);
        float c_w = min(exp(-(dist2) / c_phi), 1.0);

        vec4 ntmp = texture2D(normalMap, uv);
        t = nval - ntmp;
        dist2 = max(dot(t, t) / (stepwidth * stepwidth), 0.0);
        float n_w = min(exp(-(dist2) / n_phi), 1.0);

        vec4 ptmp = texture2D(posMap, uv);
        t = pval - ptmp;
        dist2 = dot(t, t);
        float p_w = min(exp(-(dist2) / p_phi), 1.0);

        float weight = c_w * n_w * p_w;
        sum += ctmp * weight * kernel[i];
        cum_w += weight * kernel[i];
    }
    gl_FragData[0] = sum / cum_w;
}
```