Assignment 3: PathTracer

Jose Chavez

This is the first part of our Ray-Tracing project. In this half we move from the basics of ray-tracing to advanced techniques for reducing the cost of rendering photo-realistic scenes. These techniques include constructing a BVH to greatly reduce intersection tests. Then, we use the BVH while estimating direct illumination with Monte-Carlo estimation and importance sampling. Since direct light alone is not enough, we also calculate indirect illumination to give our scenes a more realistic feel. Finally, we use adaptive sampling to further reduce our rendering cost. By the time I was done with this project, I started to realize how photo-realistic images in other settings are produced, such as in movies and video games. The CGI and video games we love wouldn't be possible without these accelerating techniques in ray-tracing.

Part 1: Ray Generation and Intersection

Like in project 1, we want to render images on the screen while considering the effect of light placement in a 3D world. One possible way to see models with shadows and lighting might be to simulate a real light in the world, tracing its rays forward until they reach our perspective camera. However, this method would be very expensive, since a light source emits an essentially uncountable number of rays, and not every ray from a light source is guaranteed to hit the camera. Ray-tracing therefore does this in reverse. We generate a ray that shoots out from our camera, per pixel, and see where that ray lands, which could be a light source or another surface. By generating and shooting a ray through each pixel into the 3D world and seeing what it intersects, we concentrate only on the rays that actually reach the camera. This is also known as "backward tracing."

In our function "raytrace_pixel," we essentially estimate the incoming light from the world into a pixel by averaging "ns_aa" random generated ray samples. In order to shoot a ray through a pixel, we obtain a 2D vector from a uniform sampler between the values of [0, 1]. We then add those x-coordinate and y-coordinate values from the sample to the x and y coordinates of the pixel, finally normalizing it using the width and height of the pixels. What this does it gives us a random location in pixel space through which will cast a generated ray. If "ns_aa" represents the number of samples we take per pixel, then for each pixel, we generate "ns_aa" rays, add up the result of calling "est_radiance_global_illumination" through each ray, then dividing by "ns_aa" to obtain the average. "est_radiance_global_illumination" is a function that we will implement in a later part, but for now, it will return the default estimated global radiance for a pixel. If "ns_aa == 1", we simply cast a ray through the center of the pixel, "(x + 0.5) / sampleBuffer.w" and "(y + 0.5) / sampleBuffer.h."

However, we need the rays to be in world space, while they originate at a camera that lives in "camera space." The camera has a sensor plane, one unit along the viewing direction, onto which the normalized pixel coordinates are mapped before the ray is transformed from camera space to world space. We use the horizontal and vertical fields of view to calculate the corners of the sensor plane, given by

Vector3D(-tan(radians(hFov)*.5), -tan(radians(vFov)*.5),-1)

Vector3D( tan(radians(hFov)*.5),  tan(radians(vFov)*.5),-1)

where "hFov" is the horizontal field of view and "vFov" is the vertical field of view. The input sample point is then interpolated into this sensor plane to give us the direction of our ray in camera space. Finally, we convert this to world space using matrix operations. So far we have a random, generated ray that is shooting through a pixel from the camera's sensor plane, which is then converted to a ray in world space.

The next step after generating a ray through a pixel into the world is to see what primitive it intersects. For part 1, we deal with triangle and sphere primitives, each having a different equation for determining if and where a ray intersects. Since we can represent a ray by an origin "O", a direction "D", and a time "t", we can solve for the time "t" at which a ray intersects a triangle or a sphere. For a triangle, we use the Möller-Trumbore algorithm to determine the "t" and the barycentric coordinates that form the point where the ray intersects the triangle. The Möller-Trumbore algorithm solves for "b1", "b2", and "t" where

O + t*D = P = P0 * b0 + P1 * b1 + P2 * b2

where "P" is the intersection point interpolated using barycentric coordinates of "P1", "P2", and "P3", the vertices of the triangle. We solve for "t", "b1", and "b2" after rearranging the equation to be

O - P0 = -t * D + b1*(P1 - P0) + b2*(P2-P0)

finally converting it to matrix form

[-D     P1-P0     P2-P0] [t   b1    b2]^T = O-P0

Once in matrix form, we apply Cramer's rule: we take the three determinants of "[ -D P1-P0 P2-P0 ]" with each column in turn replaced by "O-P0", and divide each by the determinant of the whole matrix.

If the "t" is negative, or it is beyond the maximum "t" of the ray, which is used to determine where to stop looking along an array, we do not consider the "t" as a intersect. Likewise, if the barycentric coordinates are not in between 0 and 1, we also don't consider the ray intersecting with the triangle. If these conditions satisfy, then we know that the "t" is valid, and we no longer have to look at any time further than "t", therefore we update the maximum "t" value for a ray.

For a sphere, we have to consider possibly two intersection points. For a point "p" we look for where

p : (p - c)^2 - R^2 = 0

Where "c" is the center of the sphere and "R" is the radius. We simply replace the point "p" with the ray equation, solving for "t", using the quadratic equation, which gives us two intersection times. Either they correspond to the same point, or points on the opposite sides of the sphere to be a valid intersection. If this is the case, then we also update the maximum "t" value of the ray to be the first time where it intersects the sphere, also updating information in an "Intersection" struct.

Finally, to put it all together, the intersection information is used in "est_radiance_global_illumination" to give the radiance where the ray intersects. We will come back to the implementation of "est_radiance_global_illumination", but for now it returns a default normal shading for each intersected primitive. Therefore, the images are not photo-realistic; the reflected colors look off, but we will improve the images in the following parts.

Below are the results of ray-tracing some small dae files with normal shading.


Coil
Spheres
Bench
Bunny



Part 2: Bounding Volume Hierarchy

What happens when our models contain an enormous number of triangles, both far from and close to the camera? If we have ~1 million pixels and ~20 million triangles, say in a scene of a video game, then our current rendering technique will be really slow. How can we quickly generate samples and check for intersecting primitives? Some ideas shared in class were partitioning space, reusing ray-object tests for repeated primitives, running computation in parallel, segmenting objects into clusters and testing the clusters, and checking intersection against bounding boxes. The idea of checking intersection against bounding boxes leads to the technique of "Bounding Volume Hierarchy."

How can we use bounding volumes to help accelerate ray-tracing? What will save us a lot of ray-tracing time is not testing every primitive for intersection. This is because not every primitive is going to be seen by the camera, i.e. some are hidden behind other primitives or are too far from the camera. With this idea, a bounding volume is a tight box that represents the world-space extent of the primitives inside it. If a ray intersects that box, then the primitives inside it might be hit and are worth testing. If not, then we do not have to test intersection with any of the primitives inside. This saves us a lot of ray-tracing time by only focusing on the primitives a ray can actually reach! All it takes is finding the bounding box and using a "ray-box" intersection test, sketched below.
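Concretely, the ray-box test can be done with the classic "slab" method: intersect the ray against the pair of planes for each axis and keep the overlap of the three intervals. Below is a minimal sketch; the "BBox" members "min" and "max" and the "Vector3D" indexing follow the class skeleton I used and should be read as assumptions (it also glosses over the case where a ray direction component is exactly 0).

// Sketch of the ray-box "slab" test; t0/t1 are the current valid interval of the ray.
// (Needs <algorithm> for std::swap / std::min / std::max.)
bool BBox::intersect(const Ray& r, double& t0, double& t1) const {
  for (int axis = 0; axis < 3; axis++) {
    // Entry/exit distances for the two planes perpendicular to this axis.
    double tmin = (min[axis] - r.o[axis]) / r.d[axis];
    double tmax = (max[axis] - r.o[axis]) / r.d[axis];
    if (tmin > tmax) std::swap(tmin, tmax);         // handle negative direction components
    t0 = std::max(t0, tmin);                        // latest entry so far
    t1 = std::min(t1, tmax);                        // earliest exit so far
    if (t0 > t1) return false;                      // slabs do not overlap: the ray misses
  }
  return true;
}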

But how do we determine the bounding boxes of primitives? We can't just bound an entire model, for that is useless: it would force us to test everything inside it, which is already a lot! A "Bounding Volume Hierarchy" breaks groups of primitives down into bounding boxes in a binary-search-tree-like way. If we have a cow model, and we know that the back of the cow will never be seen in the rendering, then we can split its bounding box along the length of the cow. The parent bounding box is the bounding box of the whole cow, but the front half of the cow becomes the left child box and the back half becomes the right child box. We then check the children to determine whether a ray is going to intersect the primitives inside. If we continue this construction, splitting the model into smaller and smaller halves of bounding boxes, then we determine the intersecting primitives more and more precisely. Thus, we recursively construct a binary tree of bounding boxes, our bounding volume hierarchy.

Given a starting vector of primitive pointers, my algorithm splits the primitives at the midpoint of the longest axis of their bounding box, which is a naive approach. I found the longest axis, then split the primitives into "left" and "right" vectors based on which side of the midpoint they fall on. However, I had to account for the case where, after sorting the primitives, either the left or the right vector remained empty! This meant the midpoint was a bad representative of where the primitives lie. To address this, I repeatedly moved the midpoint toward the side where all the primitives were and re-sorted them, until neither the left nor the right vector was empty. With this, I passed these vectors into recursive calls to construct the left and right subtrees. Note that this process only happened if the number of primitives passed in exceeded the maximum a node can hold. If it fell at or below this maximum, then all of the primitives were stored in "node->prims", meaning the node was a leaf. Finally, we return "node". This gives the recursive construction a base case, building the tree in a divide-and-conquer fashion. A sketch of this construction is given below.
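In the sketch, "BVHNode", "BBox::expand", "centroid()", and the other member names follow my skeleton and are assumptions, not exact signatures.

// Sketch of the recursive construction (midpoint split along the longest axis).
BVHNode* BVHAccel::construct_bvh(const std::vector<Primitive*>& prims,
                                 size_t max_leaf_size) {
  BBox bbox;
  for (Primitive* p : prims) bbox.expand(p->get_bbox());
  BVHNode* node = new BVHNode(bbox);

  if (prims.size() <= max_leaf_size) {              // leaf: store the primitives directly
    node->prims = new std::vector<Primitive*>(prims);
    return node;
  }

  Vector3D ext = bbox.extent;                       // pick the longest axis: 0 = x, 1 = y, 2 = z
  int axis = (ext.x > ext.y && ext.x > ext.z) ? 0 : (ext.y > ext.z ? 1 : 2);
  double mid = bbox.centroid()[axis];

  std::vector<Primitive*> left, right;
  while (left.empty() || right.empty()) {           // re-split if one side came out empty
    left.clear(); right.clear();
    for (Primitive* p : prims) {
      if (p->get_bbox().centroid()[axis] < mid) left.push_back(p);
      else right.push_back(p);
    }
    // Nudge the split point toward the crowded side and try again.
    if (left.empty())  mid = (mid + bbox.max[axis]) * 0.5;
    if (right.empty()) mid = (mid + bbox.min[axis]) * 0.5;
  }
  node->l = construct_bvh(left,  max_leaf_size);
  node->r = construct_bvh(right, max_leaf_size);
  return node;
}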

The speed of rendering large files increased dramatically. Before the BVH construction algorithm, the cow file took 3 minutes and 13 seconds to render.

Rendered cow without BVH

It took 0.0013 secs to construct the head (root) BVH node containing all of the primitives. However, the renderer performed an average of 2932 intersection tests per ray! That is a lot! The figures below show the bounding boxes of the BVH and the same cow rendered with the fully constructed BVH.


Head bounding box
Sub bounding boxes
More sub bounding boxes
Rendered cow with BVH

The renderer took longer to build the BVH (0.0322 secs), but it only averaged 4 intersection tests per ray. What a huge decrease! The overall number of primitive intersection tests went down because we test ray-box intersections against the BVH first. Let's see some other large dae files.

Average of 2 intersection tests per ray
Average of 3 intersection tests per ray
Average of 6 intersection tests per ray

These renders would not have been possible without the construction of a full BVH. We saw that, without the BVH, the average number of intersection tests performed was really high. For example, another test showed that rendering the man took 1901 seconds, averaging 27160 intersection tests per ray; with the full BVH this was reduced to 3! Without the full hierarchy, the ray-box test only checks the main bounding box, which contains all of the primitives, so every ray was tested against roughly 2932 primitives in the cow's case. With the full BVH averaging 4 intersection tests per ray, most rays skip the bounding boxes they do not hit, which is what we would like to see, since not every primitive or bounding volume is visible to the camera. It also means the leaf nodes must each contain a much smaller number of primitives. From these experiments, constructing a full BVH accelerates the ray-tracing of a scene because it reduces the number of primitives that must be tested for intersection.




Part 3: Direct Illumination

While being able to render images quickly is cool, we would still like photo-realistic images. In part 1, I mentioned that "est_radiance_global_illumination" only returned default shading for each primitive. As the name implies, "global illumination" means we take into account all of the light in the scene, reflected and emitted, to render more photo-realistic images. Global illumination is the sum of two parts: direct illumination and indirect illumination. This part focuses on direct illumination.

Direct illumination deals with rays that go from a light source, hit a surface, then go straight to the camera. It is essentially the light that is reflected from a surface only once. With this, we start to introduce terms such as radiance, irradiance, and BSDF. Radiance can be regarded as light energy emitted along a direction. Irradiance can be regarded as the incoming light energy at a surface point. What is a BSDF? BSDF stands for "bidirectional scattering distribution function." It is the function that describes the way light is scattered after hitting a certain surface. Different surfaces include diffuse, glass, and glossy. In this portion of the project, we are dealing with only diffuse objects, which have a constant BSDF of (reflectance / PI). Thus, the high-level idea of direct illumination is generating a ray into a scene, calculating the irradiance from the environment at the hit point using sample rays, and finally turning that into the light going into a pixel.

We can think of calculating the irradiance from the environment onto a point as taking the integral of the incoming light over a hemisphere. Since computing the irradiance over a hemisphere usually requires double integrals, we turn to an estimate from the very handy Monte-Carlo integration. Monte-Carlo integration estimates our hemisphere integral using random sampling. It handles high-dimensional integrals well and is generally very simple. However, it usually needs a lot of samples and can generate noise. This is why in this part we use both plain Monte-Carlo estimation and another method called importance sampling. I will now explain the Monte-Carlo estimation portion.

Estimating the direct irradiance over a hemisphere can be done with a Monte-Carlo estimator. This isn't the actual integral, but averaging samples and dividing by their probability density gives an unbiased estimate. Essentially, we generate "N" sample rays whose directions "wi" come from a uniform hemisphere sampler, and we send those rays out into the scene. If a ray intersects the scene, we retrieve the light emitted by the intersected primitive. Then the Lambertian cosine term and the original intersection's BSDF are multiplied with the emitted light found at the new intersection. In other words, for each random hemisphere direction from our hit point, we cast a ray into the BVH and obtain the radiance "L" coming back along it from the intersecting primitive, if there is one. After summing up the contribution of each sample that intersected the scene, we divide by "N" and also by the sampling probability, which is 1 / (2 * PI), the pdf of sampling any direction uniformly over a hemisphere. The full estimator is given below.

F_N = (2*PI / N) * SUM{L(hit_p, wi) * cosine * bsdf_factor : i = 1 to N}
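In code, the hemisphere estimator looks roughly like the sketch below; "make_coord_space", "hemisphereSampler", "EPS_D", and the "Intersection"/"Spectrum" types follow my skeleton and should be treated as assumptions.

// Sketch of direct lighting via uniform hemisphere sampling.
Spectrum PathTracer::estimate_direct_lighting_hemisphere(const Ray& r,
                                                         const Intersection& isect) {
  Matrix3x3 o2w; make_coord_space(o2w, isect.n);    // local frame at the hit point
  Matrix3x3 w2o = o2w.T();
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  int num_samples = scene->lights.size() * ns_area_light;
  Spectrum L_out(0, 0, 0);
  for (int i = 0; i < num_samples; i++) {
    Vector3D w_in = hemisphereSampler->get_sample(); // local frame, uniform over the hemisphere
    Vector3D d = o2w * w_in;                         // same direction in world space
    Ray sample_ray(hit_p + EPS_D * d, d);            // offset to avoid self-intersection
    Intersection light_isect;
    if (bvh->intersect(sample_ray, &light_isect)) {
      Spectrum L_i = light_isect.bsdf->get_emission(); // radiance emitted toward us
      double cos_theta = w_in.z;                     // cosine term in local coordinates
      double pdf = 1.0 / (2.0 * PI);                 // uniform hemisphere pdf
      L_out += isect.bsdf->f(w_out, w_in) * L_i * cos_theta / pdf;
    }
  }
  return L_out / (double)num_samples;
}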

Below are some rendered scenes during direct uniform hemisphere sampling.

Lambertian spheres with uniform hemisphere sampling


Bunny scene with uniform hemisphere sampling

However, as we can tell from the images, calculating direct lighting using uniform hemisphere sampling via Monte-Carlo estimation is pretty noisy. The overall reason is that we are sampling uniformly over the hemisphere, without any focus on the directions the light actually comes from. Importance sampling, as the name suggests, samples the important directions of incoming light, such as directions straight toward light sources. It is a way of reducing the variance of the Monte-Carlo estimate, since it concentrates on the regions of the scene that contribute the most to the total incoming light, and more samples are taken from those regions. Mathematically, it also means we choose our probability density function wisely for each light source. In the hemisphere estimator, we set the pdf to 1 / (2 * PI), which corresponds to the uniform hemisphere. But if we use a pdf that better matches the function of incoming light, we will reduce the variance. In other words, importance sampling does not sample uniformly but instead places samples according to the integrand, the incoming light that Monte-Carlo integration was estimating in the first place.

I will now walk through my implementation of importance sampling. Since we are focusing directly on sending samples toward the light sources in our scene, we start by iterating over all of the scene lights. If a light is a delta (point) light, we send it a single sample, since every sample toward a delta light would be identical; for all other lights we take "ns_area_light" samples. For each sample, the scene light generates a sampled direction "wi" from the hit point "hit_p" to the light, in world coordinates, along with the distance to the light and a pdf value that better matches the incoming light from that source. Using this information, we cast a shadow ray from the hit point in the direction "wi" and test it against the BVH, capping its maximum "t" at the distance to the light. If the shadow ray does not intersect anything before reaching the light, the light is unoccluded, so we take the light sample's radiance, multiply by the cosine term and the BSDF factor, divide by the pdf, and add it to our running sum. This division is different from uniform hemisphere sampling, where the pdf was the same constant for every sample; here it varies per sample, so we divide each sample by its own pdf. The total radiance for each light is then divided by the number of samples taken at that light. A sketch of this loop follows, and the results of rendering some scenes are below.
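In the sketch, the "SceneLight::sample_L" signature, "EPS_D", and the shadow-ray conventions follow my skeleton and are assumptions.

// Sketch of direct lighting via importance sampling of the lights.
Spectrum PathTracer::estimate_direct_lighting_importance(const Ray& r,
                                                         const Intersection& isect) {
  Matrix3x3 o2w; make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  Spectrum L_out(0, 0, 0);
  for (SceneLight* light : scene->lights) {
    int ns = light->is_delta_light() ? 1 : ns_area_light; // one sample is enough for point lights
    Spectrum L_light(0, 0, 0);
    for (int i = 0; i < ns; i++) {
      Vector3D wi_world; float dist_to_light, pdf;
      Spectrum L_i = light->sample_L(hit_p, &wi_world, &dist_to_light, &pdf);
      Vector3D w_in = w2o * wi_world;
      if (w_in.z < 0) continue;                     // light is behind the surface
      // Shadow ray: if nothing blocks the path to the light, count the sample.
      Ray shadow(hit_p + EPS_D * wi_world, wi_world);
      shadow.max_t = dist_to_light - EPS_D;
      Intersection blocked;
      if (!bvh->intersect(shadow, &blocked)) {
        L_light += isect.bsdf->f(w_out, w_in) * L_i * w_in.z / pdf;
      }
    }
    L_out += L_light / (double)ns;                  // average over the samples at this light
  }
  return L_out;
}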

Dragon rendering with importance sampling


Bunny scene with importance sampling


Spheres with importance sampling

These results are already less noisy than direct hemisphere sampling. In other words, there is a lot less variance: more intermediate values rather than dark specks. Remember that "ns_area_light" variable? We can increase or decrease that value to get different results. Since it represents the number of samples taken toward a light source, we can predict that increasing it will improve the importance sampling result. Below are results with 1, 4, 16, and 64 samples.


1 sample taken at area light source
4 samples taken at area light source
16 samples taken at area light source
64 samples taken at area light source

The results reveal the improvement importance sampling makes over uniform hemisphere sampling. The difference between sending 1 and 4 samples at an area light doesn't show much of an improvement, but we really start noticing the difference at 16 and 64 samples. The noise decreased pretty dramatically, meaning the variance, visible as black specks, decreased. This is the result of taking more samples at the light, which decreases the variance when we average. In sum, this is the point of importance sampling: it sends more samples toward the light sources, which are the biggest contributors to the irradiance integral we discussed earlier. Changing the number of samples taken at the light sources wouldn't help direct hemisphere sampling in the same way, because it doesn't take samples specifically at light sources but instead samples the entire hemisphere uniformly.




Part 4: Indirect Illumination and Global Illumination

Sure, the part 3 results are pretty cool, but we are still seeing a lot of dark shadows in the bunny and sphere scenes. What about the light that hits multiple surfaces, adding to the overall ambient light? This light is called indirect light, which takes more than one bounce off a surface before hitting the camera. How do we go about calculating it? We have to go a little further than just casting rays into the scene, and now follow rays bouncing around many times. This is known as "path-tracing," where we follow the path of a ray, adding each intersection's contribution to the overall radiance. The bigger question is, how far do we have to follow a ray? How about a game of Russian Roulette?

We use a Russian-Roulette-like probability scheme where, at each bounce along the path, we randomly decide whether to kill the ray. We can also set a maximum number of bounces for each ray. For this part, we used both conditions and made sure that, if the maximum number of bounces was greater than 1, we recursed at least once. Yup, you got it: we are going to use recursion! The high-level idea is similar to repeatedly applying an operator to an emitted light value. We know that every time a ray bounces off a primitive, the radiance it carries decreases. So let "I" be the identity operator and "K" be the operator that maps outgoing radiance to incoming radiance after one bounce at a surface. Let "L_e" be the emitted radiance and "L" be the total radiance along a ray.

L = (I + K + K^2 + K^3 + ...) * L_e

The exponent on "K" represents the number of bounces applied to the originally emitted light. In sum, we use Russian Roulette to decide when to terminate the recursion on a ray, applying a "K^d" operator for each recursion level "d". This gives us not only the direct illumination, but also the indirect illumination. Finally, we add the zero-bounce light, which is light coming directly from our light sources to the camera. All of these summed together produce the global illumination. I will now walk through my implementation in more detail.

I am going to use three functions: "zero_bounce_radiance", "one_bounce_radiance", and "at_least_one_bounce_radiance". "zero_bounce_radiance" is the light coming straight from a light source to the camera, since it doesn't hit any surface. "one_bounce_radiance" is just our direct illumination, since it deals with light hitting only one surface before reaching the camera. "at_least_one_bounce_radiance" is the recursive procedure that first adds the result of "one_bounce_radiance", then recurses if needed. Before we cast any rays, we set the depth of each sample ray through a pixel to "max_ray_depth", the fixed recursion limit. We also use a continuation probability of 0.7: flipping a coin with probability 0.7 of heads, heads means we continue recursing as long as r.depth > 1, where r is the current ray. We also always recurse when r.depth equals max_ray_depth and max_ray_depth > 1, which guarantees at least one indirect bounce. If we are to recurse, we use "isect.bsdf->sample_f()" to generate an incoming direction "w_in" and a pdf, construct a new ray along "w_in" converted to world coordinates, and set its depth to the current ray's depth minus 1, so that we eventually stop when we reach the bounce limit. Finally, if this ray intersects the scene, we add the recursive call's result times the cosine and BSDF factors, divided by the pdf and by our continuation probability. Then, in our global illumination function, we add "zero_bounce_radiance" and "at_least_one_bounce_radiance". A sketch of the recursive function is shown below, followed by some results.
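In the sketch, "sample_f", "coin_flip", and the convention that "r.depth" counts the remaining bounces follow my skeleton and are assumptions.

// Sketch of the recursive bounce for indirect illumination.
Spectrum PathTracer::at_least_one_bounce_radiance(const Ray& r,
                                                  const Intersection& isect) {
  Matrix3x3 o2w; make_coord_space(o2w, isect.n);
  Matrix3x3 w2o = o2w.T();
  Vector3D hit_p = r.o + r.d * isect.t;
  Vector3D w_out = w2o * (-r.d);

  Spectrum L_out = one_bounce_radiance(r, isect);   // direct lighting at this bounce

  double cpdf = 0.7;                                // continuation probability
  bool keep_going = (r.depth == max_ray_depth && max_ray_depth > 1)   // always take bounce 1
                    || (r.depth > 1 && coin_flip(cpdf));              // Russian Roulette
  if (!keep_going) return L_out;

  Vector3D w_in; float pdf;
  Spectrum f = isect.bsdf->sample_f(w_out, &w_in, &pdf); // sample an outgoing direction
  Vector3D wi_world = o2w * w_in;
  Ray bounce(hit_p + EPS_D * wi_world, wi_world);
  bounce.depth = r.depth - 1;                       // one fewer bounce remaining

  Intersection next;
  if (bvh->intersect(bounce, &next)) {
    L_out += at_least_one_bounce_radiance(bounce, next)
             * f * w_in.z / pdf / cpdf;             // divide by pdf and by the RR probability
  }
  return L_out;
}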

Spheres rendering with global illumination


Bunny scene with global illumination

We can modify our implementation to actually show just the indirect light that is in our scene. Below I show a scene and a model dragon.

Sphere scene with only direct illumination


Indirect illumination in sphere scene
Dragon model with only direct illumination


Indirect illumination of dragon. Notice that it doesn't really contain a lot of light


Global illumination dragon. Notice that it is still brighter than with just direct illumination

We can see that the indirect lighting acts like ambient lighting that fills in the shadows with realistic, reflected light. For the dragon model, it filled in some of the hidden parts that were not directly lit. Why not run experiments on "max_ray_depth" to see the difference as we increase this value? For this, we use the bunny scene.

A max_ray_depth of 0 means we only consider zero bouncing light
A max_ray_depth of 1 means we deal with the direct illumination
A max_ray_depth of 2 means we consider one level of indirect illumination. Notice how the overall scene ambient light increased.
A max_ray_depth of 3 means we consider a second level of indirect illumination. Notice how the ceiling and underside of the rabbit are brightened by the extra reflectance.
max_ray_depth = 100, the shadows underneath the rabbit are smoother

We can see that increasing max_ray_depth adds more indirect light because it allows the calculation to include radiance that has bounced off many surfaces. However, for max_ray_depth = 100, we don't see a huge difference other than in the shadows under the bunny. This is because most of the scene is already well illuminated with 2 and 3 levels of illumination. If we were to add different materials, such as glass or glossy surfaces, we would genuinely need more depth and see a clearer improvement. We can also run some experiments on changing the samples per pixel, using 4 light rays.


1 sample per pixel produces extremely noisy results
2 samples per pixel, no drastic changes yet
4 samples per pixel, still a lot of noise
8 samples per pixel, starting to see slight improvement
16 samples per pixel, definite improvement visible


64 samples per pixel, drastic improvement


1024 samples per pixel, high quality improvement

We can see that the overall lighting in the scene is the same, but the samples per pixel really make a difference in smoothing out the variance across pixels. We only start to see real improvement around 16 samples per pixel, with 64 and 1024 providing the best results. However, these were costly renders! Do all images need such a high number of samples per pixel to look good? Wouldn't every render then be painfully slow? We will explore this in the adaptive sampling part.

Part 5: Adaptive Sampling

Part 5 deals with a different way of choosing how many rays to sample through a pixel, called adaptive sampling. We have seen that Monte-Carlo estimation produced very noisy results, and that importance sampling helped a lot by taking more samples at the lights. It turns out that we don't always have to take so many samples at a pixel if its radiance is going to "converge" at a very low sampling rate. What does converge mean? We can think of it as the estimate settling on a value with very little or no noise. In other words, some pixels don't need many samples to be nearly noise-free, so why make all pixels suffer from high sample counts? In adaptive sampling, pixels that need more samples to reduce their noise get them, and other pixels get lower sampling rates.

To test for convergence, we use a confidence interval computed from the average illuminance "mu" and the standard deviation "sd" of the illuminance of the "n" samples taken so far at a pixel. We then calculate "I" given by the equation

I = 1.96 * (sd / sqrt(n))

which is our measure of the pixel's convergence. The 1.96 comes from using a 95% confidence interval. In other words, we can be 95% confident that the pixel's average illuminance lies between "mu - I" and "mu + I". A pixel has converged if

I <= maxTolerance * mu

However, computing this after every sample would be costly, so we check a pixel's convergence only every "samplesPerBatch" samples, a value we provide. If the pixel satisfies the inequality above, we can stop taking samples, since any more would be unnecessary according to adaptive sampling. We then return the average radiance, using the actual number of samples taken, which can be fewer than "ns_aa" if the pixel converges early. This should save us some computing time!

As for implementation details, the only thing modified was our "raytrace_pixel" function, which gains an "actual_samples" counter and two doubles, "s1" and "s2". We only increment "actual_samples" when we actually trace a ray from "generate_ray", to guarantee we record the correct number of samples. "s1" is the running sum of the illuminance of the radiance returned by "est_radiance_global_illumination(ray)", and "s2" is the running sum of that illuminance squared. We use "s1" and "s2" to calculate the mean "mu" and the variance "sd^2".

mu = s1 / actual_samples

sd^2 = (1 / (actual_samples - 1)) * (s2 - (s1^2 / actual_samples))
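Below is a sketch of the adaptive version of "raytrace_pixel", extending the Part 1 loop with the batch convergence check; "samplesPerBatch", "maxTolerance", "Spectrum::illum()", and "sampleCountBuffer" follow my skeleton and are assumptions.

// Sketch of raytrace_pixel with adaptive sampling.
Spectrum PathTracer::raytrace_pixel(size_t x, size_t y) {
  Spectrum total(0, 0, 0);
  double s1 = 0, s2 = 0;
  int actual_samples = 0;
  for (int i = 0; i < ns_aa; i++) {
    Vector2D offset = gridSampler->get_sample();
    Ray ray = camera->generate_ray((x + offset.x) / sampleBuffer.w,
                                   (y + offset.y) / sampleBuffer.h);
    ray.depth = max_ray_depth;
    Spectrum L = est_radiance_global_illumination(ray);
    total += L;
    double illum = L.illum();                       // scalar illuminance of this sample
    s1 += illum;
    s2 += illum * illum;
    actual_samples++;

    if (actual_samples % samplesPerBatch == 0) {    // check convergence once per batch
      double mu  = s1 / actual_samples;
      double var = (s2 - s1 * s1 / actual_samples) / (actual_samples - 1);
      double I   = 1.96 * sqrt(var) / sqrt(actual_samples);
      if (I <= maxTolerance * mu) break;            // pixel has converged: stop sampling
    }
  }
  sampleCountBuffer[y * frameBuffer.width + x] = actual_samples;
  return total / (double)actual_samples;
}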

Finally, to display the effect adaptive sampling has on the sampling rate of each pixel, I filled in "sampleCountBuffer[pixel_y * frameBuffer.width + pixel_x]" with the number of samples actually taken at that pixel. The results are shown below with the bunny.

Bunny with adaptive samples shows no immediate differences


Sampling rate visual shows the differences in sampling rates taken at different areas of the scene


Red coloring indicates a higher sampling rate in that area, while blue coloring represents a lower sampling rate. Notice that the areas with high sampling were the shadows under the bunny and the ceiling. These need many samples to really smooth out the lighting and shadows, evidence that we are probably saving computing time by using adaptive sampling with global illumination.