thoughtt ...thinking aloud

Automatic LOD Selection using CNNs

In computer graphics, the basic building block for representing geometry is the triangle primitive. Triangles are efficient in terms of memory, but constructing a realistic-looking object takes a lot of them. In realtime rendering, there is an upper limit on the time available to render each frame. More triangles mean more computation time, and once a frame takes too long the scene becomes jerky. Speeding up the rendering process is an active area of research in computer graphics and realtime rendering.

One easy way to make things faster is to use different models with different levels of detail (LODs). When an object is far from the camera, it occupies only a small subset of pixels in the final image. Spending a lot of processing power on something that affects only a small portion of the final image is pointless. This sounds like an optimization problem: which model should our game use when an object is at a given distance from the camera?
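For comparison, the traditional answer to this question is a set of fixed distance thresholds. The snippet below is a minimal sketch of that baseline, not code from any particular engine; the function name `select_lod_by_distance` and the threshold values in the usage note are made up for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def select_lod_by_distance(distance: float, thresholds: Sequence[float]) -> int:
    """Pick a LOD index from the camera distance.

    `thresholds` must be sorted in ascending order (hypothetical values,
    in whatever world units the game uses). Index 0 is the most detailed
    model; index len(thresholds) is the simplest one.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Count how many thresholds the object has already passed.
    return bisect_right(thresholds, distance)
```

With thresholds such as `[10.0, 30.0, 80.0]`, an object at distance 5 would use the most detailed model (index 0) and an object at distance 100 the simplest one (index 3). The weakness of this baseline is that the thresholds are hand-tuned and say nothing about whether the object is still recognizable.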


CNNs to the Rescue

Sample Output

The whole point of switching the geometry of an object is to avoid wasting CPU and GPU computation on objects that are tiny and hard to recognize. So why not use an object recognition network, instead of traditional metrics, to decide when we should switch our geometry? Let’s formulate our problem: we have an object O at some distance from the camera, and it occupies some number of pixels in the final frame. We have several LODs for this object, each with a different number of triangles (the first LOD has the highest detail, and the last LOD is the simplest). We also have an object detection network called ObjNN, which has been trained on our objects (2D images of them).

We consider the transition from one LOD level to another a good one only if ObjNN outputs a higher accuracy for the object O.
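The rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are taken to be ObjNN's confidence for the true class, one per LOD, ordered from the most detailed model to the simplest, and the comparison is read as "the score after the switch is strictly higher than before". The names `is_good_transition` and `select_lod` are hypothetical, and the network itself is not modelled here.

```python
from typing import Sequence


def is_good_transition(score_from: float, score_to: float) -> bool:
    """A transition counts as good only if the detection score goes up."""
    return score_to > score_from


def select_lod(scores: Sequence[float]) -> int:
    """Walk from the most detailed LOD towards simpler ones.

    `scores[i]` is assumed to be ObjNN's true-class confidence when the
    object is rendered with LOD i at the current camera position.
    Returns the index of the last LOD reached through good transitions.
    """
    if not scores:
        raise ValueError("at least one LOD score is required")
    current = 0
    for candidate in range(1, len(scores)):
        if not is_good_transition(scores[current], scores[candidate]):
            break  # stop at the first transition that is not an improvement
        current = candidate
    return current
```

For example, with hypothetical scores `[0.61, 0.74, 0.70]` the walk accepts the first transition and rejects the second, so it settles on index 1. In practice each score would come from rendering the object with that LOD and running the detector on the result.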


Not much difference in detection scores when the object is far away.

This method doesn’t work for objects that are close to the camera, because all LOD levels yield a very high score for the true class. One solution is to train a CNN on human preferences. That is, train a network to learn and replicate a player's decisions in different scenarios, rather than relying solely on the detection scores.
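One way to notice this failure case at runtime is to check whether the detection scores are too close together to tell the LODs apart. The check below is a rough sketch; the function name `scores_are_indistinguishable` and the default `min_spread` of 0.05 are arbitrary assumptions, not values taken from an experiment.

```python
from typing import Sequence


def scores_are_indistinguishable(scores: Sequence[float],
                                 min_spread: float = 0.05) -> bool:
    """Return True when the per-LOD detection scores barely differ.

    `scores` holds one true-class confidence per LOD. If the gap between
    the best and worst score is below `min_spread` (a hypothetical cutoff),
    the detection score alone gives no useful signal for picking a LOD.
    """
    if not scores:
        raise ValueError("at least one LOD score is required")
    return max(scores) - min(scores) < min_spread
```

When the check returns True, a game could fall back to another criterion, such as a distance threshold or a preference-trained network like the one described above.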


Same accuracy for objects near the camera. (Consider extra details such as Stop Sign in LOD 1)

Don’t Get Too Excited

The above formulation is rudimentary, but at least it gives you an idea of how to incorporate a neural network for similar tasks in computer games and graphics.

One important thing to keep in mind is the weird and unpredictable behavior of CNNs (and neural networks in general). In the image below, you can see that object detection algorithms can be fooled by changes in the angle and position of objects, and that they misclassify objects the network has not seen before. You can read more about this here and here.

When object angles are problematic - DE⫶TR algorithm

Notes

Epic Games recently showcased a preview of Unreal Engine 5 with a technology called Nanite. With the help of Nanite, the limit on the number of polygons and triangles is largely lifted. Artists will be able to import highly detailed geometry with hundreds of millions of polygons into the engine. I am still not sure how this technology works, but we will find out soon.

References

  1. Levels of Detail and Continuous Levels of Detail (CLOD) [Wikipedia]
  2. DE⫶TR: End-to-End Object Detection with Transformers [Blog] [Github]
  3. Low poly models are part of POLYGON - Town Pack
  4. A first look at Unreal Engine 5

Image Credits

  1. Cover image taken from Hydro-gene and upscaled with the Topaz A.I. Gigapixel tool.