How the Kaolin App can make turntable-style synthetic datasets of your USD files in minutes
Interest in 3D deep learning has been accelerating at a breakneck pace. With PyTorch3D, Kaolin, Open3D, an onslaught of papers from SIGGRAPH, CVPR, and ECCV, and industrial case studies emerging from Tesla, Amazon, and BMW to name a few, it is clear that the Spatial Computing revolution is already underway. In addition, the data-centric AI movement has sparked an interest in bootstrapping CV applications with simulation. Understanding how to process monoscopic cameras, stereoscopic cameras, 360 cameras, LiDAR, structured light scans, geospatial data, aerial photos, and oftentimes how to fuse them together is becoming an essential skill.
NVIDIA is constructing a massive and extensible collection of tools and applications for seamless, high-fidelity simulation and design inside of NVIDIA Omniverse. Omniverse is more than NVIDIA’s gamble on the Metaverse. NVIDIA is positioning it to become a core suite of tools for a future with an insatiable need for 3D content, one that pervades the growing markets of gaming, industrial visualization, visual effects, automation, spatial retail, simulation, and nearly every other industry under the sun.
They call it Omniverse because at the heart of their value proposition lies interoperability and collaboration. This is powered by Omniverse Connect, a set of extensions that allows various 3D applications to exchange data with Omniverse in real time. The list of Connectors includes Unreal Engine, 3ds Max, Maya, and Revit, with more added every day. A key ingredient of this feature is heavy reliance on standards, particularly MDL, PhysX, and Pixar’s USD, which NVIDIA CEO Jensen Huang claims will become the “HTML for the Metaverse.”
Without a doubt, Omniverse as the single source of truth for the entire 3D asset ecosystem would be immensely valuable, if it can indeed accomplish that. NVIDIA is pouring tons of resources into realizing its potential; only time will tell. For now, let us take a look at how Omniverse, specifically the Kaolin App, can be used today to generate synthetic data for your own computer vision applications.
Kaolin is NVIDIA’s high-level Python library for 3D deep learning built on top of PyTorch. It provides differentiable renderers, rasterization tools, helpers for manipulating meshes, vertices, faces, UV maps, and textures, common loss functions like Chamfer distance and IoU, file I/O and conversion for various 3D representations, graph convolutions, structured point clouds, and a lot more.
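To give a flavor of what one of those loss functions computes, here is a minimal NumPy sketch of the (squared) Chamfer distance between two point clouds. This is illustrative only, not Kaolin's implementation — Kaolin ships an optimized, differentiable, GPU-friendly version:

```python
import numpy as np

def chamfer_distance(p1: np.ndarray, p2: np.ndarray) -> float:
    """Symmetric squared Chamfer distance between point clouds of shape (N, 3) and (M, 3)."""
    # Pairwise squared distances, shape (N, M).
    diff = p1[:, None, :] - p2[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Each point's squared distance to its nearest neighbor in the other cloud, averaged both ways.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 0.0]])
print(chamfer_distance(a, b))  # identical clouds would give 0.0
```

The key property for training is that this quantity is differentiable with respect to the point coordinates, which is why it is such a common reconstruction loss.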
Not to be confused with the Kaolin library itself, the Kaolin App is a companion application that is part of the Omniverse suite. As of today, it consists of three primary features:
- Dataset Visualizer — view and inspect a collection of USD models in a 3D interface.
- Data Generator — create synthetic training data for computer vision applications from a collection of USD files. Includes annotators for segmentation, 2D & 3D bounding boxes, normals, point clouds, and more.
- Training Visualizer — view training output over time of meshes, point clouds, and other 3D data structures from deep learning sessions.
The high-level workflow for using the Kaolin App is this:
- Inspect your collection of 3D assets using the Dataset Visualizer.
- Generate a dataset using the Data Generator.
- Train a model on the dataset using the Kaolin library.
- Visualize training results in 3D with the Training Visualizer.
Hopefully it will all come together once I walk through an example. First of all, let us download some 3D assets to work with.
Lowe’s Open Builder
Now that we have our assets, let’s use the Kaolin App to view them. The Dataset Visualizer can search a given directory for any .usd, .usda, or .usdc files and load them automatically into the viewer. Its primary purpose is to rapidly inspect collections of 3D assets to understand and identify possible issues with training.
At the moment, it does not support .usdz files, but there is an easy workaround. If you are familiar with the USD format, you probably know that a .usdz file is just an uncompressed zip archive containing a USD file and its media assets (read more about the USDZ spec here). Use 7zip, unzip, or another archive utility appropriate for your system to extract each .usdz file you downloaded into its own directory. Inside each you should see a .usdc file and a folder of textures. Opening the parent directory in the Dataset Visualizer should then load those models successfully, since it recursively searches the specified directory for any .usd, .usda, or .usdc files.
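If you would rather script the extraction than unzip by hand, Python's standard zipfile module handles .usdz archives directly, since they are ordinary (stored, uncompressed) zip files. A small sketch — the `assets/` and `extracted/` directory names are just examples:

```python
import zipfile
from pathlib import Path

def extract_usdz(usdz_path, out_root) -> Path:
    """Extract a .usdz archive into its own subdirectory and return that directory."""
    usdz = Path(usdz_path)
    out_dir = Path(out_root) / usdz.stem  # one directory per asset
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(usdz) as zf:  # USDZ is a plain zip archive
        zf.extractall(out_dir)
    return out_dir

# Example: extract every .usdz under ./assets into ./extracted
for f in Path("assets").glob("*.usdz"):
    extract_usdz(f, "extracted")
```

Point the Dataset Visualizer at the output root and it will pick up the extracted .usdc files.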
The following video will walk you through use of the Dataset Visualizer.
This app has just a handful of simple options. You can change the number of objects, rotate them, adjust the spacing, view them with sizes normalized or at true scale, and adjust the up axis. Use this visualizer to see, for instance, whether your models are the proper scale and orientation. If “normalize size” is unchecked, a bookcase and a vase should differ greatly in size; if it is checked, they should appear roughly the same size. If the assets do not look upright, check the up axis as well. You may need to adjust the “upAxis” property of the USD file. All of these assets should look fine with an up axis of Y.
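For reference, the up axis lives in the layer metadata at the top of a USD file. If your assets come in sideways, this is the field to check — here shown with Y up, as these assets use (the metersPerUnit line is a separate, optional metadata field that affects scale):

```usda
#usda 1.0
(
    upAxis = "Y"
    metersPerUnit = 0.01
)
```

You can edit a .usda file in a text editor; for binary .usdc files you would use USD tooling to change the stage metadata.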
Now that we know our assets are the right size and fully textured, we can generate some training data with them. The Data Generator tab is made for just that: it produces turntable-style imagery of a collection of assets in rapid succession, with optional randomization of lighting, camera pose, and materials.
Currently the Data Generator supports the following label types:
- Semantic Segmentation
- Instance Segmentation
- 2D Bounding Boxes (tight or loose)
- 3D Bounding Boxes
- Point clouds
- Camera poses
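To make the “tight or loose” distinction above concrete: a tight 2D box hugs only the object's visible pixels, while a loose box bounds the object's full projected extent, occluded parts included. A minimal NumPy sketch of computing a tight box from a boolean instance mask (the mask here is a hand-made toy, not Data Generator output):

```python
import numpy as np

def tight_bbox(mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) of True pixels in a 2D boolean mask."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # object fully occluded or out of frame
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

mask = np.zeros((8, 8), dtype=bool)
mask[2:5, 3:7] = True  # a 3x4 blob of "object" pixels
print(tight_bbox(mask))  # -> (3, 2, 6, 4)
```

With synthetic data, the generator produces both box styles directly from the renderer, so no hand labeling is involved.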
The following video will walk you through use of the Data Generator.
With the Data Generator in Omniverse Kaolin, you can train machine learning models for classification, object detection, semantic/instance segmentation, depth estimation, 3D scene understanding, 3D reconstruction, and more. However, you might ask yourself: why might I want to generate training data from 3D models? While the domain gap makes using synthetic data for real-world applications a non-trivial task, there are more and more examples every day of synthetic data unlocking capabilities for which real data was previously impossible or intractable to obtain. Just look at recent efforts from Tesla, OpenAI, Amazon, and Meta (which recently acquired synthetic data startup AI.Reverie), to name a few. Synthetic data can also supercharge prototype development of computer vision applications without requiring any investment in hardware or labeling.
As the name implies, the Training Visualizer is useful for monitoring the training of ML models in real time. You might be wondering, “isn’t that what TensorBoard is for?” While TensorBoard is indispensable for monitoring losses, weights, and output data logs, the Training Visualizer supports ray-traced rendering of your 3D data in real time. You can also scrub through training iterations, visualizing how the model topology and textures evolve throughout the training process.
The following video will walk you through use of the Training Visualizer.
At this point you have two options: you can either a) attempt to train a model on your data using the Kaolin library or b) download a training log that I created to try out the Training Visualizer right away.
Option A: Install Kaolin
To train a model, we will need to install the Kaolin library. If you are feeling adventurous, go ahead and follow the installation instructions and proceed to the DIB-R rasterizer tutorial. In short, this will deform a template mesh (in this case a sphere) to approximate the 3D model via optimization using only the input images, masks, and poses. You can replace the
rendered_path with the path to the training dataset you just created. When you are finished, open the output in
logs_path in the Training Visualizer.
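For intuition, the tutorial's training loop looks roughly like this. This is pseudocode, not the notebook's actual code — the real renderer, loss weights, and checkpointing in the DIB-R tutorial differ:

```
mesh = icosphere(subdivisions=4)            # template mesh to deform
params = [mesh.vertices, texture]           # optimized via gradient descent

for epoch in range(num_epochs):
    for image, mask, camera_pose in dataset:     # loaded from rendered_path
        pred_rgb, pred_mask = differentiable_render(mesh, texture, camera_pose)
        loss = image_loss(pred_rgb, image)
             + mask_loss(pred_mask, mask)
             + laplacian_regularizer(mesh)       # keeps the surface smooth
        loss.backward(); optimizer.step()
    save_checkpoint(logs_path, mesh, texture)    # what the Training Visualizer reads
```

Because the renderer is differentiable, gradients of the image and silhouette losses flow back to the vertex positions and texture, which is what lets a plain sphere morph into the target object.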
Option B: Download the Training Log
If you would like to keep things simple, feel free to download this training session log to view the results on your own. Extract the folder from that file and then open it in the Training Visualizer (make sure to open the top-level folder because the app can be finicky about folder structure). You should see results like those in the video above.
Hopefully this helped you use Omniverse and the Kaolin App to generate labeled synthetic datasets of USD 3D models for 3D deep learning. It is a quick and efficient way to generate datasets for computer vision and 3D deep learning experiments. If you are looking to drive a more sophisticated simulation for ML training, you can also try the Omniverse Replicator SDK.
If all goes well, I will publish a follow-up article detailing how to use Kaolin Wisp, a newly announced suite of tools built atop the Kaolin library specifically for neural rendering techniques, so that you can train a NeRF on the data we created here. Stay tuned!