At this year’s virtual Automotive LIDAR conference, Applied Intuition gave a presentation about using synthetic lidar data to train perception models for advanced driver-assistance systems (ADAS) and autonomous vehicles (AVs). Our presentation explained the challenges of developing and validating perception algorithms such as lidar models. It also covered how autonomy programs can generate synthetic training data and use it alongside real lidar data to improve model performance while lowering costs and shortening time to market. This blog post summarizes the key takeaways from our presentation for those who could not attend.
The performance of a machine learning (ML) perception model for AVs depends heavily on the quality and quantity of the available training data. Unfortunately, real-world data, which most autonomy programs use to train their perception algorithms, can be slow, expensive, and even dangerous to collect. Once collected, real-world data must also be labeled, which is often a slow and error-prone process.
To solve the challenges that real-world data collection and labeling impose on effective ML training, autonomy programs can leverage synthetic data to train lidar-based perception algorithms. This process follows three steps:
Synthetic data generation platforms make step 1 of this process easier for autonomy programs. Such a platform should be optimized for ML engineers and help them define and generate physically accurate synthetic data at scale. It should also generate data that is representative of the real-world conditions in which the model will be deployed. This means modeling the exact sensors used by the system, generating a variety of ground-truth labels, and even procedurally generating 3D worlds that look like the autonomy program’s domain (Figure 2).
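To illustrate why simulated sensor data comes with ground-truth labels for free, here is a minimal toy sketch in Python. It is not how Synthetic Datasets or any production sensor model works: it assumes a 2D world, an ideal noise-free lidar at the origin, and circular obstacles, and the names `Circle`, `ray_circle_distance`, and `simulate_scan` are hypothetical. Because the simulator knows which object each beam hits, every return carries its label automatically.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Circle:
    """A hypothetical 2D obstacle with a semantic class label."""
    cx: float
    cy: float
    radius: float
    label: str


def ray_circle_distance(angle: float, c: Circle) -> Optional[float]:
    """Distance from the origin along `angle` to the circle, or None on a miss."""
    dx, dy = math.cos(angle), math.sin(angle)
    b = dx * c.cx + dy * c.cy  # projection of the circle center onto the ray
    disc = b * b - (c.cx ** 2 + c.cy ** 2 - c.radius ** 2)
    if disc < 0.0:
        return None  # the ray's line never touches the circle
    root = math.sqrt(disc)
    for t in (b - root, b + root):  # nearer intersection first
        if t >= 0.0:
            return t
    return None  # the circle lies entirely behind the sensor


def simulate_scan(
    obstacles: List[Circle], num_beams: int, max_range: float
) -> List[Tuple[float, float, Optional[str]]]:
    """Return one (angle, range, label) tuple per beam; label is None if nothing is hit."""
    scan = []
    for i in range(num_beams):
        angle = 2.0 * math.pi * i / num_beams
        best_range, best_label = max_range, None
        for obstacle in obstacles:
            d = ray_circle_distance(angle, obstacle)
            if d is not None and d < best_range:
                best_range, best_label = d, obstacle.label
        scan.append((angle, best_range, best_label))
    return scan
```

For example, `simulate_scan([Circle(5.0, 0.0, 1.0, "car")], num_beams=4, max_range=100.0)` returns a forward beam with range 4.0 labeled "car", while the other three beams report `max_range` with no label. A realistic sensor model would add beam patterns, noise, intensity, and material effects on top of this idea.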
Applied Intuition provides a synthetic data generation platform called Synthetic Datasets. Synthetic Datasets provides labeled synthetic data for ML training, including the training of lidar models. Additionally, our sensor simulation tool Spectral supports software-in-the-loop (SIL) and hardware-in-the-loop (HIL) testing of perception modules and end-to-end autonomy stacks.
To show the advantages of Synthetic Datasets, Applied has conducted several studies. First, our inference study demonstrates how optimizations to synthetic data generation, like ML-based sensor model tuning, reduce the simulation-to-real domain gap (Figure 3). This gap is the degradation in lidar model performance caused by differences between the synthetic data used for training and the target domain in the real world. Closing it makes the synthetic data more useful in both training and testing.
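One simple way to put a number on the simulation-to-real domain gap, sketched in Python, is to score the same model on a held-out synthetic set and on a real-world set and compare the two. This is an assumed, generic formulation, not the metric used in Applied's inference study; the function names are hypothetical, and the score can be any higher-is-better metric such as mean average precision.

```python
def domain_gap(score_on_synthetic: float, score_on_real: float) -> float:
    """Absolute drop in a higher-is-better metric when moving from synthetic to real data."""
    return score_on_synthetic - score_on_real


def relative_domain_gap(score_on_synthetic: float, score_on_real: float) -> float:
    """The same drop, expressed as a fraction of the synthetic-data score."""
    if score_on_synthetic <= 0.0:
        raise ValueError("score_on_synthetic must be positive")
    return domain_gap(score_on_synthetic, score_on_real) / score_on_synthetic
```

With illustrative (made-up) scores of 0.75 on synthetic data and 0.5 on real data, the absolute gap is 0.25 and the relative gap is one third. A gap near zero suggests the synthetic data resembles the target domain closely enough for the model to transfer.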
Second, our training study shows that synthetic lidar data materially improves model performance, enabling developers to rely more on synthetic data and less on real-world data collection and labeling (Figure 4).
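As a rough sketch of how a training set might combine the two data sources, the Python helper below keeps every real sample and adds enough randomly chosen synthetic samples to reach a target share of the mix. The function name, the `synthetic_fraction` parameter, and the sampling strategy are assumptions for illustration; the study's actual mixing procedure is not described here.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def mix_training_set(
    real: Sequence[T],
    synthetic: Sequence[T],
    synthetic_fraction: float,
    seed: int = 0,
) -> List[T]:
    """Keep all real samples and add synthetic ones up to `synthetic_fraction` of the result."""
    if not 0.0 <= synthetic_fraction < 1.0:
        raise ValueError("synthetic_fraction must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed keeps the mix reproducible
    # Solve n_syn / (n_real + n_syn) = fraction for n_syn.
    n_syn = round(synthetic_fraction * len(real) / (1.0 - synthetic_fraction))
    n_syn = min(n_syn, len(synthetic))  # cannot draw more than is available
    mixed = list(real) + rng.sample(list(synthetic), n_syn)
    rng.shuffle(mixed)
    return mixed
```

Sweeping `synthetic_fraction` and retraining at each value is one way to see how much real data a given amount of synthetic data can replace for a particular model.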
Contact our team to learn more about Synthetic Datasets, Spectral, and how they help autonomy programs train their perception algorithms more effectively.