Real-world data is indispensable when training object detection models for autonomous vehicle (AV) perception systems. Unfortunately, the real world doesn’t always easily provide all the data needed for successful training. For example, classes such as cyclists and motorcyclists may occur less frequently than pedestrians and cars, making it difficult for perception models trained on real-world data to correctly detect them (Figure 1). Similarly, the most dangerous situations such as accidents may be hidden in the last few percent of test drives.
Figure 1: Cyclists are often underrepresented in real-world datasets. This makes it difficult for a perception model trained only on real-world data to detect cyclists.
Even though underrepresented classes and long-tail events occur less frequently in the real world, object detection models still need to be trained to handle them just as well as more common classes and situations. In the past few years, perception teams have started to utilize synthetic data to help address some of these limitations in real-world datasets. There still exists a domain gap between real-world and synthetic data, which is important to acknowledge, but recent methods are overcoming this gap with a combination of improved synthetic data and new machine learning training strategies.
To demonstrate this use case, Applied Intuition’s perception team has conducted a case study that uses synthetic data generated by Spectral as a supplemental training resource to address a class imbalance found in a real-world dataset. The study shows that synthetic data may be used to help mitigate class imbalances and address areas where real-world data is limited.
Goal and Scope
This case study uses nuImages, a commonly used dataset by Motional, as a baseline training dataset. In the dataset, the cyclist class occurs 170 times less frequently than more prominent classes such as cars and pedestrians (Figure 2).
Figure 2: The class distribution for five of the classes contained in the nuImages training set used in this case study. People and cars occur more frequently (a total of ~90% of the five classes used in this study). Cyclists occur only 0.3% of the time.
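A class distribution like the one in Figure 2 can be computed directly from the per-object class labels of a dataset. The following is a minimal sketch, assuming the labels are available as a flat list of class-name strings. The function names and the toy label counts are hypothetical and chosen only for illustration. They are not the nuImages counts or the tooling used in this study.

```python
from collections import Counter


def class_distribution(labels):
    """Return {class_name: fraction of all labeled instances}."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


def imbalance_ratio(labels, rare, common):
    """Return how many times more often `common` occurs than `rare`."""
    counts = Counter(labels)
    return counts[common] / counts[rare]


# Hypothetical toy labels for illustration only (NOT nuImages counts).
toy_labels = (
    ["car"] * 50
    + ["person"] * 40
    + ["truck"] * 6
    + ["motorcycle"] * 3
    + ["cyclist"] * 1
)

dist = class_distribution(toy_labels)
ratio = imbalance_ratio(toy_labels, rare="cyclist", common="car")
print(f"cyclist share: {dist['cyclist']:.1%}, car-to-cyclist ratio: {ratio:.0f}x")
```

Running this kind of check before training makes the imbalance explicit and gives a number to track as synthetic data is added.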
The study generates and uses a synthetic dataset to improve the perception algorithm’s object detection performance on cyclists while retaining or improving object detection performance on other classes. It also explores whether the use of synthetic data can reduce the amount of real-world data needed to improve the model’s object detection performance.
The study consists of the following steps:
- Analyze a baseline model trained only on real-world data from the nuImages dataset.
- Generate labeled synthetic data that specifically targets the lack of representation of the cyclist class in the real-world dataset. More examples of cyclists are created in the synthetic dataset.
- Use the above synthetic data as a supplemental training resource in addition to the nuImages data.
1. Baseline model analysis
First, the study measures how a perception model reacts to the class imbalance in the real-world nuImages data. A Cascade Mask R-CNN perception model is trained on this dataset until convergence. Its resulting object detection performance is lower on the cyclist class than on all other classes (Figure 3).
Figure 3: The object detection performance of the baseline perception algorithm when trained on the nuImages data. Aggregate performance (bounding box, segment) and per-class performance (car, truck, cyclist, motorcycle, person) are measured in mean average precision (mAP) scores (i.e., the measure of the accuracy of object detection) and reported as averages over 0.5:0.95 Intersection-over-Union (IoU) values (i.e., the measure of how much the predicted boundary overlaps with the ground truth).
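To make the IoU part of this metric concrete, the sketch below computes the IoU of two axis-aligned bounding boxes and lists the IoU thresholds from 0.5 to 0.95 over which scores are averaged. It assumes a step of 0.05 between thresholds, as in the common COCO-style convention. The source does not state the step, and the function names here are hypothetical. Computing average precision itself (matching predictions to ground truth and integrating the precision-recall curve) is omitted.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Assumed thresholds: 0.50, 0.55, ..., 0.95 (step of 0.05 is an assumption).
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


def mean_over_thresholds(ap_per_threshold):
    """Average per-threshold AP values given as {threshold: AP}."""
    return sum(ap_per_threshold[t] for t in IOU_THRESHOLDS) / len(IOU_THRESHOLDS)


predicted = (0, 0, 10, 10)
ground_truth = (0, 0, 10, 8)
iou = box_iou(predicted, ground_truth)
print(iou, thresholds_passed(iou))
```

A prediction that overlaps loosely with the ground truth only counts as correct at the lower thresholds, so averaging over the full range rewards tighter boxes.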
2. Synthetic data generation
Next, synthetic data is generated in Applied’s perception simulation tool Spectral to upsample the underrepresented cyclist class (Figure 4). This case study uses procedural 3D environment generation, automatic scenario creation, and a synthetic data generation pipeline to enable this process.
Figure 4: Class distribution of real-world (nuImages) and synthetic datasets. Cyclists (yellow) are upsampled in the synthetic dataset (right).
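As a rough illustration of what upsampling a class involves, the sketch below estimates how many additional instances of a rare class would be needed to reach a target share of a combined dataset. It assumes, for simplicity, that only instances of the rare class are added. The synthetic dataset in this study also contains other classes, so this is not the procedure used here. The function name and the numbers are hypothetical.

```python
import math


def synthetic_instances_needed(rare_count, total_count, target_fraction):
    """Instances of the rare class to add so it reaches `target_fraction`.

    Solves (rare_count + x) / (total_count + x) >= target_fraction for x,
    assuming only rare-class instances are added.
    """
    if not 0 < target_fraction < 1:
        raise ValueError("target_fraction must be strictly between 0 and 1")
    needed = (target_fraction * total_count - rare_count) / (1 - target_fraction)
    # No additions are needed if the class already meets the target.
    return max(0, math.ceil(needed))


# Hypothetical example: 3 cyclists among 1,000 labeled objects, 10% target.
extra = synthetic_instances_needed(rare_count=3, total_count=1000, target_fraction=0.1)
print(extra)
```

In practice the target share is a design choice that has to be validated against model performance on real-world test data.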
Automatic scenario generation enables the creation of realistic distributions for all scenario parameters and the sampling from these distributions to achieve coverage of various scenarios. In the synthetic dataset used in this study, three different forms of “scenario” generation were used:
- Sequential scenario creation by defining individual actor behaviors
- Sequential scenario creation using traffic generators
- Non-sequential data frames using distributions and smart actors
Examples of each of these methods are shown in the following images, along with their ground truth data (Figure 5 a) - 5 c)).
Figure 5 a): Spectral synthetic images using sequential scenarios with per-actor definition.
Figure 5 b): Spectral synthetic images using sequential scenarios with traffic generators.
Figure 5 c): Spectral synthetic images using non-sequential frames with randomized smart actors.
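The idea behind non-sequential frames, drawing each frame's parameters independently from distributions, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Spectral's API. The parameter names, value ranges, and the choice of uniform distributions are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class FrameParams:
    """Hypothetical parameters describing one independent synthetic frame."""
    sun_elevation_deg: float
    num_cars: int
    num_cyclists: int
    cyclist_distances_m: list


def sample_frame(rng):
    """Draw one frame's parameters from assumed uniform distributions."""
    num_cyclists = rng.randint(1, 4)  # at least one cyclist per frame
    return FrameParams(
        sun_elevation_deg=rng.uniform(5.0, 85.0),
        num_cars=rng.randint(0, 10),
        num_cyclists=num_cyclists,
        cyclist_distances_m=[rng.uniform(5.0, 60.0) for _ in range(num_cyclists)],
    )


def sample_frames(n, seed=0):
    """Sample n independent frames; a fixed seed keeps the set reproducible."""
    rng = random.Random(seed)
    return [sample_frame(rng) for _ in range(n)]


frames = sample_frames(100, seed=1)
print(frames[0])
```

Because each frame is sampled independently, coverage of the parameter space depends only on the chosen distributions and the number of frames, not on how a scenario evolves over time.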