Training and validating new deep neural networks, such as those used in ADAS / AD development, requires large datasets along with significant IT infrastructure, including compute power, networking, and storage. The right infrastructure is crucial in safety-critical system development, where accuracy requirements are much higher than in other industries. These advanced algorithms must operate reliably even in complex circumstances such as varying weather conditions, visibility, and road surface quality.
Key challenges of the DL training workload for ADAS are:
- Explosive Data Growth – A typical vehicle used for sensor data collection in the ADAS system test use case includes multiple sensors such as LiDAR, RADAR, ultrasonic, GPS, and cameras, all of which continuously generate data. In addition, vehicle controller area network (CAN) bus data and test driver control information are captured. While reality is impossible to predict with 100% accuracy, this high level of visibility and redundancy builds a detailed picture that enables the vehicle to make reliable decisions in adverse weather conditions or in the event of an individual component failure. Because of the safety requirements for driving, the system must detect objects far enough away to operate safely at high speeds; this demands higher image resolutions than other industries use, which in turn generates more data. The scale of the unstructured sensor data (videos, point clouds, images, text) that must be captured and replayed to test ADAS subsystems poses a massive challenge.
To illustrate, a typical Society of Automotive Engineers (SAE) level 2 ADAS project that captures 200,000 km of driving (typical for SAE level 2) at an average speed of 65 km/h would generate over 3,076 hours of data, requiring approximately 3.8 petabytes (PB) of storage for a single sensor.
Note that even within SAE level 2 solutions, the total number of ADAS sensors required varies with functionality (lane departure warning, self-parking, and more), and multiple sensors are typically required. For example, an SAE level 3 ADAS project, which typically requires 1,000,000 km of driving, could generate 19.3 PB of raw sensor data per car. As most ADAS developers have multiple cars, total storage requirements typically average between 50 and 100 PB of data per vehicle model.
- Fast training cycle – To assure safety and reliability, the neural networks designed must utilize millions of parameters, which places compute-intensive requirements on the underlying systems and hardware architecture. To accelerate time-to-market, neural network training must be as fast as possible. First, the deeper the network, the higher the number of parameters and operations, and the more intermediate results must be stored in GPU memory. Second, training usually proceeds in mini-batches, so I/O throughput is the primary performance metric of concern in DL training.
To illustrate, running AlexNet, one of the least computationally heavy image classification models, with the ImageNet dataset, which is substantially smaller than typical ADAS datasets, achieves a throughput of ~200 MB/s on a single V100 GPU. For 1 PB of data, using a single V100 GPU and assuming the 50 epochs of training that ImageNet takes on average, training takes:
50 epochs * 1 PB of data / 200 MB/s ≈ 7.9 years to train an AlexNet network
A single GPU server cannot provide the data throughput and computational capability required to train on datasets of this size for ADAS / AD. The ability to scale neural network training on large datasets across GPU servers and scale-out storage is critical for ADAS / AD and is required to support distributed DL training.
- Test and validation – Validation is a key stage of the ADAS development cycle. Since most ADAS systems directly affect safety, the robustness and reliability of the trained model are paramount. This demands exhaustive testing and verification of the trained algorithm against diverse traffic scenarios and dimensions, which might include road geometry, driver and pedestrian behaviors, traffic conditions, weather conditions, vehicle characteristics and variants, spontaneous component faults, security, and more.
- High quality labeled data – The availability of labeled data is critical for ADAS DL training. High quality labeled data yields better model performance. Labels are added either manually (often via crowdsourcing) or automatically by image analysis, depending on the complexity of the problem. Labeling massive collections of training data to a high standard is a tedious task that requires significant effort.
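The back-of-the-envelope figures quoted in the data growth and training cycle items above can be reproduced with a short script. This is a minimal sketch, not a sizing tool. It assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes) and a 365-day year. The function names are hypothetical. The per-sensor data rate is derived here from the quoted 3.8 PB and is not stated in the text. The `num_gpus` parameter assumes ideal linear scaling, which real distributed training only approaches.

```python
# Assumption: decimal (SI) units and a 365-day year.
PB = 10**15
MB = 10**6
SECONDS_PER_YEAR = 365 * 24 * 3600


def collection_hours(distance_km: float, avg_speed_kmh: float) -> float:
    """Hours of recorded driving for a given distance and average speed."""
    return distance_km / avg_speed_kmh


def implied_sensor_rate_mb_s(storage_pb: float, hours: float) -> float:
    """Average per-sensor data rate implied by a storage total (derived, not from the text)."""
    return storage_pb * PB / (hours * 3600) / MB


def training_years(dataset_pb: float, epochs: int, throughput_mb_s: float,
                   num_gpus: int = 1) -> float:
    """Training time in years if I/O throughput is the only limit.

    Assumption: throughput scales linearly with num_gpus (an idealized best case).
    """
    seconds = epochs * dataset_pb * PB / (throughput_mb_s * MB * num_gpus)
    return seconds / SECONDS_PER_YEAR


if __name__ == "__main__":
    hours = collection_hours(200_000, 65)            # SAE level 2 example
    rate = implied_sensor_rate_mb_s(3.8, hours)      # derived from the quoted 3.8 PB
    years = training_years(1, 50, 200)               # AlexNet on one V100
    print(f"collection time: {hours:,.0f} h")
    print(f"implied sensor rate: {rate:.0f} MB/s")
    print(f"single-GPU training time: {years:.1f} years")
    print(f"ideal 8-GPU training time: {training_years(1, 50, 200, num_gpus=8):.2f} years")
```

Even under the idealized linear-scaling assumption, the single-GPU figure shows why training has to be spread across many GPUs and backed by storage that can sustain the aggregate read throughput.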