Here are some key design considerations for data ingestion and data management in the ADAS DL workflow:
- Large-scale sensor data and metadata management: Handling data access, data ingestion, and metadata management alongside HPC jobs can consume a significant amount of a data scientist's time and effort. To alleviate this, Dell Technologies has created a custom-tailored Data Management System (DMS) that is available as part of our PowerScale-based ADAS solution offering, as shown in the following diagram. DMS is integrated into the managed service offering for ingesting data from test vehicles in an automated fashion. Specific metadata types and tags are configurable and can be customized for each customer. The service provider can manage the logistics of receiving the storage media from the vehicles and inserting them into ingestion stations. Insertion automatically initiates processes that dynamically decide which cluster or location to upload the raw data to, so that all ingestion processes are well balanced across the storage environment. DMS then initiates the data transfer automatically while enriching the data with the appropriate metadata tags, and all data and tags are logged in a global index that tracks the data sets across the namespace and clusters. When the data transfer and indexing are complete, DMS can launch post-processing jobs within the DL and HPC server grid, as preconfigured through the scheduling platform. Available functions can include accurately splitting and merging data files as required, decoding the data into human-readable form, automatically updating the index upon completion, and other functions as defined by the customer. These processes are wrapped in a series of real-time monitoring and status dashboards and logs that alert on any errors or anomalies.
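The balanced placement and global indexing steps described above can be illustrated with a minimal Python sketch. This is not the DMS implementation or its API: the `Cluster` class, the `pick_cluster` and `ingest` functions, and the "fewest active ingests, then most free capacity" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical view of one storage cluster (not a DMS type)."""
    name: str
    free_tb: float           # remaining capacity in TB
    active_ingests: int = 0  # uploads currently in flight


def pick_cluster(clusters, size_tb):
    """Choose a target cluster that can hold the data and is least busy."""
    candidates = [c for c in clusters if c.free_tb >= size_tb]
    if not candidates:
        raise RuntimeError("no cluster has enough free capacity")
    # Assumed policy: fewest in-flight uploads first, then most free capacity.
    return min(candidates, key=lambda c: (c.active_ingests, -c.free_tb))


def ingest(clusters, index, media_id, size_tb, tags):
    """Place one storage medium and record it, with its tags, in the index."""
    target = pick_cluster(clusters, size_tb)
    target.active_ingests += 1
    target.free_tb -= size_tb
    # The dict stands in for the global index; a real system would also
    # decrement active_ingests once the transfer has finished.
    index[media_id] = {"cluster": target.name, "size_tb": size_tb, **tags}
    return target.name
```

For example, with clusters `a` (busy) and `b` (idle) that both have enough capacity, `ingest` places the next medium on `b` and stores its tags under the medium's ID in the index.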
Dell Data Management Solution Dashboard:
- Hybrid Cloud Solution for ADAS Development: Automotive organizations are investing heavily in new technologies to develop advanced driver-assistance systems (ADAS). One of the key hurdles these companies face is that sensor data is often spread throughout the organization’s environment, across platforms and geographic locations. This results in large, disconnected data silos that many find difficult to manage, because these sensor files must be made available to a range of test environments and AI tools. In addition, sensor files are typically kept for multiple years or even decades after testing (the “life of the car”) and must be brought online quickly, often within weeks. The new release of Dell DataIQ, shown in Figure 7, addresses these challenges by breaking down storage silos and unifying the unstructured data environment. It provides a single UI to view sensor data across heterogeneous storage environments, including the public cloud.
DataIQ also equips organizations to rapidly search across billions of files and locate data, which can then be precisely moved or copied to on-premises or public cloud storage as needed, ensuring that the right projects have access to pertinent data. This can help free up space on higher-performance tiers and enables organizations to archive data for as long as required. The cost savings can be substantial, and data can be retrieved easily from a well-curated, searchable archive for re-simulation and DL development in the future.
Figure 7. New Hybrid Cloud Solution for ADAS Development with Dell DataIQ
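The search-then-move pattern described above can be sketched generically in Python. This does not use or represent the DataIQ API: the in-memory `index` dictionary, the `tier` and `last_access` metadata fields, and the `find_files` and `plan_archive` functions are hypothetical names chosen only to illustrate the idea.

```python
from datetime import date, timedelta


def find_files(index, **tags):
    """Return the paths whose metadata matches every given tag."""
    return sorted(
        path for path, meta in index.items()
        if all(meta.get(key) == value for key, value in tags.items())
    )


def plan_archive(index, today, max_age_days,
                 hot_tier="performance", cold_tier="archive"):
    """List (path, destination tier) pairs for stale files on the hot tier."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(
        (path, cold_tier) for path, meta in index.items()
        if meta["tier"] == hot_tier and meta["last_access"] < cutoff
    )


# Hypothetical index entries spanning two sites and a cloud location.
index = {
    "/siteA/run1.bag": {"tier": "performance", "sensor": "lidar",
                        "last_access": date(2020, 1, 1)},
    "/siteB/run2.bag": {"tier": "performance", "sensor": "camera",
                        "last_access": date(2021, 6, 1)},
    "/cloud/run3.bag": {"tier": "archive", "sensor": "lidar",
                        "last_access": date(2019, 1, 1)},
}
```

Here `find_files(index, sensor="lidar")` locates matching files regardless of where they are stored, and `plan_archive(index, date(2021, 7, 1), 365)` proposes moving only files that are on the performance tier and have not been accessed for more than a year.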