An Analysis of Data Curation Techniques throughout the Perception Development Pipeline

Author: Jacob Perrin, Daniel Hasenklever
Year of publication: 2023
Source: SAE Technical Paper Series
ISSN: 2688-3627, 0148-7191
DOI: 10.4271/2023-01-0055
Abstract: The development of perception functions for tomorrow's automated vehicles is driven by enormous amounts of data, often exceeding a gigabyte per second and reaching into the terabytes per hour. Data is typically gathered by a fleet of dozens of mule vehicles, so that the fleet as a whole generates hundreds of petabytes per year. Traditional methods for fueling data-driven development record every bit of every second of a data logging drive on solid-state drives in an in-vehicle PC. After the vehicle returns to the garage, the recorded data must be exported from these drives via an upload station, which pushes it to the data lake. This paper considers different techniques for curating logged data. These curation methods aim to maximize the usefulness of the data throughout its lifecycle and to minimize the amount of data needed for perception development and validation. Reducing the logged data not only curtails storage costs but also minimizes the latency until the data become available and maximizes the possible campaign drive time. The advanced techniques considered include: (a) real-time evaluation of sensor and bus signals, (b) the application of artificial intelligence (e.g., for similarity-based image discovery and dynamic scene content detection), and (c) the use of function prototyping to inform the data curation process. In addition, possible integration points for the curation techniques are considered along the data ingestion pipeline, from the cloud in the datacenter to the in-vehicle logging system at the edge. The trade-offs of each proposed technique are weighed across the components of the data ingestion pipeline in order to explore the feasibility of designing more intelligent data collection campaigns and to draw some conclusions about how to do so.
Database: OpenAIRE
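The data volumes quoted in the abstract can be checked with a back-of-envelope estimate. The sketch below is illustrative only: the 1 GB/s rate is the lower bound the abstract mentions, while the fleet size of 50 vehicles (standing in for "dozens"), 8 logging hours per day, 250 logging days per year, and the 10% retention fraction are hypothetical assumptions, not figures from the paper. The names `Campaign`, `tb_per_hour`, and `pb_per_year` are likewise made up for this example.

```python
from dataclasses import dataclass

TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


@dataclass
class Campaign:
    """Hypothetical logging-campaign parameters (illustrative assumptions only)."""
    rate_bytes_per_s: float  # sustained logging rate per vehicle
    vehicles: int            # fleet size
    hours_per_day: float     # logging hours per vehicle per day
    days_per_year: int       # logging days per year


def tb_per_hour(c: Campaign) -> float:
    """Data logged by one vehicle in one hour, in TB."""
    return c.rate_bytes_per_s * 3600 / TB


def pb_per_year(c: Campaign, keep_fraction: float = 1.0) -> float:
    """Fleet-wide yearly volume in PB after keeping `keep_fraction` of the raw data."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    total_bytes = (c.rate_bytes_per_s * 3600 * c.hours_per_day
                   * c.days_per_year * c.vehicles)
    return total_bytes * keep_fraction / PB


# Hypothetical campaign: 1 GB/s per vehicle, 50 vehicles, 8 h/day, 250 days/year.
campaign = Campaign(rate_bytes_per_s=1e9, vehicles=50, hours_per_day=8, days_per_year=250)
print(tb_per_hour(campaign))                     # 3.6
print(pb_per_year(campaign))                     # 360.0
print(pb_per_year(campaign, keep_fraction=0.1))  # roughly 36
```

Under these assumptions a single vehicle logs 3.6 TB per hour and the fleet reaches about 360 PB per year, consistent with the "hundreds of petabytes" in the abstract; keeping only a tenth of the raw data would cut that to roughly 36 PB, which is the kind of reduction that curation targets.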