Popis: |
Autonomous Driving Systems (ADSs) have seen rapid progress in recent years. To ensure the safety and reliability of these systems, extensive testing is conducted before their eventual mass deployment. One approach is to test ADSs directly on the road, but covering all rare corner cases this way is prohibitively costly. A popular complementary approach is therefore to evaluate an ADS's performance in a simulator, which is known as simulation-based testing. However, randomly testing ADSs in simulation is still not efficient enough, and the test results might not transfer to the real world. This dissertation argues that the cornerstone of efficient simulation testing lies in crafting effective testing scenarios. We identify several key properties of such scenarios: they should induce ADS misbehavior, be diverse, be realistic, and adhere to user-specified rules (e.g., traffic rules). Building on this, our research develops methods to improve one or more of these properties of the generated scenarios along two distinct lines. First, we develop advanced search strategies to uncover diverse scenarios that cause an ADS to misbehave. Second, we use deep generative models to produce scenarios that are both realistic and compliant with user-specified rules. Because end-to-end ADS behaviors must be tested efficiently against complex, real-world corner cases, we propose AutoFuzz, a novel fuzz testing technique that leverages the API grammars of widely used driving simulators to generate complex driving scenarios. To find misbehavior-inducing scenarios, which are very rare, we propose a learning-based search method to optimize AutoFuzz. In particular, our method trains a neural network to select and mutate scenarios sampled from an evolutionary search method.
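The general pattern of learning-guided evolutionary search can be illustrated with a minimal sketch. Everything in it is hypothetical and not AutoFuzz's implementation: `simulate` is a toy stand-in for a simulator run, a scenario is just two normalized parameters, mutation is plain Gaussian noise, and a one-layer logistic model stands in for the neural network (it only selects candidates here, it does not mutate them). The point is the division of labor: the evolutionary step cheaply proposes many mutants, and the learned model decides which few of them receive the expensive simulation budget.

```python
import math
import random


def simulate(scenario):
    """Hypothetical stand-in for a simulator run: True means the toy ADS misbehaves."""
    speed, gap = scenario
    return speed > 0.8 and gap < 0.2  # a small, rare failure region


def mutate(scenario, rng, sigma=0.1):
    # Gaussian perturbation, clipped to the normalized parameter range [0, 1].
    return tuple(min(1.0, max(0.0, x + rng.gauss(0.0, sigma))) for x in scenario)


class LogisticSelector:
    """One-layer logistic model standing in for the learned selector network."""

    def __init__(self, dim, lr=0.5):
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        # Numerically stable sigmoid of a linear score: estimated P(violation).
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)

    def update(self, x, label):
        # One SGD step on the binary cross-entropy loss.
        err = self.predict(x) - (1.0 if label else 0.0)
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def guided_search(generations=30, pop_size=10, n_candidates=40, budget_per_gen=5, seed=0):
    rng = random.Random(seed)
    population = [(rng.random(), rng.random()) for _ in range(pop_size)]
    selector = LogisticSelector(dim=2)
    failures = []
    for _ in range(generations):
        # Evolutionary step: propose many cheap mutants of current parents.
        candidates = [mutate(rng.choice(population), rng) for _ in range(n_candidates)]
        # Learned step: simulate only the candidates the selector ranks highest.
        candidates.sort(key=selector.predict, reverse=True)
        chosen = candidates[:budget_per_gen]
        for scenario in chosen:
            violated = simulate(scenario)
            selector.update(scenario, violated)  # learn from every simulated outcome
            if violated:
                failures.append(scenario)
        # Keep the scenarios the selector currently ranks highest as the next parents.
        population = sorted(population + chosen, key=selector.predict, reverse=True)[:pop_size]
    return failures, selector
```

Calling `guided_search()` returns whichever failing scenarios the toy loop happened to hit, together with the trained selector; the simulation budget is capped at `generations * budget_per_gen` runs no matter how many mutants are proposed.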
AutoFuzz shows promise in efficiently identifying traffic violations for the ADSs under test. Although AutoFuzz is good at finding violations, as a black-box method it is agnostic to their causes. In the second project, we focus on finding violations caused by failures of the fusion component, which fuses the inputs of multiple sensors to give the ADS a more reliable understanding of its surroundings. In particular, we find that the fusion component of an industry-grade advanced driver-assistance system (ADAS) can fail to trust the more reliable input sensor and thereby cause a collision. We define misbehavior caused by such a failure as a "fusion error". To find fusion errors efficiently, we propose a fuzzing framework named FusED, which uses a novel evolutionary search method with an objective that promotes deviation of the fusion output from the sensor input. We show that FusED can efficiently reveal fusion errors in an industry-grade ADAS. One issue with the scenarios generated by AutoFuzz or FusED (or any other search-based method) is that all NPC vehicles are controlled by low-level controllers whose behaviors differ from those of human drivers. This makes it difficult to transfer the discovered violations to the real world. Some recent work addresses this problem with deep generative models; however, the resulting scenarios cannot be easily controlled, which prevents users from customizing their testing scenarios. Since both realism and controllability of the generated traffic are desirable, we propose a novel method called Controllable Traffic Generation (CTG) that achieves both properties simultaneously. To preserve realism, we propose a conditional, dynamics-enforced diffusion model. To achieve controllability, we propose using Signal Temporal Logic (STL), a kind of "traffic language", to specify the desired properties of traffic scenarios (e.g., following road rules).
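To make the STL idea concrete, here is a minimal sketch assuming a single rule, "the speed always stays at or below v_max", over a 1-D speed trajectory. The exact STL robustness of this rule is the worst-case margin over time; replacing the min with a log-sum-exp soft-min makes it differentiable, and gradient ascent on it nudges the trajectory toward satisfying the rule. In a guided diffusion model, a gradient of this kind would instead perturb each intermediate denoised sample; the function names, `beta`, and the step size are illustrative assumptions, not CTG's actual formulation.

```python
import math


def robustness_always_below(speeds, v_max):
    """Exact STL robustness of G(speed <= v_max): the worst-case margin over time."""
    return min(v_max - v for v in speeds)


def soft_robustness(speeds, v_max, beta=10.0):
    """Log-sum-exp soft-min of the margins (a lower bound on the exact min) and its gradient."""
    margins = [v_max - v for v in speeds]
    m = min(margins)  # shift by the min for numerical stability
    exps = [math.exp(-beta * (r - m)) for r in margins]
    total = sum(exps)
    rho = m - math.log(total) / beta
    # d rho / d v_t = -(softmax weight of margin t), since d margin_t / d v_t = -1.
    grad = [-e / total for e in exps]
    return rho, grad


def guide(speeds, v_max, step=0.5, iters=50, beta=10.0):
    """Gradient ascent on the soft robustness until the rule is (softly) satisfied."""
    speeds = list(speeds)
    for _ in range(iters):
        rho, grad = soft_robustness(speeds, v_max, beta)
        if rho >= 0.0:
            break
        # Stepping along the gradient raises robustness, mostly by slowing the worst step.
        speeds = [v + step * g for v, g in zip(speeds, grad)]
    return speeds


if __name__ == "__main__":
    trajectory = [10.0, 12.0, 15.0, 11.0]
    print(robustness_always_below(trajectory, 13.0))  # -2.0: violated at t = 2
    print(guide(trajectory, 13.0))
```

Because the soft-min weights concentrate on the most-violating time step, the correction is applied almost entirely where the rule is broken and leaves the rest of the trajectory nearly unchanged.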
We then leverage STL to guide the conditional diffusion model to generate realistic and controllable traffic. Although CTG can generate realistic and controllable traffic, it still requires domain expertise to specify the STL-based loss function. Moreover, it models traffic participants independently, which leads to suboptimal modeling of agent interactions. To address these issues, we developed CTG++, which enables a user to generate realistic traffic scenarios from natural language. In particular, we propose using GPT-4 to translate a natural-language command into a loss function expressed in code. We then use this loss function to guide a scene-level diffusion model, which considers all vehicles jointly, to generate traffic that satisfies the command. We find that CTG++ can generate realistic traffic simulations that comply with natural-language queries. In summary, the four projects discussed in this thesis solve important problems in efficiently testing ADSs and have had a significant influence on the advancement of ADSs. Moreover, our models and empirical studies can be applied to other testing and behavior generation problems, such as general ML-based software testing and multi-agent behavior planning and prediction. I hope this thesis can serve as an inspiration to anyone interested in the exciting field of ADS testing and development, and contribute to the realization of fully automated driving. |