ExtraVirt

Authors: Steven K. Reinhardt, Peter M. Chen, Dominic Lucchetti
Year of publication: 2005
Subject:
Source: Proceedings of the twentieth ACM symposium on Operating systems principles.
DOI: 10.1145/1095810.1118621
Description: Reliability is becoming an increasingly important issue in modern processor design. Smaller feature sizes and more numerous transistors are projected to increase the frequency of transient faults [4, 5]. Our project, ExtraVirt, leverages the trend toward multi-core and multi-processor systems to survive these transient faults. Our goals are (1) to add fault tolerance without modifying existing operating systems, applications, or hardware, (2) to minimize the time spent executing software that cannot tolerate faults, and (3) to minimize the time and space overhead needed to detect and recover from faults. We accomplish these goals by leveraging virtual-machine technology and by sharing memory and I/O devices across replicas. ExtraVirt extends prior work on VM-level fault tolerance [2] by detecting and recovering from non-fail-stop faults and by running multiple replicas efficiently on a single machine.
Database: OpenAIRE