Software fault tolerance for FPUs via vectorization

Autor: Alexander V. Veidenbaum, Zhi Chen, Alexandru Nicolau, Ryoichi Inagaki
Rok vydání: 2015
Předmět:
Zdroj: SAMOS
DOI: 10.1109/samos.2015.7363677
Popis: Future generation processors are expected to have high soft error rates and will require increased fault detection and fault tolerance. This work focuses on errors in execution units. Hardware or software duplication or triplication, parity, or residue codes could be used to detect errors in execution units. However, hardware duplication/triplication have significant area overhead and, in applications with high utilization of floating point units (FPU), very high energy cost. Software duplication/ triplication of instructions also increases both execution time and energy consumption. This paper proposes to reduce the cost of redundant instruction execution in FPUs through vectorization. Duplicated or triplicated instructions and result comparisons can be packed by a compiler into vector instructions, such as SSE or AVX. Experimental results using hand vectorization on a variety of benchmarks show that, compared to error detection through scalar instruction duplication, vector mode redundant execution achieves 1.78× and 2.73× average speedup for SSE and AVX instructions, respectively. It also significantly reduces the energy consumption, by an average of 40% and 53%, respectively, for SSE and AVX. Thus the proposed technique enables error detection with no hardware cost and reduced time and energy overhead compared to brute-force scalar instruction duplication.
Databáze: OpenAIRE