Resilience Analysis of Top K Selection Algorithms

Authors: Joanne Wendelberger, Laura Monroe, Ryan Slechta, Sarah E. Michalak, Nathan DeBardeleben, Qiang Guan
Year of publication: 2017
Subject:
Source: EDCC (European Dependable Computing Conference)
DOI: 10.1109/edcc.2017.23
Description: As the number of components in high-performance computing (HPC) systems continues to grow, the number of opportunities for soft errors will rise accordingly. Petascale research has shown that soft errors on supercomputers can occur multiple times per day, and this rate is expected to increase further in the exascale era. Because of this frequency, the resilience community has taken an interest in algorithmic resilience as a means of reliable computing in faulty environments. Probabilistic algorithms in particular have attracted interest because of their inherently imprecise nature and their ability to tolerate incorrect guesses. In this paper, we analyze the intrinsic resilience of a probabilistic Top K selection algorithm to silent data corruption caused by a single event upset. We introduce a new paradigm for analytically quantifying an algorithm's resilience as a function of its inputs, which permits a precise comparison of the resilience of competing algorithms. In addition, we discuss what our findings imply for the resilience of probabilistic algorithms as a class compared with their deterministic counterparts.
Database: OpenAIRE
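To make the idea of measuring resilience to a single event upset more concrete, the following is a minimal empirical sketch, not the paper's algorithm or its analytical method. It makes several assumptions that are not in the record above: the probabilistic Top K algorithm is replaced by a randomized quickselect (random pivot), the upset is modeled as one flipped bit in the IEEE 754 encoding of one input value before the run, and "silent data corruption" means the set of selected indices differs from the fault-free run. The names `flip_bit`, `top_k_indices`, and `sdc_rate` are hypothetical.

```python
import random
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = least significant, 63 = sign) of x's IEEE 754 double encoding."""
    (raw,) = struct.unpack("<Q", struct.pack("<d", x))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", raw ^ (1 << bit)))
    return flipped


def top_k_indices(values, k, rng):
    """Indices of the k largest values, found by quickselect with a random pivot.

    Stand-in (assumption) for a probabilistic Top K algorithm: pivot choice is
    random, but the selected set does not depend on it when no fault occurs.
    """
    candidates = list(range(len(values)))
    selected = []
    while k > 0 and candidates:
        pivot = values[rng.choice(candidates)]
        greater = [i for i in candidates if values[i] > pivot]
        equal = [i for i in candidates if values[i] == pivot]
        if k <= len(greater):
            # All k winners lie strictly above the pivot; keep searching there.
            candidates = greater
        elif k <= len(greater) + len(equal):
            # The pivot value is the k-th largest; fill the rest from its ties.
            selected += greater + equal[: k - len(greater)]
            k = 0
        else:
            # Everything >= pivot is selected; keep searching below the pivot.
            selected += greater + equal
            k -= len(greater) + len(equal)
            candidates = [i for i in candidates if values[i] < pivot]
    return set(selected)


def sdc_rate(n=1000, k=10, trials=2000, seed=0):
    """Fraction of trials in which one input bit flip changes the selected index set."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(trials):
        values = [rng.random() for _ in range(n)]
        expected = top_k_indices(values, k, rng)
        corrupted = list(values)
        victim = rng.randrange(n)
        # Hypothetical fault model: one uniformly chosen bit of one input value.
        corrupted[victim] = flip_bit(corrupted[victim], rng.randrange(64))
        if top_k_indices(corrupted, k, rng) != expected:
            wrong += 1
    return wrong / trials


if __name__ == "__main__":
    print(f"estimated SDC rate: {sdc_rate():.3f}")
```

Because this stand-in always returns the same set on fault-free input, any mismatch can be attributed to the injected flip; a Monte Carlo style algorithm whose fault-free output is itself sometimes wrong would instead need its fault-free success probability as the baseline. The paper's contribution is an analytical quantification as a function of the inputs, whereas a simulation like this only gives an empirical estimate that could be used to sanity-check such an analysis.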