Varanus: More-with-less fault localization in data centers.

Authors: Sadaphal, Vaishali; Natu, Maitreya; Vin, Harrick; Shenoy, Prashant
Source: 2012 Fourth International Conference on Communication Systems & Networks (COMSNETS 2012); 1/1/2012, p1-10, 10p
Abstract: Detecting and localizing performance faults is crucial for operating large enterprise data centers. This problem is relatively straightforward to solve if each entity (applications, servers, business processes) within the data center can be instrumented and monitored explicitly. Unfortunately, such an instrument-everything approach is often not tenable because of the limits that enterprises impose on the permissible amount of instrumentation intrusiveness and monitoring overhead. In this paper, we address the problem of achieving high accuracy in detecting and localizing performance faults in data centers while minimizing the required instrumentation intrusiveness and overhead. We present novel algorithms for solving three key subproblems: (1) How many monitors are required, and where should they be placed within the data center? (2) Given the proposed instrumentation plan, how can the existence of performance faults be detected accurately? (3) How can the root cause of the fault be localized? We demonstrate the effectiveness of our approach on a real-world data center topology as well as through extensive simulations. [ABSTRACT FROM PUBLISHER]
Database: Complementary Index