CloudRanger: Root Cause Identification for Cloud Native Systems
Author: | Jingmin Xu, Weilan Lin, Pengfei Chen, Meng Ma, Disheng Pan, Ping Wang, Yuan Wang |
---|---|
Year of publication: | 2018 |
Subject: |
business.industry
Heuristic (computer science); Computer science; Distributed computing; Throughput; Cloud computing; 02 engineering and technology; Root cause; 01 natural sciences; 010104 statistics & probability; Identification (information); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Anomaly detection; 0101 mathematics; IBM; Root cause analysis; business |
Source: | CCGrid |
Description: | As more and more systems migrate to cloud environments, cloud native systems are becoming a trend. This paper presents the challenges and implications of diagnosing root causes in cloud native systems by analyzing real incidents that occurred in IBM Bluemix (a large commercial cloud). To tackle these challenges, we propose CloudRanger, a novel system dedicated to cloud native systems. To make our system more general, we propose a dynamic causal relationship analysis approach that constructs impact graphs among applications without a given topology. A heuristic investigation algorithm based on a second-order random walk is proposed to identify the culprit services responsible for cloud incidents. Experimental results in both a simulation environment and the IBM Bluemix platform show that CloudRanger outperforms some state-of-the-art approaches with a 10% improvement in accuracy. It offers fast identification of culprit services when an anomaly occurs. Moreover, the system can be deployed rapidly and easily in multiple kinds of cloud native systems without any predefined knowledge. |
Database: | OpenAIRE |
External link: |