Learning to Simplify Distributed Systems Management
Authors: | Ramya Raghavendra, Christopher Streiffer, Mudhakar Srivatsa, Theophilus Benson |
---|---|
Year of publication: | 2018 |
Subject: |
business.industry; Computer science; 02 engineering and technology; Microservices; Extensibility; Pipeline (software); 020202 computer hardware & architecture; Task (project management); Component (UML); 0202 electrical engineering, electronic engineering, information engineering; Unsupervised learning; 020201 artificial intelligence & image processing; Use case; Software engineering; business; Distributed systems management |
Source: | IEEE BigData |
Description: | Managing large-scale distributed systems is a difficult task. System administrators are responsible for the upkeep and maintenance of numerous components with complex dependencies. With the shift to microservices-based architectures, these systems can consist of hundreds to thousands of interconnected nodes. To cope with this difficulty, administrators rely on analyzing logs and metrics collected from the different services. However, the number of available metrics for large systems presents complexity and scaling issues. To address these issues, we present Minerva, an unsupervised Machine Learning (ML) framework for performing network diagnosis analysis. Minerva is composed of a multi-stage pipeline, where each component can act individually or cohesively to perform various management tasks. Our system offers a unified and extensible framework for managing the complexity of large networks, and presents administrators with a Swiss Army knife for diagnosing the overall health of their systems. To demonstrate the feasibility of Minerva, we evaluate its performance on a production-scale system. We present use cases for the various management tools made available by Minerva, and show how these tools can be used to make strong inferences about the system using unsupervised techniques. |
Database: | OpenAIRE |
External link: |