Predicting breakdowns in cloud services (with SPIKE)
Author: | Kevin Haverlock, Philip Clark, Joymallya Chakraborty, Jianfeng Chen, Snehit Cherian, Tim Menzies |
---|---|
Year of publication: | 2019 |
Subject: |
FOS: Computer and information sciences
Computer science; Node (networking); 020207 software engineering; Cloud computing; 02 engineering and technology; Service provider; Machine learning; Random forest; Software Engineering (cs.SE); Computer Science - Software Engineering; 020204 information systems; Hyperparameter optimization; 0202 electrical engineering, electronic engineering, information engineering; Revenue; Spike (software development); Artificial intelligence; Web service |
Source: | Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. |
DOI: | 10.1145/3338906.3340450 |
Description: | Maintaining web services is a mission-critical task where any downtime means loss of revenue and reputation (of being a reliable service provider). In the current competitive web services market, such a loss of reputation causes extensive loss of future revenue. To address this issue, we developed SPIKE, a data mining tool which can predict upcoming service breakdowns half an hour into the future. Such predictions let an organization alert and assemble the tiger team to address the problem (e.g. by reconfiguring cloud hardware in order to reduce the likelihood of that breakdown). SPIKE utilizes (a) regression tree learning (with CART); (b) synthetic minority over-sampling (to handle how rare spikes are in our data); (c) hyperparameter optimization (to learn the best settings for our local data); and (d) a technique we called "topology sampling", where training vectors are built from extensive details of an individual node plus summary details on all its neighbors. In the experiments reported here, SPIKE predicted service spikes 30 minutes into the future with recall and precision of 75% and above. Also, SPIKE performed relatively better than other widely used learning methods (neural nets, random forests, logistic regression). 9 pages, 6 figures; in Proceedings of the 27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE'19), industry track. |
Database: | OpenAIRE |
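
The "topology sampling" idea named in the description (a training vector built from one node's own details plus summary details of its neighbors) might look roughly like the sketch below. This is not the authors' implementation: the function name `topology_vector`, the metric layout, and the choice of mean and max as the neighbor summaries are all assumptions made for illustration only.

```python
from statistics import mean


def topology_vector(node, metrics, neighbors):
    """Return one training vector for `node`.

    The vector holds the node's own metrics followed by per-metric
    summaries of its neighbors. Mean and max are an assumed choice of
    summary; the record does not say which summaries SPIKE uses.
    """
    own = list(metrics[node])
    nbr_rows = [metrics[n] for n in neighbors.get(node, [])]
    if not nbr_rows:
        # No neighbors: pad with zeros so every vector has the same length.
        return own + [0.0] * (2 * len(own))
    columns = list(zip(*nbr_rows))  # one tuple per metric, across neighbors
    return own + [mean(c) for c in columns] + [max(c) for c in columns]


# Hypothetical data: two metrics per node (e.g. CPU load, request rate).
metrics = {"a": [0.9, 120.0], "b": [0.5, 80.0], "c": [0.7, 100.0]}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": []}

vec = topology_vector("a", metrics, neighbors)
```

Vectors of this kind could then be fed to a regression tree learner after over-sampling the rare spike cases, which is the pipeline the description outlines.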