Showing 1 - 10 of 14 for search: '"Soila Kavulya"'
Author:
Liting Hu, Soila Kavulya, Mike Kasick, Jiaqi Tan, Karsten Schwan, Priya Narasimhan, Mahendra Kutare, Chengwei Wang, Rajeev Gandhi
Published in:
ACM SIGOPS Operating Systems Review. 47:50-62
In the emerging cloud computing era, enterprise data centers host a plethora of web services and applications, including those for e-Commerce, distributed multimedia, and social networks, which jointly serve many aspects of our daily lives and busin
Published in:
ACM SIGMETRICS Performance Evaluation Review. 37:8-13
Ganesha aims to diagnose faults transparently (in a black-box manner) in MapReduce systems, by analyzing OS-level metrics. Ganesha's approach is based on peer-symmetry under fault-free conditions, and can diagnose faults that manifest asymmetrically
Published in:
HASE
Modern vehicles with semi-autonomous (driver-assistance systems) and autonomous capabilities require sophisticated on-board and off-board diagnostics for safe operation, and to reduce unnecessary component replacements at the service garage. We prese
Published in:
Proceedings of the 2012 workshop on Management of big data systems.
Detecting failures in distributed systems is challenging, as modern datacenters run a variety of applications. Current techniques for detecting failures often require training, have limited scalability, or have results that are hard to interpret. We
Author:
Matti Hiltunen, Soila Kavulya, Kaustubh Joshi, Scott Daniels, Priya Narasimhan, Rajeev Gandhi
Published in:
DSN
Chronics are recurrent problems that often fly under the radar of operations teams because they do not affect enough users or service invocations to set off alarm thresholds. In contrast with major outages that are rare, often have a single cause, an
Published in:
Resilience Assessment and Evaluation of Computing Systems, edited by Wolter K., Avritzer A., Vieira M., van Moorsel A., pp. 239–261. Berlin: Springer-Verlag, 2012
Resilience Assessment and Evaluation of Computing Systems ISBN: 9783642290312
Kavulya S. P., Joshi K., Di Giandomenico F., Narasimhan P. Failure Diagnosis of Complex Systems. In: Resilience Assessment and Evaluation of Computing Systems, edited by Wolter K., Avritzer A., Vieira M., van Moorsel A., 2012
Failure diagnosis is the process of identifying the causes of impairment in a system’s function based on observable symptoms, i.e., determining which fault led to an observed failure. Since multiple faults can often lead to very similar symptoms, f
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::ca2729c84b04f277680a4ffbf80965f1
https://openportal.isti.cnr.it/doc?id=people______::9a5bfb5bed5ad75f774fd83457b72700
Author:
Ben Gotow, Soila Kavulya, Mark Shuster, Jason Campbell, Jiaqi Tan, Priya Narasimhan, Sriram Ramasubramanian, Arun B. Ganesan, James Mulholland
Published in:
Proceedings of the 5th ACM Symposium on Computer Human Interaction for Management of Information Technology.
New abstractions are simplifying the programming of large clusters, but diagnosis nonetheless gets more and more challenging as cluster sizes grow: Debugging information increases linearly with cluster size, and the count of intercomponent relationshi
Author:
Kaustubh Joshi, Soila Kavulya, Scott Daniels, Matti A. Hiltunen, Rajeev Gandhi, Priya Narasimhan
Published in:
Managing Large-scale Systems via the Analysis of System Logs and the Application of Machine Learning Techniques.
Chronics are recurrent problems that fly under the radar of operations teams because they do not perturb the system enough to set off alarms or violate service-level objectives. The discovery and diagnosis of never-before-seen chronics pose new chal
Published in:
ICDCS
The distributed nature and large scale of MapReduce programs and systems pose two challenges in using existing profiling and debugging tools to understand MapReduce programs. Existing tools produce too much information because of the large scale of
Published in:
NOMS
We present Kahuna, an approach that aims to diagnose performance problems in MapReduce systems. Central to Kahuna's approach is our insight on peer-similarity, that nodes behave alike in the absence of performance problems, and that a node that behav