A scalable monitoring for the CMS Filter Farm based on elasticsearch

Authors: André Holzner, Olivier Chaze, Luciano Orsini, Benjamin Stieger, Dominique Gigi, Petr Zejdl, R. Jimenez-Estupiñán, Jeroen Hegeman, S. Zaza, Christian Deldicque, C. Nunez-Barranco-Fernandez, P. Roberts, James G Branson, Georgiana-Lavinia Darlea, Andrea Petrucci, Remigius K. Mommsen, Vivian O'Dell, Guillelmo Gomez-Ceballos, Marco Pieri, Lucia Masetti, Jan Veverka, Emilio Meschi, Dupont A, Sergio Cittolin, Samim Erhan, Christoph M. E. Paus, Marc Dobson, Anastasios Andronidis, Frank Glege, Hannes Sakulin, Konstanty Sumorok, Attila Racz, Frans Meijers, Ulf Behrens, Srecko Morovic, J. M. Andre, Christoph Schwick
Contributors: Massachusetts Institute of Technology. Department of Physics, Massachusetts Institute of Technology. Laboratory for Nuclear Science, Gomez-Ceballos, Guillelmo, Paus, Christoph M. E., Sumorok, Konstanty C, Veverka, Jan, Darlea, G. L.
Language: English
Publication year: 2015
Subject:
Source: IOP Publishing
Description: A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
Database: OpenAIRE
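
Illustrative sketch (not part of the original record): the description mentions small JSON documents produced locally per process and an aggregated view collected centrally. The Python snippet below is a minimal, hedged illustration of that idea using only the standard library. All field names (`host`, `pid`, `run`, `lumisection`, `processed`, `accepted`) and both helper functions are hypothetical, chosen for illustration; they are not taken from the paper, and the snippet does not call elasticsearch itself.

```python
import json
from collections import defaultdict


def make_process_doc(host, pid, run, lumisection, processed, accepted):
    """Serialize one per-process monitoring record as a small JSON document.

    The field names are assumptions made for this sketch.
    """
    return json.dumps({
        "host": host,
        "pid": pid,
        "run": run,
        "lumisection": lumisection,
        "processed": processed,
        "accepted": accepted,
    })


def aggregate_for_central(leaf_docs):
    """Sum per-process counters into one summary per (run, lumisection).

    This mimics, in plain Python, the kind of aggregated document a
    central cluster could hold while the per-process documents stay
    available at the leaf level.
    """
    totals = defaultdict(lambda: {"processed": 0, "accepted": 0, "processes": 0})
    for raw in leaf_docs:
        doc = json.loads(raw)
        key = (doc["run"], doc["lumisection"])
        totals[key]["processed"] += doc["processed"]
        totals[key]["accepted"] += doc["accepted"]
        totals[key]["processes"] += 1
    return [
        {"run": run, "lumisection": ls, **counters}
        for (run, ls), counters in sorted(totals.items())
    ]


# Hypothetical fine-grained documents from two filter-farm nodes.
leaf_docs = [
    make_process_doc("fu-01", 1001, 1, 1, 100, 10),
    make_process_doc("fu-01", 1002, 1, 1, 120, 12),
    make_process_doc("fu-02", 2001, 1, 2, 90, 9),
]

summary = aggregate_for_central(leaf_docs)
for entry in summary:
    print(json.dumps(entry))
```

Because the documents are schema-free JSON, a new counter could be added to `make_process_doc` without changing how existing fields are read; the aggregation step would only need to change if the new field should also appear in the summary.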