REDEMON: Resilient Decentralized Monitoring System for Edge Infrastructures

Authors: Felix Freitag, Mennan Selimi, Leandro Navarro, Roger Pueyo Centelles
Contributors: Universitat Politècnica de Catalunya. Doctorat en Arquitectura de Computadors, Universitat Politècnica de Catalunya. Departament d'Arquitectura de Computadors, Universitat Politècnica de Catalunya. CNDS - Xarxes de Computadors i Sistemes Distribuïts
Publication year: 2020
Subject:
Source: CCGRID
UPCommons. Portal del coneixement obert de la UPC
Universitat Politècnica de Catalunya (UPC)
DOI: 10.1109/ccgrid49817.2020.00-84
Description: The Guifi.net community network has evolved during the past 15 years into a telecommunications infrastructure that offers Internet access to more than 80,000 people. The monitoring system currently in place for this network lags behind the growth of the infrastructure, requires manual intervention, and has several single points of failure. In this paper we present REDEMON, a resilient decentralized monitoring system, hosted on distributed and interconnected edge devices, for reliable, eventually consistent monitoring of the Guifi.net network, leveraging CRDT-based data structures implemented on AntidoteDB. We developed the REDEMON system as a prototype featuring resilience, decentralization and automation, in order to replace the legacy monitoring system. To assess the system, this prototype was deployed on resource-constrained edge nodes in the Guifi.net production network and evaluated under realistic conditions. The decentralized assignment mechanism successfully assigns to each network device the minimum number of monitoring servers that satisfies the established system requirements. In addition, by concentrating the workload on the minimum required number of servers running at their maximum capacity, the remaining devices can stay idle, reducing the consumption footprint of the system. With regard to computing resources, we measure moderate CPU and RAM usage by the monitoring system on low-capacity devices, while we observe that considerable network traffic is required to achieve a resilient and consistent data storage layer. This resilient and decentralized architecture could lay the basis for other edge applications in the cloud computing domain that need to coordinate over distributed and consistent shared data.
This work was supported by the European H2020 framework programme project LightKone (H2020-732505), by the Spanish State Research Agency (AEI) under contracts PCI2019-111850-2 and PCI2019-111851-2, and the Catalan government AGAUR SGR 990.
Database: OpenAIRE