Description: |
Master's in Cybersecurity at the Escola Superior de Tecnologia e Gestão do Instituto Politécnico de Viana do Castelo.
Nowadays, many services are available online, and they handle a wide variety of data. Ensuring the availability, integrity, and confidentiality of all this data is essential, yet cyberattacks remain a major threat. In this context, an Intrusion Detection System (IDS) is an important tool for protecting systems and data against potential threats. Attacks are increasingly complex and sophisticated, so new intelligent mechanisms are needed to make IDSs more efficient. Anomaly-based IDSs may use machine learning algorithms to classify events as either normal or anomalous and trigger the appropriate response. With supervised learning, these algorithms require labeled, rich, and recent datasets. To improve the performance of these models, datasets can therefore be generated collaboratively from different sources and used to train multiple algorithms. This document proposes a vote-based architecture that generates labeled datasets and improves the performance of supervised-learning-based IDSs. On a regular basis, multiple IDSs in different locations (companies) send their logs to a central system, which combines the logs and labels them using several machine learning models and a majority vote. The central system then builds a new labeled dataset and trains models on it to obtain the best updated model, which is integrated into the IDSs of the participating companies. In this way, the IDSs are frequently updated with the best available machine learning model, increasing their efficiency. Because the architecture trains several algorithms multiple times, it was deployed on Fed4FIRE+, a federated testbed, using Ray to distribute tasks across the available resources and shorten overall runtimes.
This deployment reduced classification time by 31% to 33% and training time by 43%. Several machine learning algorithms and the proposed architecture were evaluated: compared with a baseline scenario, the proposed architecture increased accuracy by 11.5% and precision by 11.2%. |
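The majority-vote labeling step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: it assumes each model has already predicted a string label (for example "normal" or "anomalous") for every pooled log record, and the function names, model names, and the tie-breaking rule (prefer "anomalous") are hypothetical choices made only for this example.

```python
from collections import Counter
from typing import List, Mapping, Sequence


def majority_vote(votes: Sequence[str], tie_label: str = "anomalous") -> str:
    """Return the label chosen by most models for one record.

    Hypothetical tie rule: if several labels share the top count, prefer
    `tie_label` (erring on the side of flagging traffic); otherwise pick
    the alphabetically first tied label so the result is deterministic.
    """
    counts = Counter(votes)
    top = max(counts.values())
    winners = [label for label, n in counts.items() if n == top]
    if len(winners) == 1:
        return winners[0]
    return tie_label if tie_label in winners else sorted(winners)[0]


def label_pooled_logs(predictions: Mapping[str, Sequence[str]]) -> List[str]:
    """Build the labels of the new dataset from several models' predictions.

    `predictions` maps a (hypothetical) model name to that model's labels,
    one per pooled log record, in the same record order for every model.
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) > 1:
        raise ValueError("every model must label every record")
    # zip(*...) regroups the per-model lists into per-record vote tuples.
    return [majority_vote(record_votes) for record_votes in zip(*predictions.values())]


if __name__ == "__main__":
    preds = {
        "model_a": ["normal", "anomalous", "anomalous", "normal"],
        "model_b": ["normal", "normal", "anomalous", "anomalous"],
        "model_c": ["anomalous", "anomalous", "anomalous", "normal"],
    }
    print(label_pooled_logs(preds))  # ['normal', 'anomalous', 'anomalous', 'normal']
```

With an odd number of voting models and two classes, ties cannot occur, so the tie rule only matters for even ensembles or multi-class labels; the labeled records would then feed the training and model-selection step, which this sketch does not cover.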