Popis: |
Current Machine Learning (ML) approaches typically adopt either a centralized or a federated architecture. However, these architectures cannot easily keep up with some of the challenges introduced by recent trends, such as the growth in the number of IoT devices, increasing awareness of the privacy and security implications of extensive data collection, and the rise of graph-structured data and Graph Representation Learning. Systems based on either direct data collection or Federated Learning rely on centralized, privileged components that may act as scalability bottlenecks and dangerous single points of failure, while requiring users to trust the privacy protections and security practices in place. The combination of these issues ultimately leads to data waste, as opportunities to extract insights from available data are missed and thus the full societal benefits of advanced data analytics and ML are not realized. In this thesis, we argue for a paradigm shift towards a completely decentralized and trustless architecture for privacy-aware Graph Representation Learning, which employs Gossip Learning and other gossip-based peer-to-peer techniques to achieve high levels of scalability and resilience while reducing the risk of privacy leaks. We then identify and pursue three key research directions necessary to achieve our vision: lifting unrealistic assumptions about Gossip Learning, identifying and developing specific use cases that are enabled or improved by gossip-based decentralization, and overcoming the obstacles to the deployment of decentralized training and inference for Graph Representation Learning models. Based on these key directions, our contributions are as follows. First, we analyze the robustness of Gossip Learning when several unrealistic but commonly assumed conditions are lifted. 
Then, we exploit Gossip Learning and, more generally, gossip-based peer-to-peer protocols across three use cases: the collaborative training of differentially private Naive Bayes classifiers across organizations holding sensitive user data; the construction of decentralized, privacy-preserving data marketplaces; and the development and decentralization of early-stage IoT botnet detection systems based on Graph Representation Learning. Finally, we introduce a general framework for the fully decentralized training of Graph Neural Networks, overcoming the typical requirement of these models to access non-local information during training and inference. The combination of these contributions removes major roadblocks towards decentralized graph learning, and also opens a new research direction aimed at further developing and optimizing the fully decentralized training of Graph Representation Learning models. |