On the Theoretical Foundations of Data Exchange Economies

Authors: Akrami, Hannaneh; Chaudhury, Bhaskar Ray; Garg, Jugal; Murhekar, Aniket
Year of publication: 2024
Subject:
Document type: Working Paper
Description: The immense success of ML systems relies heavily on large-scale, high-quality data. The high demand for data has led to many paradigms that involve selling, exchanging, and sharing data, motivating the study of economic processes with data as an asset. However, data differs from classical economic assets in that it can be duplicated freely: there is no concept of limited supply, since data can be replicated at zero marginal cost. This distinction introduces fundamental differences between economic processes involving data and those concerning other assets. We study a parallel to exchange (Arrow-Debreu) markets where data is the asset. Here, agents with datasets exchange data fairly and voluntarily, aiming for mutual benefit without monetary compensation. This framework is particularly relevant for non-profit organizations that seek to improve their ML models through data exchange, yet are restricted from selling their data for profit. We propose a general framework for data exchange, built on two core principles: (i) fairness, ensuring that each agent receives utility proportional to their contribution to others, where contributions are quantifiable using standard credit-sharing functions like the Shapley value, and (ii) stability, ensuring that no coalition of agents can identify an exchange among themselves which they unanimously prefer to the current exchange. We show that fair and stable exchanges exist for all monotone continuous utility functions. Next, we investigate the computational complexity of finding approximate fair and stable exchanges. We present a local search algorithm for instances with monotone submodular utility functions, where each agent's contribution is measured using the Shapley value. We prove that this problem lies in CLS under mild assumptions. Our framework opens up several intriguing theoretical directions for research in data economics.
Comment: 42 pages
Database: arXiv
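
Note: The description names the Shapley value as a standard credit-sharing function for quantifying contributions. As a rough illustration of that general concept only, the sketch below computes exact Shapley values for a toy coverage utility (the number of distinct data points a coalition holds), which is monotone and submodular. The function names, the three-agent instance, and the coverage utility are assumptions made for illustration; this is not the paper's exchange model, fairness definition, or local search algorithm.

```python
from fractions import Fraction
from itertools import permutations


def coverage_value(coalition, datasets):
    """Toy utility (an assumption): count of distinct data points held by the coalition."""
    covered = set()
    for agent in coalition:
        covered |= datasets[agent]
    return len(covered)


def shapley_values(datasets):
    """Exact Shapley values by averaging marginal gains over all agent orderings.

    Enumerates n! orderings, so this is only practical for very small n.
    """
    agents = sorted(datasets)
    totals = {agent: Fraction(0) for agent in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        coalition = []
        previous = 0
        for agent in order:
            coalition.append(agent)
            current = coverage_value(coalition, datasets)
            totals[agent] += current - previous  # marginal gain of this agent
            previous = current
    return {agent: totals[agent] / len(orderings) for agent in agents}


# Hypothetical instance: three agents holding overlapping sets of data points.
datasets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
phi = shapley_values(datasets)
for agent, value in phi.items():
    print(agent, value)
```

For a coverage utility, each data point's unit of value is split equally among the agents that hold it, so the values sum to the total number of distinct points (the efficiency property of the Shapley value).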