CrawlSN: community-aware data acquisition with maximum willingness in online social networks

Authors: Ming-Yi Chang, Bay-Yuan Hsu, Chih-Ya Shen, Chia-Lin Tu
Publication year: 2020
Subject:
Source: Data Mining and Knowledge Discovery. 34:1589-1620
ISSN: 1573-756X, 1384-5810
DOI: 10.1007/s10618-020-00709-5
Description: Real social network datasets with community structures are critical for evaluating algorithms in Online Social Networks (OSNs). However, obtaining such community data from OSNs has become increasingly challenging due to privacy issues and government regulations. In this paper, we make a first attempt to address two important factors, namely user willingness and the existence of community structure, in order to obtain more complete OSN data. We formulate a new research problem, Community-aware Data Acquisition with Maximum Willingness in Online Social Networks (CrawlSN), which identifies a group of users from an OSN such that the group is a socially tight community and the users’ willingness to contribute data is maximized. We prove that CrawlSN is NP-hard and inapproximable within any factor unless P = NP, and propose an effective algorithm, Community-aware Group Identification with Maximum Willingness (CIW), with various processing strategies. We conduct an evaluation study with 1093 volunteers to validate our problem formulation and demonstrate that CrawlSN outperforms the other alternatives. We also perform extensive experiments on 7 real datasets and show that the proposed CIW outperforms the other baselines in both solution quality and efficiency.
Database: OpenAIRE