Benefits of Bias in Crawl-Based Network Sampling for Identifying Key Node Set

Autor: Sho Tsugawa, Hiroyuki Ohsaki
Jazyk: angličtina
Rok vydání: 2020
Předmět:
Zdroj: IEEE Access, Vol 8, Pp 75370-75380 (2020)
Druh dokumentu: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.2988910
Popis: We study the problem of identifying a set of key nodes from a network when limited knowledge about its structure is available. Most studies assume complete knowledge of the given network when identifying a set of key nodes, but in current practice, networks of interest are often too huge to obtain their entire topological structures. When the complete structure of a network is not available, network sampling strategies are often used to obtain a partial structure of the network. We investigate how network sampling strategies affect the problem of identifying a key node set. Specifically, we investigate the effect of conventional network sampling strategies on the solutions found for two types of key node set identification problems: the minimum $p$ -median problem and the influence maximization problem. Our results show that when the network is obtained using crawl-based network sampling strategies, both the minimum $p$ -median and the influence maximization problems are effectively solved by simple heuristic algorithms with sampling ratios in the 10-20% range. We also find that among three conventional sampling strategies (random sampling, random walk sampling, and sample edge counts) checked in this paper, random walk sampling is the most robust strategy in terms of effectively identifying the key node sets of diverse types of networks.
Databáze: Directory of Open Access Journals