A Unified Framework for User Identification Across Online and Offline Data
Author: | Haishan Wu, Jingbo Zhou, Yunsheng Cheng, Longbo Huang, Tianyi Hao |
---|---|
Year of publication: | 2022 |
Subject: |
Online and offline; Social network; Computer science; Spatial database; Data modeling; Pairwise comparison; Data mining; Cluster analysis; Precision and recall; Information Systems; Computer Science Applications; Computational Theory and Mathematics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020204 information systems |
Source: | IEEE Transactions on Knowledge and Data Engineering. 34:1562-1575 |
ISSN: | 2326-3865, 1041-4347 |
DOI: | 10.1109/tkde.2020.3000287 |
Description: | User identification across multiple datasets has a wide range of applications, and a growing body of research has addressed this topic in recent years. However, most existing works focus on user identification with a single input data type, e.g., (I) identifying a user across multiple social networks with online data and (II) detecting a single user from heterogeneous trajectory datasets with offline data. Unlike previous works, in this paper we propose a framework for user identification between online and offline datasets. We build connections between these two types of data through a mapping from IP addresses to physical locations. To solve this problem, we propose a novel framework consisting of three steps. First, we use a clustering method based on the locations of IP addresses to map IP addresses into specific physical location distributions. Second, we propose a novel pairwise index to reduce the space cost and running time of computing co-occurrence. Lastly, we apply a learning-to-rank method to merge the effects of the multiple features obtained in the first two steps. Based on our framework, we design experiments to demonstrate its efficiency (in time and space), together with the precision and recall of our approach compared to other methods. |
Database: | OpenAIRE |
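
Illustrative note: the abstract mentions computing co-occurrence between online and offline records once IP addresses have been mapped to physical locations. The record does not describe the paper's pairwise index, so the following is only a minimal sketch of the general idea, assuming that events are already reduced to `(id, cell, timestamp)` tuples. The function name `co_occurrence_counts`, the `bucket_seconds` parameter, the cell labels, and the sample data are all hypothetical and not taken from the paper.

```python
from collections import defaultdict
from itertools import product


def co_occurrence_counts(online_events, offline_events, bucket_seconds=3600):
    """Count shared (location cell, time bucket) keys per (online_id, offline_id) pair.

    Assumption: each event is a tuple (id, cell, unix_timestamp), where `cell`
    is some discretized location label. This is a plain inverted index,
    not the pairwise index proposed in the paper.
    """
    # Inverted index: (cell, time bucket) -> set of ids observed there.
    online_index = defaultdict(set)
    for user, cell, ts in online_events:
        online_index[(cell, ts // bucket_seconds)].add(user)

    offline_index = defaultdict(set)
    for user, cell, ts in offline_events:
        offline_index[(cell, ts // bucket_seconds)].add(user)

    counts = defaultdict(int)
    # Only keys present in both indexes can produce a co-occurrence,
    # so candidate pairs are never enumerated over the full cross product.
    for key in online_index.keys() & offline_index.keys():
        for pair in product(online_index[key], offline_index[key]):
            counts[pair] += 1
    return dict(counts)


# Hypothetical sample data: two online accounts and two offline devices.
online = [("acct_a", "cell_1", 100), ("acct_a", "cell_2", 4000), ("acct_b", "cell_3", 200)]
offline = [("dev_x", "cell_1", 500), ("dev_x", "cell_2", 4500), ("dev_y", "cell_3", 9000)]
result = co_occurrence_counts(online, offline)
print(result)
```

In a pipeline like the one the abstract outlines, counts of this kind would be one feature among several passed to a ranking model; how the paper actually weights or combines them is not stated in this record.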