A dummy-based user privacy protection approach for text information retrieval

Author:	Xinning Su, Zongda Wu, Shigen Shen, Enhong Chen, Xinze Lian
Publication year:	2020
Subject:	Thesaurus (information retrieval); Information Systems and Management; Information retrieval; Ideal (set theory); Cover (telecommunications); business.industry; Computer science; InformationSystems_INFORMATIONSTORAGEANDRETRIEVAL; Usability; 02 engineering and technology; Construct (python library); Management Information Systems; Search engine; Artificial Intelligence; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; business; Software
Source:	Knowledge-Based Systems. 195:105679
ISSN:	0950-7051
Description:	Text retrieval enables people to efficiently obtain the data they want from massive text collections, and has therefore become one of the most popular services in the information retrieval community. However, while it provides great convenience for users, text retrieval also raises a serious user privacy issue. In this paper, we propose a dummy-based approach for text retrieval privacy protection. Its basic idea is to use well-designed dummy queries to cover up user queries and thus protect user privacy. First, we present a client-based system framework for the protection of user privacy, which requires no change to the existing text retrieval algorithm and no compromise on the accuracy of text retrieval. Second, we define a user privacy model to formulate the requirements that ideal dummy queries should meet, i.e., (1) having feature distributions highly similar to those of user queries, and (2) effectively reducing the significance of user query topics. Third, using knowledge derived from Wikipedia, we present an implementation algorithm to construct a group of ideal dummy queries that meet the privacy model well. Finally, we demonstrate the effectiveness of our approach by theoretical analysis and experimental evaluation. The results show that by constructing dummy queries that have feature distributions similar to those of user queries but topics unrelated to them, the privacy behind users’ textual queries can be effectively protected without compromising the accuracy and usability of text retrieval.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::2b9f22a1574f18037823e402e04425fc https://doi.org/10.1016/j.knosys.2020.105679
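
The two requirements the abstract places on dummy queries (matching feature distributions, unrelated topics) can be illustrated with a toy sketch. This is not the paper's algorithm: the `KNOWLEDGE` table, its topics and "rare"/"common" feature buckets, and the function names are all hypothetical stand-ins for the Wikipedia-derived knowledge the record mentions.

```python
import random
from collections import Counter

# Hypothetical toy knowledge base: term -> (topic, feature bucket).
# Invented for illustration; the paper derives its knowledge from Wikipedia.
KNOWLEDGE = {
    "diabetes": ("health", "rare"),
    "insulin": ("health", "rare"),
    "symptoms": ("health", "common"),
    "mortgage": ("finance", "rare"),
    "refinance": ("finance", "rare"),
    "rates": ("finance", "common"),
    "telescope": ("astronomy", "rare"),
    "nebula": ("astronomy", "rare"),
    "stars": ("astronomy", "common"),
}


def feature_profile(terms):
    """Count how many terms fall into each feature bucket."""
    return Counter(KNOWLEDGE[t][1] for t in terms)


def make_dummy_query(user_terms, dummy_topic, rng):
    """Swap each user term for a same-feature term drawn from dummy_topic."""
    dummy, used = [], set()
    for term in user_terms:
        feature = KNOWLEDGE[term][1]
        candidates = sorted(
            t for t, (topic, f) in KNOWLEDGE.items()
            if topic == dummy_topic and f == feature and t not in used
        )
        if not candidates:
            raise ValueError(f"no {feature!r} term left in topic {dummy_topic!r}")
        choice = rng.choice(candidates)
        used.add(choice)
        dummy.append(choice)
    return dummy


def make_dummy_queries(user_terms, k, rng=None):
    """Build k dummy queries, each on a distinct topic the user query does not touch."""
    rng = rng or random.Random(0)
    user_topics = {KNOWLEDGE[t][0] for t in user_terms}
    other_topics = sorted({topic for topic, _ in KNOWLEDGE.values()} - user_topics)
    # random.Random.sample raises ValueError if k exceeds the available topics.
    return [make_dummy_query(user_terms, topic, rng)
            for topic in rng.sample(other_topics, k)]


if __name__ == "__main__":
    user_query = ["insulin", "symptoms"]
    for dummy in make_dummy_queries(user_query, k=2):
        print(dummy, dict(feature_profile(dummy)))
```

In a client-based setting such as the one the abstract describes, the client would submit the real query together with the dummies and discard the dummy results locally, so the retrieval service itself needs no change. That part is omitted from this sketch.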