A model for mining relevant and non-redundant information

Authors:	Laura Langohr, Hannu Toivonen
Contributors:	Department of Computer Science, Discovery Research Group/Prof. Hannu Toivonen, Helsinki Institute for Information Technology, Finnish Centre of Excellence in Algorithmic Data Analysis Research (Algodan), Finnish Doctoral Programme in Computational Sciences (FICS)
Publication year:	2012
Subjects:	Information retrieval; Similarity (geometry); Computer science education; 02 engineering and technology; 113 Computer and information sciences; computer.software_genre; Set (abstract data type); Simple (abstract algebra); 020204 information systems; 0202 electrical engineering electronic engineering information engineering; Key (cryptography); 020201 artificial intelligence & image processing; Relevance (information retrieval); Data mining; computer
Source:	SAC
Description:	We propose a relatively simple yet powerful model for choosing relevant and non-redundant pieces of information. The model addresses data mining or information retrieval settings where relevance is measured with respect to a set of key or query objects, either specified by the user or obtained by a data mining step. The problem addressed is not only to identify other relevant objects, but also to ensure that they are not related to possible negative query objects and that they are not redundant with respect to each other. The model proposed here assumes only a similarity or distance function for the objects. It has a simple parameterization that allows for different behaviors with respect to query objects. We analyze the model and give two efficient, approximate methods. We illustrate and evaluate the proposed model on different applications: linguistics and social networks. The results indicate that the model and methods are useful in finding a relevant and non-redundant set of results. While this area has been a popular topic of research, our contribution is to provide a simple, generic model that covers several related approaches while providing a systematic way to take account of positive and negative query objects as well as non-redundancy of the output.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::73ed4a83ed6ab07811f5fba521b18f87 https://doi.org/10.1145/2245276.2245304
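
Illustrative note:	The description names three ingredients: relevance to positive query objects, distance from negative query objects, and non-redundancy among the selected results, all computed from a similarity function alone. The sketch below shows one way these ingredients could be combined in a greedy selection loop. It is not the model or either of the two approximate methods from the paper; the scoring rule, the function names (greedy_select, inverse_distance_sim) and the weights (neg_weight, red_weight) are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence


def inverse_distance_sim(a: float, b: float) -> float:
    """Toy similarity on numbers: 1 for identical values, falling toward 0."""
    return 1.0 / (1.0 + abs(a - b))


def greedy_select(
    candidates: Sequence[float],
    positives: Sequence[float],
    negatives: Sequence[float],
    sim: Callable[[float, float], float],
    k: int,
    neg_weight: float = 1.0,  # hypothetical penalty weight for negative queries
    red_weight: float = 1.0,  # hypothetical penalty weight for redundancy
) -> List[float]:
    """Pick up to k candidates, one at a time, by an assumed scoring rule.

    score(c) = max similarity to a positive query
               - neg_weight * max similarity to a negative query
               - red_weight * max similarity to an already selected object
    """
    selected: List[float] = []
    remaining = [c for c in candidates if c not in positives and c not in negatives]

    def score(c: float) -> float:
        relevance = max((sim(c, q) for q in positives), default=0.0)
        negativity = max((sim(c, q) for q in negatives), default=0.0)
        redundancy = max((sim(c, s) for s in selected), default=0.0)
        return relevance - neg_weight * negativity - red_weight * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    picked = greedy_select(
        candidates=[0.5, 0.6, 5.1, 9.6],
        positives=[0.0, 10.0],
        negatives=[5.0],
        sim=inverse_distance_sim,
        k=2,
    )
    print(picked)
```

In this toy run the candidate next to the negative query (5.1) is avoided, and only one of the two near-duplicate candidates (0.5 and 0.6) is chosen, because the second would be penalized as redundant. Any real use would substitute a domain-specific similarity function and the objective defined in the paper itself.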