A Method to Generate Soft Reference Data for Topic Identification

Authors:	Daniel Vélez, Guillermo Villarino, J. Tinguaro Rodríguez, Daniel Gómez
Publication year:	2020
Subject:	Information retrieval; Computer science; Soft classification; Context (language use); Soft reference; Textual information; Identification (information); Benchmark (computing); Relevance (information retrieval); 01 natural sciences; 0102 computer and information sciences; 010201 computation theory & mathematics; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing
Source:	Information Processing and Management of Uncertainty in Knowledge-Based Systems, IPMU (3); ISBN: 9783030501525
DOI:	10.1007/978-3-030-50153-2_5
Description:	Text mining and topic identification models are becoming increasingly relevant for extracting value from the large amount of unstructured textual information that companies now obtain from their users and clients. Soft approaches to these problems are also gaining relevance, since in some contexts it is unrealistic to assume that every document must be associated with a single topic, without any consideration of the uncertainties involved. However, there is an almost total lack of reference documents that allow a proper assessment of the performance of soft classifiers in such soft topic identification tasks. To address this gap, this paper proposes a method that generates topic identification reference documents of a soft but objective nature. The method proceeds by combining, in random but known proportions, phrases from existing documents that deal with different topics. The paper also provides a computational study that illustrates the application of the proposed method to a well-known benchmark for topic identification and shows that an informative evaluation of soft classifiers is possible in the context of soft topic identification.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::f1484f201848cc7f896fb609bcf21ad7 https://doi.org/10.1007/978-3-030-50153-2_5
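
The description above outlines the core idea of the method: phrases from documents on different topics are combined in random but known proportions, so that each generated document carries an objective soft label. The following Python sketch illustrates that idea only; it is not the authors' implementation. The function name `make_soft_document`, the input layout (a dictionary mapping each topic to a list of phrases), and the way the proportions are drawn (normalised uniform random weights) are all assumptions made for illustration, since the record does not specify them.

```python
import random


def make_soft_document(phrases_by_topic, n_phrases, rng):
    """Mix phrases from several topics into one synthetic document.

    Hypothetical helper, not taken from the paper. Returns the document
    text and the realised share of phrases drawn from each topic, which
    serves as the known soft reference label.
    """
    topics = sorted(phrases_by_topic)
    # Assumption: random topic weights are drawn uniformly; the paper may
    # use a different scheme for choosing the mixing proportions.
    weights = [rng.random() for _ in topics]
    # Pick the source topic of every phrase slot according to the weights.
    chosen = rng.choices(topics, weights=weights, k=n_phrases)
    # Draw one phrase (with replacement) from the chosen topic per slot.
    phrases = [rng.choice(phrases_by_topic[t]) for t in chosen]
    # The realised proportions are known exactly, by construction.
    proportions = {t: chosen.count(t) / n_phrases for t in topics}
    return " ".join(phrases), proportions


if __name__ == "__main__":
    # Toy corpus, invented purely for this example.
    corpus = {
        "sports": ["The team won the match.", "The striker scored twice."],
        "finance": ["Shares fell sharply.", "The bank raised its rates."],
    }
    text, label = make_soft_document(corpus, 6, random.Random(0))
    print(text)
    print(label)
```

A soft classifier's output for such a document could then be compared against the returned proportions rather than against a single hard topic label, which is the kind of evaluation the description refers to.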