An Unsupervised Approach for Precise Context Identification from Unstructured Text Documents
Authors: | Maha Mallek, Wided Lejouad Chaari, Bernard Espinasse, Ramzi Guetari, Sébastien Fournier |
---|---|
Contributors: | Recherche d’information et Interactions (R2I), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS) |
Language: | English |
Year of publication: | 2020 |
Subject: |
Information retrieval; business.industry; Computer science; Document classification; 05 social sciences; 0507 social and economic geography; 020207 software engineering; Context (language use); 02 engineering and technology; computer.software_genre; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Information extraction; Identification (information); Statistical classification; 0202 electrical engineering, electronic engineering, information engineering; Encyclopedia; Task analysis; The Internet; [INFO]Computer Science [cs]; business; 050703 geography; computer; ComputingMilieux_MISCELLANEOUS |
Source: | 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Nov 2020, Baltimore, United States. pp. 821-826, ⟨10.1109/ICTAI50040.2020.00130⟩ |
DOI: | 10.1109/ICTAI50040.2020.00130 |
Description: | Most documents produced and exchanged through media and social networks are unstructured. Given the volume of such documents on the Web, exploiting them is a tedious or even impossible task for humans without the assistance of dedicated algorithms and specialized computer systems for document classification or information extraction. To be efficient and relevant, such systems have to understand the content of these unstructured documents. The context (or topic) of a document is one of the basic pieces of information essential for understanding its content, and the more precisely the context is identified, the more relevant that understanding will be. This paper presents a precise context identification approach that is evaluated quantitatively and qualitatively on several reference corpora and compared to other context identification systems. The contexts identified by our model are much more precise than those identified by these other systems. |
Database: | OpenAIRE |