An Unsupervised Approach for Precise Context Identification from Unstructured Text Documents

Authors: Maha Mallek, Wided Lejouad Chaari, Bernard Espinasse, Ramzi Guetari, Sébastien Fournier
Contributors: Recherche d’information et Interactions (R2I), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS)
Language: English
Publication year: 2020
Subject:
Source: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI)
2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), Nov 2020, Baltimore, United States. pp.821-826, ⟨10.1109/ICTAI50040.2020.00130⟩
ICTAI
DOI: 10.1109/ICTAI50040.2020.00130
Description: The majority of documents produced and exchanged through media and social networks are unstructured. Given the volume of such documents on the Web, exploiting them is a tedious or even impossible task for humans without the assistance of dedicated algorithms and specialized computer systems for document classification or information extraction. To be efficient and relevant, such systems have to understand the content of these unstructured documents. The context (or topic) of a document is one of the basic pieces of information essential to understanding its content, and the more precisely the context is identified, the more relevant that understanding will be. This paper presents a precise context identification approach that is evaluated quantitatively and qualitatively on several reference corpora and compared to other context identification systems. The contexts identified by our model are much more precise than those identified by these other systems.
Database: OpenAIRE