Semantically Aware Text Categorisation for Metadata Annotation
Authors: | Paolo Tripodi, Daniele Paolo Radicioni, Marco Leontino, Guido Bonino, Enrico Pasini, Giulio Carducci |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
Information retrieval; Computer science; Unlabelled data; Computer Science (all); Language models; 02 engineering and technology; Semantics; Lexical resources; NLP; Semantic network; Set (abstract data type); Text categorization; Mathematics (all); Knowledge Graphs; 020204 information systems; Test set; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Language model; Classifier (UML); Metadata annotation |
Source: | Communications in Computer and Information Science ISBN: 9783030112257 IRCDL |
DOI: | 10.5281/zenodo.2555437 |
Description: | In this paper we illustrate a system aimed at solving a longstanding and challenging problem: acquiring a classifier to automatically annotate bibliographic records, starting from a huge set of unbalanced and unlabelled data. We describe the main features of the dataset, the learning algorithm adopted, and how it was used to discriminate philosophical documents from documents of other disciplines. One strength of our approach lies in the novel combination of a standard learning approach with a semantic one: the results of the acquired classifier are improved by accessing a semantic network containing conceptual information. We describe the experimentation by explaining the construction rationale of the training and test sets, report and discuss the results obtained, and conclude by outlining future work. |
Database: | OpenAIRE |