Text classification based on limited bibliographic metadata
Author: | Kerstin Denecke, Thomas Baehr, Thomas Risse |
---|---|
Year of publication: | 2009 |
Subject: | Learning classifier system; Information retrieval; Computer science; Feature extraction; Metadata repository; Metadata; Set (abstract data type); Feature (machine learning); Data set (IBM mainframe); Artificial intelligence; Classifier (UML); Natural language processing |
Source: | ICDIM |
DOI: | 10.1109/icdim.2009.5356767 |
Description: | In this paper, we introduce a method for categorizing digital items according to their topic, relying only on the document's metadata, such as author name and title information. The proposed approach is based on a set of lexical resources constructed for our purposes (e.g., journal titles, conference names) and on a traditional machine-learning classifier that assigns one category to each document based on identified core features. The system is evaluated on a real-world data set, and the influence of different feature combinations and settings is studied. Although the available information is limited, the results show that the approach is capable of efficiently classifying data items representing documents. |
Database: | OpenAIRE |
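
The description above outlines a pipeline of metadata features, lexical resources, and a traditional classifier that assigns one category per document. The following is a minimal sketch of that general idea, not the authors' implementation. The record does not name the classifier, so multinomial Naive Bayes with add-one smoothing is an assumption here. The `VENUE_LEXICON` entries, the record fields (`title`, `author`, `venue`), the category names, and the toy training records are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lexical resource: venue-name fragments mapped to a topic hint.
# The paper's actual resources (journal titles, conference names) are not given here.
VENUE_LEXICON = {
    "bioinformatics": "biology",
    "physical review": "physics",
}


def extract_features(record):
    """Turn a metadata record (title, author, venue) into a list of feature strings."""
    title = record.get("title", "").lower()
    feats = ["title=" + tok for tok in re.findall(r"[a-z]+", title)]
    author = record.get("author", "").strip().lower()
    if author:
        feats.append("author=" + author)
    venue = record.get("venue", "").lower()
    for fragment, hint in VENUE_LEXICON.items():
        if fragment in venue:
            feats.append("venue_hint=" + hint)
    return feats


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (an assumed classifier choice)."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, records, labels):
        for rec, label in zip(records, labels):
            self.class_counts[label] += 1
            for feat in extract_features(rec):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, record):
        """Return the single category with the highest log-probability score."""
        feats = extract_features(record)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            score = math.log(count / total)
            for feat in feats:
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, invented training records; not data from the paper.
train_records = [
    {"title": "Protein folding prediction", "author": "A. Example", "venue": "Journal of Bioinformatics"},
    {"title": "Gene expression analysis", "author": "B. Example", "venue": "Bioinformatics Letters"},
    {"title": "Quantum entanglement in lattices", "author": "C. Example", "venue": "Physical Review Letters"},
    {"title": "Lattice gauge theory", "author": "D. Example", "venue": "Physical Review D"},
]
train_labels = ["biology", "biology", "physics", "physics"]

model = NaiveBayes().fit(train_records, train_labels)
query = {"title": "Unknown protein structure", "author": "E. Example", "venue": "Bioinformatics"}
predicted = model.predict(query)
print(predicted)
```

In this sketch the lexicon match contributes one extra feature (`venue_hint=...`) alongside the title tokens and the author string, so a record whose title words were never seen in training can still be assigned a category from its venue alone.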