Deep Feature Generation for Author Identification
Authors: | Sukru Ozan, Davut Emre Tasar, Umut Özdil |
---|---|
Year of publication: | 2021 |
Subject: | Computer science, Natural Language Processing, Document Embeddings, Logistic Regression, Support Vector Machines, Author Identification, Machine learning, Identification (information), Engineering, Materials Chemistry, Feature generation, Artificial intelligence |
Source: | Celal Bayar University Journal of Science, Volume: 17, Issue: 2, pp. 137-143 |
ISSN: | 1305-130X 1305-1385 |
DOI: | 10.18466/cbayarfbe.846016 |
Description: | Identifying the authors of a given set of texts is a well-studied yet complicated task. It requires thorough knowledge of different authors' writing styles and the ability to discriminate between them. As the main contribution of this paper, we propose to perform this task using machine learning and deep learning methods, state-of-the-art algorithms that are used in numerous complex Natural Language Processing (NLP) problems. We used a text corpus of daily newspaper columns written by thirty authors to perform our experiments. The experimental results show that document embeddings trained via a neural network architecture achieve state-of-the-art accuracy in learning writing styles and identifying the authors of given writings, even though the dataset has a considerably unbalanced distribution. We present our experimental results and release our code as an open-source GitHub repository for interested readers and NLP enthusiasts, who can reproduce and confirm the results and modify the code according to their own needs. |
Database: | OpenAIRE |