A Software Pipeline for the Reception of Italian Literature in Nineteenth-Century England. Preliminary Testing
Author: | S. Rebora |
---|---|
Language: | English |
Year of publication: | 2017 |
Subject: |
Literary historiography; comparative literature; text mining; software pipeline; project design; project testing;
Optical character recognition; Named-entity recognition; Segmentation; Sentiment analysis; Natural language processing; Artificial intelligence; Software; Pipeline (software); Sample (statistics); Quality (business); World Wide Web; Engineering; 06 humanities and the arts; 0602 languages and literature; 060201 languages & linguistics; 060202 literary studies |
Source: | DATeCH |
Description: | This paper presents and discusses a project design aimed at producing synthetic and intuitive visualizations of the reception of Italian literature in nineteenth-century England. The first part describes a processing pipeline that combines software for optical character recognition (OCR), named entity recognition (NER), topic segmentation, and sentiment analysis. The second part offers a preliminary test of the project's feasibility: (1) by evaluating the quality of the candidate corpora on a sample of 23 texts and discussing methods for further improving the OCR; (2) by comparing the results obtained by free software (e.g. OpenNLP, ANNIE, NLTK, Stanford CoreNLP) on the sample corpus. The outcomes suggest that, while the project can be realized with the resources already available, further training of the software on an annotated corpus may substantially improve the quality of the results. |
Database: | OpenAIRE |
External link: |