A Software Pipeline for the Reception of Italian Literature in Nineteenth-Century England. Preliminary Testing

Author: S. Rebora
Language: English
Publication year: 2017
Subject:
Source: DATeCH
Description: This paper presents and discusses a project design aimed at producing synthetic and intuitive visualizations of the reception of Italian literature in nineteenth-century England. The first part describes a processing pipeline that combines software for optical character recognition (OCR), named entity recognition (NER), topic segmentation, and sentiment analysis. The second part gives a preliminary test of the project's feasibility: (1) by evaluating the quality of the possible corpora (on a sample of 23 texts) and discussing methods for further improving the OCR; (2) by comparing the results obtained with free software (e.g. OpenNLP, ANNIE, NLTK, Stanford CoreNLP) on the sample corpus. The outcomes suggest that, while the project is realizable with the resources already available, further training of the software on an annotated corpus may substantially improve the quality of the results.
Database: OpenAIRE