Is Simple English Wikipedia As Simple And Easy-to-Understand As We Expect It To Be?
Authors: | Sergiu Nisioi, Daniel Ibanez, Sanja Štajner |
---|---|
Year of publication: | 2020 |
Subject: | Computer science, Text simplification, Syntactic complexity, Training material, Reading (process), Artificial intelligence, Natural language processing |
Source: | DSAI |
DOI: | 10.1145/3439231.3439263 |
Description: | The conceptual complexity of a written text plays an important role in maintaining a reader's interest in it. Therefore, automatic text simplification systems should consider not only the lexical and syntactic complexity of a text but also its conceptual complexity. In this study, we analyze and compare two widely used English text simplification corpora, one produced professionally (Newsela) and the other built collaboratively by amateurs and enthusiasts (English Wikipedia–Simple English Wikipedia), focusing on 19 conceptual complexity features. The results indicate that the simplification operations performed when producing Simple English Wikipedia in many cases do not follow the patterns of the professionally simplified corpus, which casts doubt on the adequacy of using Simple English Wikipedia as training material for automatic text simplification systems. |
Database: | OpenAIRE |