Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information
Authors: | Elena González-Blanco, María Luisa Díez Platas, Pablo Ruiz Fabo, Elena Álvarez Mellado, Salvador Ros Muñoz |
---|---|
Contributors: | Universidad Nacional de Educación a Distancia (UNED), Linguistique, Langues et Parole (LILPA), Université de Strasbourg (UNISTRA), European Project: 679528,H2020,ERC-2015-STG,POSTDATA(2016) |
Language: | English |
Year of publication: | 2020 |
Subject: |
Information Systems and Management
Computer Networks and Communications Computer science 02 engineering and technology Library and Information Sciences computer.software_genre Semantics [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL] Annotation Text mining Named-entity recognition 020204 information systems 0202 electrical engineering electronic engineering information engineering Contextual information [SHS.LANGUE]Humanities and Social Sciences/Linguistics Research Articles Parsing business.industry Text structure 020201 artificial intelligence & image processing Artificial intelligence business computer Period (music) Natural language processing Information Systems Generator (mathematics) Research Article |
Source: | Journal of the Association for Information Science and Technology, ASIS&T/Wiley, 2020, ⟨10.1002/asi.24399⟩ |
ISSN: | 2330-1643 2330-1635 |
Description: | The recognition of named entities in Spanish medieval texts is highly complex and involves specific challenges: first, the complex morphosyntactic characteristics of proper‐noun use in medieval texts; second, the lack of strict orthographic standards; and finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex text structures. For example, it was frequent to add nicknames and information about the person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Given the occurrence of entities with attached attributes, entity contexts are also parsed to determine entity‐type‐specific dependencies for these attributes. Moreover, it uses a variant generator to handle the diachronic evolution of Spanish medieval terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its proper lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes, with an overall F1 of 0.75. |
Database: | OpenAIRE |
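The abstract mentions a variant generator that handles orthographic and phonetic variation before gazetteer lookup. The record does not describe its rules, so the following is only a minimal sketch of the general idea, under stated assumptions: the alternation classes (`ANYWHERE`, `WORD_INITIAL`) and the helper names `generate_variants` and `gazetteer_lookup` are hypothetical and illustrative, not the authors' rule set or code. Each character is allowed to swap with the other members of its class, and a token matches a gazetteer entry if any generated spelling is in the gazetteer.

```python
import unicodedata
from itertools import product

# Hypothetical alternation classes, for illustration only (NOT the authors'
# rules). Graphemes in the same class may stand for one another in medieval
# spellings; the classes are disjoint single characters, so at most one applies.
ANYWHERE = [
    {"ç", "z"},  # Gonçalo ~ Gonzalo
    {"x", "j"},  # Ximena ~ Jimena, dixo ~ dijo
    {"u", "v"},  # Aluaro ~ Alvaro
    {"y", "i"},  # Ruy ~ Rui
]
WORD_INITIAL = [
    {"f", "h"},  # Fernando ~ Hernando (initial f- > h-); not applied word-internally
]


def alternatives(word, i):
    """Return the sorted graphemes that may appear at position i of word."""
    ch = word[i]
    classes = list(ANYWHERE)
    if i == 0 or not word[i - 1].isalpha():
        classes += WORD_INITIAL
    for cls in classes:
        if ch in cls:
            return sorted(cls)
    return [ch]


def generate_variants(word):
    """All spellings obtained by swapping characters within their classes."""
    # NFC so that a decomposed "c + combining cedilla" becomes the single "ç".
    word = unicodedata.normalize("NFC", word).lower()
    options = [alternatives(word, i) for i in range(len(word))]
    return {"".join(combo) for combo in product(*options)}


def gazetteer_lookup(token, gazetteer):
    """Return the gazetteer entries that match any variant of token."""
    return sorted(generate_variants(token) & gazetteer)


if __name__ == "__main__":
    gazetteer = {"hernando", "gonzalo", "jimena"}  # hypothetical entries
    for name in ["Fernando", "Gonçalo", "Ximena", "Sancho"]:
        print(name, gazetteer_lookup(name, gazetteer))
```

Restricting the sketch to single-character, disjoint classes keeps the variant set finite (its size is the product of the class sizes at each position). Rules that change length, such as doubled consonants (`Alffonso` ~ `Alfonso`), would need an explicit bound or a normalization step instead.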