A Framework for Understanding the Role of Morphology in Universal Dependency Parsing

Authors: Pascal Denis, Mathieu Dehouck
Contributors: Machine Learning in Information Networks (MAGNET); Centre de Recherche en Informatique, Signal et Automatique de Lille - UMR 9189 (CRIStAL); Centrale Lille; Université de Lille; Centre National de la Recherche Scientifique (CNRS); Inria Lille - Nord Europe; Institut National de Recherche en Informatique et en Automatique (Inria)
Language: English
Year of publication: 2018
Subject:
Morphology (linguistics)
Parsing
Computer science
business.industry
02 engineering and technology
010501 environmental sciences
computer.software_genre
01 natural sciences
Measure (mathematics)
Syntax
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
[INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]
Simple (abstract algebra)
Dependency grammar
0202 electrical engineering, electronic engineering, information engineering
020201 artificial intelligence & image processing
Artificial intelligence
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
business
computer
Word (computer architecture)
Natural language processing
0105 earth and related environmental sciences
Source: EMNLP 2018 - Conference on Empirical Methods in Natural Language Processing, Oct 2018, Brussels, Belgium
HAL
EMNLP
Description: This paper presents a simple framework for characterizing morphological complexity and how it encodes syntactic information. In particular, we propose a new measure of morphosyntactic complexity, expressed in terms of governor-dependent preferential attachment, that explains parsing performance. Through dependency parsing experiments on data from Universal Dependencies (UD), we show that representations derived from morphological attributes deliver substantial parsing improvements over standard word form embeddings when trained on the same datasets. We also show that the new measure is predictive of the gains that morphological attributes provide over plain forms on parsing scores, making it a tool for distinguishing languages that use morphology as a syntactic marker from those that do not.
Database: OpenAIRE