Multiword Expression Processing: A Survey

Authors: Gülşen Eryiğit, Lonneke van der Plas, Johanna Monti, Amalia Todirascu, Mathieu Constant, Mike Rosner, Carlos Ramisch
Contributors: Analyse et Traitement Informatique de la Langue Française (ATILF), Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS); Traitement Automatique du Langage Ecrit et Parlé (TALEP), Laboratoire d'Informatique et Systèmes (LIS), Aix Marseille Université (AMU)-Université de Toulon (UTLN)-Centre National de la Recherche Scientifique (CNRS); Linguistique, Langues et Parole (LILPA), Université de Strasbourg (UNISTRA); Fonctionnement Discursif et Traduction (FDT), LILPA; ANR-14-CERA-0001, PARSEME-FR, Analyse syntaxique et expressions polylexicales pour le français (2014)
Language: English
Publication year: 2017
Subject:
Linguistics and Language
Machine translation
Deep linguistic processing
Computer science
02 engineering and technology
Multiword
processing
natural language processing
Language and Linguistics
Multiword expression
Artificial Intelligence
0202 electrical engineering, electronic engineering, information engineering
Use case
Orchestration (computing)
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
060201 languages & linguistics
Parsing
06 humanities and the arts
Computer Science Applications
Identification (information)
Conceptual framework
0602 languages and literature
020201 artificial intelligence & image processing
Source: Computational Linguistics
Computational Linguistics, Massachusetts Institute of Technology Press (MIT Press), 2017, 43 (4), pp.837-892. ⟨10.1162/COLI_a_00302⟩
ISSN: 0891-2017, 1530-9312
DOI: 10.1162/COLI_a_00302
Description: Multiword expressions (MWEs) are a class of linguistic forms spanning conventional word boundaries that are both idiosyncratic and pervasive across different languages. The structure of linguistic processing that depends on the clear distinction between words and phrases has to be re-thought to accommodate MWEs. The issue of MWE handling is crucial for NLP applications, where it raises a number of challenges. The emergence of solutions in the absence of guiding principles motivates this survey, whose aim is not only to provide a focused review of MWE processing, but also to clarify the nature of interactions between MWE processing and downstream applications. We propose a conceptual framework within which challenges and research contributions can be positioned. It offers a shared understanding of what is meant by “MWE processing,” distinguishing the subtasks of MWE discovery and identification. It also elucidates the interactions between MWE processing and two use cases: parsing and machine translation. Many of the approaches in the literature can be differentiated according to how MWE processing is timed with respect to underlying use cases. We discuss how such orchestration choices affect the scope of MWE-aware systems. For each of the two MWE processing subtasks and for each of the two use cases, we conclude with open issues and research perspectives.
Database: OpenAIRE