Automatic Matching and Expansion of Abbreviated Phrases without Context

Authors: Artaud, Chloé; Doucet, Antoine; Poulain D'Andecy, Vincent; Ogier, Jean-Marc
Contributors: Artaud, Chloé; Laboratoire Informatique, Image et Interaction - EA 2118 (L3I), Université de La Rochelle (ULR)
Language: English
Year of publication: 2018
Subject:
Source: 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing2018), Mar 2018, Hanoi, Vietnam
Description: In many documents, such as receipts or invoices, textual information is constrained by the space and layout of the document. This information has no natural-language context, and expressions are often abbreviated to fit the graphical layout, both at word level and at phrase level. To analyze the semantic content of such documents, we need to understand each phrase, and in particular the name of each sold product. In this paper, we propose an approach to find the correct expansion of abbreviations and acronyms without context. First, we extract information about sold products from our receipt corpus and analyze the different linguistic processes of abbreviation. Then, we retrieve a list of the expanded names of products sold by the company that issued the receipts, and we propose an algorithm to pair the extracted product names with their corresponding expansions. We provide the research community with a unique document collection for abbreviation expansion.
Database: OpenAIRE