Estimating lexical availability of European Portuguese proverbs

Authors: Jorge Baptista, Sónia Reis
Language: English
Year of publication: 2017
Subject:
Source: Computational and Corpus-Based Phraseology ISBN: 9783319698045
Europhras
Description: This paper relates data on lexical availability with data on textual frequency of proverbs in European Portuguese. Each data source should provide a different perspective on the use of proverbs in the language. This should allow an empirically well-motivated selection of proverbs for the development of NLP resources, specifically for applications for learning Portuguese as a Foreign Language and for the diagnosis and therapy of speech impairments and disabilities. A large database (over 114,000 proverbs and their variants) was independently classified by two annotators according to intuitively estimated lexical availability. Next, a random, stratified sample was selected, and lexical availability was confirmed with an online survey. Frequency data was gathered from two web browsers and a large, publicly available corpus of journalistic texts. Results from the survey, the web and the corpus by and large confirm the initial intuitive classification, and a core of commonly used proverbs was defined.
Database: OpenAIRE