Estimating lexical availability of European Portuguese proverbs
Authors: | Jorge Baptista, Sónia Reis |
---|---|
Language: | English |
Year of publication: | 2017 |
Subject: | Data source, Web browser, Computer science, Foreign language, Frequency data, Frequency in corpus, European Portuguese Proverbs, Stratified sampling, European Portuguese language, Selection (linguistics), Artificial intelligence, Portuguese, Lexical availability, Natural language processing |
Source: | Computational and Corpus-Based Phraseology (Europhras), ISBN 9783319698045 |
Description: | This paper relates data on the lexical availability of proverbs in European Portuguese to data on their textual frequency. Each data source should provide a different perspective on how proverbs are used in the language, allowing an empirically well-motivated selection of proverbs for developing NLP resources, specifically for applications in learning Portuguese as a Foreign Language and in the diagnosis and therapy of speech impairments and disabilities. A large database (over 114,000 proverbs and their variants) was independently classified by two annotators according to intuitively estimated lexical availability. Next, a random stratified sample was selected, and lexical availability was then confirmed with an online survey. Frequency data were gathered from two web browsers and from a large, publicly available corpus of journalistic texts. Results from the survey, the web and the corpus by and large confirm the initial intuitive classification, and a core of commonly used proverbs was defined. |
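The sampling step described above (drawing a random, stratified sample from proverbs already grouped by their estimated availability class) can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `stratified_sample`, the class labels, and the choice of an equal number of items per class (rather than proportional allocation) are all hypothetical assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(items, labels, per_stratum, seed=0):
    """Hypothetical sketch: draw up to `per_stratum` items at random from each label group.

    Assumes equal allocation per stratum; the paper does not state its allocation scheme.
    """
    # Group items into strata by their (assumed) availability label.
    strata = defaultdict(list)
    for item, label in zip(items, labels):
        strata[label].append(item)

    # A seeded generator makes the sample reproducible across runs.
    rng = random.Random(seed)
    sample = {}
    for label, members in sorted(strata.items()):
        # Small strata are taken whole rather than raising an error.
        k = min(per_stratum, len(members))
        sample[label] = rng.sample(members, k)
    return sample


# Usage with placeholder proverb IDs and hypothetical class labels.
proverbs = [f"proverb_{i:03d}" for i in range(10)]
classes = ["high"] * 3 + ["low"] * 7
print(stratified_sample(proverbs, classes, per_stratum=5))
```

Sampling within each class, rather than from the pooled database, ensures that rarer availability classes are still represented when the sample is later checked against survey and frequency data.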
Database: | OpenAIRE |