Author: |
Vieira, Renata, Quaresma, Paulo, Nunes, Maria das Graças Volpe, Mamede, Nuno J., Oliveira, Cláudia, Dias, Maria Carmelita, Antunes, Sandra, Nascimento, Maria Fernanda Bacelar, Casteleiro, João Miguel, Mendes, Amália, Pereira, Luísa, Sá, Tiago |
Source: |
Computational Processing of the Portuguese Language; 2006, p238-243, 6p |
Abstract: |
This presentation focuses on an ongoing project which aims at the creation of a large lexical database of Portuguese multiword (MW) units, automatically extracted through the analysis of a balanced 50 million word corpus, statistically interpreted with lexical association measures and validated by hand. This database covers different types of MW units, like named entities, and lexical associations ranging from sets of favoured co-occurring forms to strongly lexicalized expressions. This new resource has a two-fold objective: to be an important research tool which supports the development of MW unit typologies; to be of major help in developing and evaluating language processing tools capable of dealing with MW expressions. [ABSTRACT FROM AUTHOR] |
Database: |
Supplemental Index |
External link: |
|
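The abstract says candidate MW units were "statistically interpreted with lexical association measures" but does not name the measures. As a hedged illustration only, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), one common association measure; the choice of PMI, the bigram-only window, the function name `pmi_bigrams`, and the `min_count` cutoff are all assumptions for illustration, not the project's method.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    Illustrative sketch (assumed measure, not the project's stated one):
    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated as relative frequencies in `tokens`.
    Pairs seen fewer than `min_count` times are dropped, because PMI
    tends to overrate rare co-occurrences.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)  # number of adjacent-pair positions
    scores = {}
    for (x, y), c_xy in bigrams.items():
        if c_xy < min_count:
            continue
        p_xy = c_xy / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    # Highest-scoring pairs first: these are the MW unit candidates
    # that would then go to manual validation.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toy = "a taxa de juro subiu e a taxa de juro desceu".split()
    for pair, score in pmi_bigrams(toy):
        print(" ".join(pair), round(score, 3))
```

On a corpus of realistic size, a ranked list like this would only propose candidates; the abstract notes that the extracted units were validated by hand.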