Reasoning with Portuguese Word Embeddings
Author: | Cunha, Luís Filipe; Almeida, J. João; Simões, Alberto |
---|---|
Language: | English |
Year of publication: | 2022 |
Subject: | |
DOI: | 10.4230/oasics.slate.2022.17 |
Description: | Representing words with semantic distributions to create ML models is a widely used technique to perform Natural Language Processing tasks. In this paper, we trained word embedding models with different types of Portuguese corpora, analyzing the influence of the models' parameterization, the corpora size, and domain. Then we validated each model with the classical evaluation methods available: four-word analogies and measurement of the similarity of pairs of words. In addition to these methods, we proposed new alternative techniques to validate word embedding models, presenting new resources for this purpose. Finally, we discussed the obtained results and argued about some limitations of the word embedding models' evaluation methods. OASIcs, Vol. 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022), pages 17:1-17:14 |
Database: | OpenAIRE |
External link: |