Towards Automatic Construction of Filipino WordNet: Word Sense Induction and Synset Induction Using Sentence Embeddings

Authors: Velasco, Dan John; Alba, Axel; Pelagio, Trisha Gail; Ramirez, Bryce Anthony; Chua, Unisse; Samson, Briane Paul; Cruz, Jan Christian Blaise; Cheng, Charibeth
Publication year: 2022
Subject:
Document type: Working Paper
Abstract: Wordnets are indispensable tools for various natural language processing applications. Unfortunately, wordnets become outdated, and producing or updating them can be slow and costly in terms of time and resources. This problem intensifies for low-resource languages. This study proposes a method for word sense induction and synset induction using only two linguistic resources, namely, an unlabeled corpus and a sentence-embedding-based language model. The resulting sense inventory and synonym sets can be used to automatically create a wordnet. We applied this method to a corpus of Filipino text. The sense inventory and synsets were evaluated by matching them against the sense inventory of the machine-translated Princeton WordNet, as well as by comparing the synsets to the Filipino WordNet. This study empirically shows that 30% of the induced word senses are valid and 40% of the induced synsets are valid, of which 20% are novel synsets.
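The abstract does not say how senses are induced from the embeddings, so the following is only a minimal sketch of one common approach, not the authors' method: embed each sentence containing a target word, then group similar embeddings so that each group stands for one induced sense. The greedy centroid clustering, the `threshold` value, the function names, and the hand-made 3-dimensional vectors (stand-ins for real sentence embeddings) are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def induce_senses(embeddings, threshold=0.8):
    """Greedy clustering (an assumed stand-in for the paper's method).

    Each embedding joins the existing cluster whose centroid is most
    similar to it, provided the similarity reaches `threshold`;
    otherwise it starts a new cluster. Returns a list of clusters,
    each a list of indices into `embeddings`.
    """
    clusters = []
    for i, vec in enumerate(embeddings):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, centroid([embeddings[j] for j in cluster]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

# Hypothetical toy vectors standing in for sentence embeddings of five
# contexts that all contain the same target word.
toy_embeddings = [
    [0.90, 0.10, 0.00],
    [0.80, 0.20, 0.10],
    [0.10, 0.90, 0.00],
    [0.00, 0.80, 0.20],
    [0.85, 0.15, 0.05],
]

senses = induce_senses(toy_embeddings)
print(senses)  # each inner list is one induced sense (indices of contexts)
```

In a real pipeline the toy vectors would be replaced by the output of a sentence embedding model, and a synset could then be induced by grouping senses of different words whose cluster centroids are close to each other.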
Comment: To appear in SEALP 2023. Formerly titled "Automatic WordNet Construction using Word Sense Induction through Sentence Embeddings"
Database: arXiv