Homonym Identification using BERT -- Using a Clustering Approach

Author: Saha, Rohan
Publication year: 2021
Subject:
Document type: Working Paper
DOI: 10.13140/RG.2.2.29120.07681
Description: Homonym identification is important for word sense disambiguation (WSD) systems that require coarse-grained partitions of senses. The goal of this project is to determine whether contextual information is sufficient for identifying a homonymous word. To capture context, BERT embeddings are used instead of Word2Vec, which conflates all senses of a word into a single vector. The embeddings are retrieved for words in the SemCor corpus. Various clustering algorithms are applied to the embeddings. Finally, the embeddings are visualized in a lower-dimensional space to assess the feasibility of the clustering process.
Database: arXiv
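
Note: The clustering step mentioned in the description can be illustrated with a minimal sketch. The record does not say which clustering algorithms the paper uses, so k-means is chosen here purely as an assumption. The three-dimensional vectors are hypothetical stand-ins for contextual embeddings of one word in two unrelated senses; they are not BERT outputs and not data from the paper. All function and variable names are illustrative.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_point_init(points, k):
    """Deterministic seeding: start from the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical stand-ins for contextual embeddings of a single word:
# the first three occurrences share one sense, the last three another.
toy_embeddings = [
    [0.9, 0.1, 0.0],
    [1.0, 0.2, 0.1],
    [0.8, 0.0, 0.1],
    [0.1, 0.9, 0.0],
    [0.0, 1.0, 0.2],
    [0.2, 0.8, 0.1],
]

labels = kmeans(toy_embeddings, k=2)
print(labels)
```

If the occurrences of a word separate cleanly into distinct clusters, as these toy vectors do, that is the kind of signal the description suggests could indicate homonymy. Real contextual embeddings have far more dimensions and are much noisier, which is presumably why the description also mentions visualizing them in a lower-dimensional space.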