Semantic Structure and Interpretability of Word Embeddings

Authors: Lutfi Kerem Senel, Veysel Yucesoy, Aykut Koc, Ihsan Utlu, Tolga Çukur
Contributors: Şenel, Lütfi Kerem, Utlu, İhsan, Yücesoy, Veysel, Koç, Aykut, Çukur, Tolga
Year of publication: 2018
Subject:
Source: IEEE/ACM Transactions on Audio Speech and Language Processing
ISSN: 2329-9304
2329-9290
DOI: 10.1109/taslp.2018.2837384
Description: Dense word embeddings, which encode the semantic meanings of words in low-dimensional vector spaces, have become very popular in natural language processing (NLP) research due to their state-of-the-art performance in many NLP tasks. Word embeddings are substantially successful in capturing semantic relations among words, so a meaningful semantic structure must be present in the respective vector spaces. However, in many cases, this semantic structure is broadly and heterogeneously distributed across the embedding dimensions, which makes interpretation a major challenge. In this study, we propose a statistical method to uncover the latent semantic structure in dense word embeddings. To perform our analysis, we introduce a new dataset (SEMCAT) that contains more than 6500 words semantically grouped under 110 categories. We further propose a method to quantify the interpretability of word embeddings; the proposed method is a practical alternative to the classical word intrusion test, which requires human intervention.
Comment: 11 Pages, 8 Figures, accepted by IEEE/ACM Transactions on Audio, Speech, and Language Processing
Database: OpenAIRE