Predicting Word Embeddings Variability

Authors: Bénédicte Pierrejean, Ludovic Tanguy
Contributors: Cognition, Langues, Langage, Ergonomie (CLLE-ERSS); École pratique des hautes études (EPHE); Université Paris sciences et lettres (PSL); Université Toulouse - Jean Jaurès (UT2J); Université Bordeaux Montaigne; Centre National de la Recherche Scientifique (CNRS); Tanguy, Ludovic
Language: English
Year of publication: 2018
Subject:
Computer science
Stability (learning theory)
02 engineering and technology
Space (commercial competition)
050105 experimental psychology
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
0202 electrical engineering, electronic engineering, information engineering
0501 psychology and cognitive sciences
Word2vec
[SHS.LANGUE]Humanities and Social Sciences/Linguistics
Reliability (statistics)
Hyperparameter
05 social sciences
Computer Science::Computation and Language (Computational Linguistics and Natural Language and Speech Processing)
Pattern recognition
Variation (linguistics)
Embedding
020201 artificial intelligence & image processing
Artificial intelligence
Computer Science::Formal Languages and Automata Theory
Word (computer architecture)
Source: Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (SEM@NAACL-HLT), Jun 2018, New Orleans, United States, pp. 154-159
Description: Neural word embedding models (such as those built with word2vec) are known to have stability problems: when a model is retrained with exactly the same hyperparameters, word neighborhoods may change. We propose a method to estimate this variation, based on the overlap between the neighbors of a given word in two models trained with identical hyperparameters. We show that this inherent variation is not negligible and that it does not affect every word in the same way. We examine the influence of several features that are intrinsic to a word, corpus, or embedding model, and provide a methodology that can predict the variability (and thus the reliability) of a word representation in a semantic vector space.
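The neighbor-overlap idea in the description can be illustrated with a minimal sketch. The record does not give the exact formula, so everything here is an assumption: the function names `nearest_neighbors` and `neighbor_variation` are hypothetical, cosine similarity is used to rank neighbors, the neighborhood size `k` is arbitrary, and the score is taken to be one minus the shared fraction of neighbors. The two matrices stand in for two models trained with identical hyperparameters over the same vocabulary.

```python
import numpy as np

def nearest_neighbors(vectors, vocab, word, k):
    """Return the set of the k nearest neighbors of `word` by cosine similarity."""
    vectors = np.asarray(vectors, dtype=float)
    # Normalize rows so that a dot product equals cosine similarity.
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = vocab.index(word)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    top = np.argsort(-sims)[:k]
    return {vocab[i] for i in top}

def neighbor_variation(vectors_a, vectors_b, vocab, word, k=25):
    """Assumed score: 1 - (shared neighbors / k), between 0 (stable) and 1."""
    neighbors_a = nearest_neighbors(vectors_a, vocab, word, k)
    neighbors_b = nearest_neighbors(vectors_b, vocab, word, k)
    return 1.0 - len(neighbors_a & neighbors_b) / k

# Toy data (invented for illustration): two "runs" over the same vocabulary.
vocab = ["cat", "dog", "car", "bus"]
run_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
run_b = np.array([[1.0, 0.0], [0.1, 0.9], [0.0, 1.0], [0.9, 0.1]])
score = neighbor_variation(run_a, run_b, vocab, "cat", k=1)
```

A score near 0 would indicate that a word keeps the same neighbors across runs, while a score near 1 would indicate that its neighborhood is largely replaced. With real models, the matrices would come from two separate training runs restricted to a shared vocabulary.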
Database: OpenAIRE