Description: |
We present a new, language-pair-agnostic approach to inducing bilingual vector spaces from non-parallel data, without any other resources, in a bootstrapping fashion. The paper systematically introduces and describes all key elements of the bootstrapping procedure: (1) the starting point, or seed lexicon; (2) confidence estimation and the selection of new dimensions of the space; and (3) convergence. We test the quality of the induced bilingual vector spaces and analyze the influence of the different components of the bootstrapping approach on the task of bilingual lexicon extraction (BLE) for two language pairs. Contrary to conclusions from prior work, the results reveal that the seeding of the bootstrapping process has a heavy impact on the quality of the learned lexicons. We also show that our approach outperforms the best-performing fully corpus-based BLE methods on these test sets.
ispartof: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013), pages 1613-1624; conference: 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP 2013); location: Seattle, WA, USA; date: 18-21 Oct 2013; status: published |
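The abstract names three components of the bootstrapping loop (a seed lexicon, confidence-based selection of new dimensions, and convergence) without specifying how each is realized. The sketch below illustrates one generic way such a loop can work and is not the paper's method. It assumes that each lexicon pair defines one shared dimension, that a word's vector counts sentence-level co-occurrences with the pivot words on its own side, that similarity is cosine, and that "confidence" means a mutual nearest-neighbour match scoring at least a hypothetical `threshold`. All function names are hypothetical.

```python
from collections import Counter
import math


def context_vectors(sentences, pivots):
    """Map each word to a sparse vector: dimension i counts the sentences in
    which the word co-occurs with pivots[i] (assumption: whole-sentence window)."""
    index = {p: i for i, p in enumerate(pivots)}
    vecs = {}
    for sent in sentences:
        present = set(sent)
        hits = [index[w] for w in present if w in index]
        for w in present:
            v = vecs.setdefault(w, Counter())
            for i in hits:
                if pivots[i] != w:  # a pivot is not its own context
                    v[i] += 1
    return vecs


def cosine(u, v):
    dot = sum(c * v.get(i, 0) for i, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def nearest(vec, candidates):
    """Return (word, similarity) of the most similar candidate, or (None, 0.0)."""
    best, best_sim = None, 0.0
    for word, other in candidates.items():
        sim = cosine(vec, other)
        if sim > best_sim:
            best, best_sim = word, sim
    return best, best_sim


def bootstrap(src_sents, tgt_sents, seed, threshold=0.5, max_iter=10):
    """Grow a bilingual lexicon from a seed; every accepted pair becomes a new
    shared dimension of the space in the next iteration."""
    lexicon = list(seed)
    for _ in range(max_iter):
        src_vecs = context_vectors(src_sents, [s for s, _ in lexicon])
        tgt_vecs = context_vectors(tgt_sents, [t for _, t in lexicon])
        used_s = {s for s, _ in lexicon}
        used_t = {t for _, t in lexicon}
        src_free = {w: v for w, v in src_vecs.items() if w not in used_s}
        tgt_free = {w: v for w, v in tgt_vecs.items() if w not in used_t}
        new_pairs = []
        for s, v in src_free.items():
            t, sim = nearest(v, tgt_free)
            if t is None or sim < threshold:
                continue  # not confident enough
            back, _ = nearest(tgt_free[t], src_free)
            if back == s:  # keep mutual nearest neighbours only
                new_pairs.append((s, t))
        if not new_pairs:
            break  # convergence: no new dimensions were added
        lexicon.extend(new_pairs)
    return lexicon
```

For example, with the toy corpora `[["cat", "eats"], ["dog", "barks"]]` and `[["kat", "eet"], ["hond", "blaft"]]` and the seed `[("cat", "kat"), ("dog", "hond")]`, the lexicon grows to four pairs, adding `("eats", "eet")` and `("barks", "blaft")`, and then converges because no unmatched words remain. Changing the seed changes which dimensions exist in the first iteration, and hence which pairs can be accepted at all.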