Accelerating Word Embedding Generation with Fine-Grain Parallelism
Authors: | Leonardo Afonso Amorim, Wellington Santos Martins, Celso G. Camilo-Junior, Mateus F. Freitas, Weber Martins, Altino Dantas |
---|---|
Year of publication: | 2019 |
Subjects: | 020203 distributed computing; Source code; Theoretical computer science; Word embedding; Fine grain parallelism; Computer science; business.industry; media_common.quotation_subject; Feature vector; Context (language use); 02 engineering and technology; 010501 environmental sciences; 01 natural sciences; Software; 0202 electrical engineering electronic engineering information engineering; Word2vec; Quality (business); business; 0105 earth and related environmental sciences; media_common |
Source: | BRACIS |
Description: | Word embedding has become a popular form of document representation because it captures complex semantic relationships between words. It produces low-dimensional feature vectors that encode co-occurrence relationships between words within a given context. A recent successful application of word embedding is assessing the quality of fixes in Automated Software Repair. This application is highly computationally demanding, which motivated us to accelerate the technique so that it can handle software projects with thousands or millions of source code files. In this work, we therefore present a fine-grain parallel implementation of a word embedding technique that scales linearly on a multi-GPU platform. Experiments with both standard datasets and novel datasets of modified source code show that we can generate embeddings 13x faster while keeping the accuracy of the results at the same level as that of standard word embedding programs. |
Database: | OpenAIRE |
External link: |
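The description notes that word embeddings encode co-occurrence relationships between words within a context. As a minimal sketch, assuming a skip-gram-style model such as Word2vec (this is the generic technique, not necessarily the authors' parallel implementation), the (center, context) training pairs for a token sequence can be enumerated as below. The function name `skipgram_pairs` and the `window` default are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Enumerate (center, context) pairs within a symmetric window.

    Generic skip-gram pair extraction as used by Word2vec-style models.
    The default `window` is a hypothetical value, not one taken from the paper.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(tokens):
        # Clamp the window to the bounds of the sequence.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # Hypothetical tokenized line of source code.
    code_tokens = ["for", "i", "in", "range"]
    for pair in skipgram_pairs(code_tokens, window=1):
        print(pair)
```

Each pair is a small, largely independent unit of training work, which is the kind of granularity a fine-grain GPU implementation could distribute across many threads; how the paper actually partitions the work is not stated in this record.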