Easy-to-Deploy API Extraction by Multi-Level Feature Embedding and Transfer Learning
Author: | Suyu Ma, Chunyang Chen, Cheng Chen, Guoqiang Li, Lizhen Qu, Zhenchang Xing |
---|---|
Year of publication: | 2021 |
Subject: |
Feature engineering; Word embedding; Artificial neural network; Application programming interface; Computer science; Feature extraction; Software engineering; Engineering and technology; Automatic summarization; Software; Electrical engineering, electronic engineering, information engineering; Feature (machine learning); Artificial intelligence; Natural language processing |
Source: | IEEE Transactions on Software Engineering. 47:2296-2311 |
ISSN: | 2326-3881 0098-5589 |
DOI: | 10.1109/tse.2019.2946830 |
Description: | Application Programming Interfaces (APIs) have been widely discussed on social-technical platforms (e.g., Stack Overflow). Extracting API mentions from such informal software texts is a prerequisite for API-centric search and summarization of programming knowledge. Machine-learning-based API extraction has outperformed rule-based methods on informal software texts, which lack consistent writing forms and annotations. However, machine-learning-based methods incur significant overhead in preparing training data and effective features. In this paper, we propose a multi-layer neural-network-based architecture for API extraction. Our architecture automatically learns character-, word- and sentence-level features from the input texts, thus removing the need for manual feature engineering and the dependence on advanced features (e.g., API gazetteers) beyond the input texts. We also propose adopting transfer learning to adapt a source-library-trained model to a target library, thus reducing the overhead of manual training-data labeling when the software texts of multiple programming languages and libraries need to be processed. We conduct extensive experiments with six libraries of four programming languages, which support diverse functionalities and have different API-naming and API-mention characteristics. Our experiments investigate the performance of our neural architecture for API extraction in informal software texts, the importance of different features, and the effectiveness of transfer learning. Our results confirm not only that our neural architecture outperforms existing machine-learning-based methods for API extraction in informal software texts, but also that it is easy to deploy. |
Database: | OpenAIRE |