Author: |
Qihong Song, Haize Hu, Tebo Dai |
Language: |
English |
Publication year: |
2024 |
Subject: |
|
Source: |
Scientific Reports, Vol 14, Iss 1, Pp 1-19 (2024) |
Document type: |
article |
ISSN: |
2045-2322 |
DOI: |
10.1038/s41598-024-64205-2 |
Description: |
Abstract: Code search aims to retrieve, from large codebases, code snippets that are semantically related to natural-language queries. Deep learning is a valuable approach to code search, and the quality of the training data directly affects the performance of deep-learning models. However, most existing deep-learning models for code search overlook the critical role that the training data within each batch, particularly hard negative samples, plays in optimizing model parameters. In this paper, we propose CMCS, a contrastive-metric learning method for code search based on vector-level sampling and augmentation. Specifically, we propose a K-means-based sampling method to obtain hard negative samples, and a hardness-controllable sample augmentation method that uses vector-level augmentation techniques to obtain positive and hard negative samples. We then design an optimization objective that combines metric learning and multimodal contrastive learning over the obtained positive and hard negative samples. Extensive experiments on the large-scale CodeSearchNet dataset with seven advanced code search models show that the proposed method significantly improves both the training efficiency and the search performance of code search models, which is conducive to promoting software engineering development. |
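The abstract names three ingredients: K-means-based hard negative sampling, hardness-controllable vector-level augmentation, and an objective that combines metric learning with contrastive learning. The NumPy sketch below illustrates those general ideas on random embeddings. It is not the CMCS implementation: picking a hard negative from the same K-means cluster, Gaussian-noise augmentation of positives, interpolating a negative toward the positive with a hypothetical hardness parameter `lam`, and using InfoNCE plus a triplet margin loss are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def kmeans_labels(x, k, iters=20, seed=0):
    """Plain Lloyd's K-means; returns one cluster id per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def hard_negative_index(code, labels, seed=0):
    """Assumption: a cluster-mate of item i is a 'hard' negative for query i.
    Falls back to any other item when i is alone in its cluster."""
    rng = np.random.default_rng(seed)
    n = len(code)
    out = np.empty(n, dtype=int)
    for i in range(n):
        same = np.flatnonzero((labels == labels[i]) & (np.arange(n) != i))
        pool = same if len(same) else np.delete(np.arange(n), i)
        out[i] = rng.choice(pool)
    return out

def augment_positive(pos, sigma=0.01, seed=0):
    """Assumed vector-level positive augmentation: small Gaussian noise."""
    rng = np.random.default_rng(seed)
    return l2_normalize(pos + sigma * rng.standard_normal(pos.shape))

def harden_negative(neg, pos, lam=0.3):
    """Assumed hardness control: larger lam pulls the negative toward the positive."""
    return l2_normalize(lam * pos + (1.0 - lam) * neg)

def info_nce(q, pos, negs, tau=0.05):
    """Contrastive loss; q, pos: (n, d), negs: (n, m, d), all unit-normalized."""
    pos_sim = (q * pos).sum(axis=-1, keepdims=True) / tau   # (n, 1)
    neg_sim = np.einsum("nd,nmd->nm", q, negs) / tau         # (n, m)
    logits = np.concatenate([pos_sim, neg_sim], axis=1)
    logits -= logits.max(axis=1, keepdims=True)              # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()

def triplet(q, pos, neg, margin=0.2):
    """Metric-learning term: cosine triplet margin loss."""
    return np.maximum(0.0, margin - (q * pos).sum(-1) + (q * neg).sum(-1)).mean()

# Toy batch: random unit "code" vectors and noisy "query" vectors near them.
rng = np.random.default_rng(42)
n, d = 32, 16
code = l2_normalize(rng.standard_normal((n, d)))
query = l2_normalize(code + 0.1 * rng.standard_normal((n, d)))

labels = kmeans_labels(code, k=4)
neg = code[hard_negative_index(code, labels)]
pos_aug = augment_positive(code)
neg_hard = harden_negative(neg, code, lam=0.3)
negs = np.stack([neg, neg_hard], axis=1)                    # (n, 2, d)

loss = info_nce(query, pos_aug, negs) + triplet(query, pos_aug, neg_hard)
print(float(loss))
```

In this sketch, raising `lam` moves each synthetic negative along the unit sphere toward its positive, which makes it harder to separate; that is one simple way to make hardness tunable, not necessarily the scheme used in the paper.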
Database: |
Directory of Open Access Journals |
|