Adversarial Training for Code Retrieval with Question-Description Relevance Regularization
Authors: Jie Zhao, Huan Sun
Year: 2020
Subject: FOS: Computer and information sciences; Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Computation and Language (cs.CL); Programming Languages (cs.PL); machine learning; regularization (mathematics); adversarial systems; natural language
Source: EMNLP (Findings)
DOI: 10.48550/arxiv.2010.09803
Description: Code retrieval is a key task aiming to match natural and programming languages. In this work, we propose adversarial learning for code retrieval that is regularized by question-description relevance. First, we adapt a simple adversarial learning technique to generate difficult code snippets given the input question, which can help the learning of code retrieval, a task that faces bi-modal and data-scarcity challenges. Second, we propose to leverage question-description relevance to regularize adversarial learning, such that a generated code snippet contributes more to the code retrieval training loss only if its paired natural language description is predicted to be less relevant to the user-given question. Experiments on large-scale code retrieval datasets of two programming languages show that our adversarial learning method is able to improve the performance of state-of-the-art models. Moreover, using an additional duplicate question prediction model to regularize adversarial learning further improves performance, and this is more effective than using the duplicated questions in strong multi-task learning baselines.
Comment: Accepted to Findings of EMNLP 2020. 11 pages, 2 figures
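The regularization idea in the description can be sketched as follows. This is a minimal illustration, not the paper's actual model: the function names, the cosine-similarity scorer, and the hinge-loss form are assumptions chosen for clarity. The key point it shows is down-weighting an adversarially generated negative's loss contribution when its paired description is predicted to be highly relevant to the question (since such a snippet may in fact answer the question and should not be pushed away as hard).

```python
import math

def cosine(u, v):
    # Assumed retrieval scorer: cosine similarity between embeddings.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def regularized_adversarial_loss(q_emb, pos_code_emb, adv_code_emb,
                                 desc_relevance, margin=0.5):
    """Hinge retrieval loss over (question, positive code, adversarial code).

    desc_relevance in [0, 1] is the predicted relevance of the adversarial
    snippet's paired natural language description to the question (e.g. from
    a duplicate question prediction model). The higher it is, the less the
    generated snippet contributes to the training loss.
    """
    weight = 1.0 - desc_relevance  # question-description relevance regularization
    hinge = max(0.0, margin
                - cosine(q_emb, pos_code_emb)
                + cosine(q_emb, adv_code_emb))
    return weight * hinge
```

For example, an adversarial snippet whose description is judged fully relevant (`desc_relevance=1.0`) contributes zero loss, while an irrelevant one (`desc_relevance=0.0`) contributes the full hinge penalty.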
Database: OpenAIRE
External link: