Showing 1 - 10 of 95
for search: '"Xie, Yutao"'
Recently, large code generation models trained in a self-supervised manner on extensive unlabeled programming language data have achieved remarkable success. While these models acquire vast amounts of code knowledge, they perform poorly on code under…
External link:
http://arxiv.org/abs/2406.12326
The scaling law is becoming a fundamental law in many areas of machine learning: test error falls off as a power law as training data, model size, and compute increase. However, whether this law holds for the task of code…
External link:
http://arxiv.org/abs/2402.12813
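The power-law relation this abstract refers to is conventionally written as below; this is a generic illustrative form, with placeholder constant a and exponent \alpha rather than values taken from the paper:

    % test error E as a power law in a scaled-up resource x,
    % where x stands for training data size, model size, or compute;
    % a and \alpha are placeholder constants, not values from the paper
    E(x) \approx a \, x^{-\alpha}, \qquad a > 0, \ \alpha > 0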
Author:
Wei, Shufa, Xu, Xiaolong, Qi, Xianbiao, Yin, Xi, Xia, Jun, Ren, Jingyi, Tang, Peijun, Zhong, Yuxiang, Chen, Yihao, Ren, Xiaoqin, Liang, Yuxin, Huang, Liankai, Xie, Kai, Gui, Weikang, Tan, Wei, Sun, Shuanglong, Hu, Yongquan, Liu, Qinxian, Li, Nanjin, Dai, Chihao, Wang, Lihua, Liu, Xiaohui, Zhang, Lei, Xie, Yutao
Large Language Models (LLMs) have demonstrated exceptional capabilities across various natural language processing tasks. Yet, many of these advanced LLMs are tailored for broad, general-purpose applications. In this technical report, we introduce AcademicGPT…
External link:
http://arxiv.org/abs/2311.12315
Author:
Ma, Yingwei, Yu, Yue, Li, Shanshan, Jiang, Yu, Guo, Yong, Zhang, Yuanliang, Xie, Yutao, Liao, Xiangke
Large language models (LLMs) have showcased remarkable prowess in code generation. However, automated code generation is still challenging, since it requires a high-level semantic mapping between natural language requirements and code. Most existing…
External link:
http://arxiv.org/abs/2310.10698
Current ASR systems are mainly trained and evaluated at the utterance level, but long-range cross-utterance context can be incorporated. A key task is to derive a suitable compact representation of the most relevant history contexts. In contrast to previous…
External link:
http://arxiv.org/abs/2306.13307
Published in:
ACM Transactions on Software Engineering and Methodology 2023
Code writing is repetitive and predictable, inspiring us to develop various code intelligence techniques. This survey focuses on code search, that is, retrieving code that matches a given query by effectively capturing the semantic similarity between…
External link:
http://arxiv.org/abs/2305.05959
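The retrieval formulation described in this abstract can be sketched compactly: embed the query and each candidate snippet, then rank snippets by cosine similarity. Below is a minimal Python sketch, assuming a toy bag-of-words vector in place of a learned code encoder; all names and data in it are illustrative, not from the survey:

    import math
    import re
    from collections import Counter

    def embed(text):
        # Toy "embedding": a bag of lowercase word tokens.
        # A real code search system would use a trained code encoder here.
        return Counter(re.findall(r"[a-z]+", text.lower()))

    def cosine(a, b):
        # Cosine similarity between two sparse count vectors.
        dot = sum(v * b.get(t, 0) for t, v in a.items())
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    corpus = [
        "def add(a, b): return a + b",
        "def read_file(path): return open(path).read()",
    ]
    query = embed("read the contents of a file")
    best = max(corpus, key=lambda code: cosine(query, embed(code)))
    print(best)  # -> the read_file snippet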
Pre-trained code models have emerged as the state-of-the-art paradigm for code search tasks. The paradigm involves pre-training the model on search-irrelevant tasks such as masked language modeling, followed by the fine-tuning stage, which focuses on…
External link:
http://arxiv.org/abs/2305.04508
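Masked language modeling, the "search-irrelevant" pre-training task named in this abstract, can be illustrated with a toy sketch: randomly hide tokens and keep the originals as reconstruction targets. This is a generic sketch of the objective, not the paper's actual pipeline, and every name in it is illustrative:

    import random

    def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
        # Hide a random fraction of tokens; a model would be trained to
        # reconstruct the hidden originals from the surrounding context.
        masked, targets = [], {}
        for i, tok in enumerate(tokens):
            if random.random() < mask_rate:
                masked.append(mask_token)
                targets[i] = tok
            else:
                masked.append(tok)
        return masked, targets

    code_tokens = "def add ( a , b ) : return a + b".split()
    masked, targets = mask_tokens(code_tokens, mask_rate=0.3)
    print(masked)   # token sequence with some positions replaced by [MASK]
    print(targets)  # positions of the hidden tokens and their originals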
Pretrained language models have served as important backbones for natural language processing. Recently, in-domain pretraining has been shown to benefit various domain-specific downstream tasks. In the biomedical domain, natural language generation (NLG)…
External link:
http://arxiv.org/abs/2204.03905
Author:
Yu, Sheng, Yuan, Zheng, Xia, Jun, Luo, Shengxuan, Ying, Huaiyuan, Zeng, Sihang, Ren, Jingyi, Yuan, Hongyi, Zhao, Zhengyun, Lin, Yucong, Lu, Keming, Wang, Jing, Xie, Yutao, Shum, Heung-Yeung
Biomedical knowledge graphs (BioMedKGs) are essential infrastructures for biomedical and healthcare big data and artificial intelligence (AI), facilitating natural language processing, model development, and data exchange. For decades, these knowledge…
External link:
http://arxiv.org/abs/2203.09975
Published in:
Signal Processing, vol. 222, September 2024