Showing 1 - 10 of 477 results for search: '"Tang, Xiangru"'
Author:
Zhang, Fengji, Zhang, Zexian, Keung, Jacky Wai, Tang, Xiangru, Yang, Zhen, Yu, Xiao, Hu, Wenhua
Code Smell Detection (CSD) plays a crucial role in improving software quality and maintainability, and Deep Learning (DL) techniques have emerged as a promising approach for CSD due to their superior performance. However, the effectiveness of DL-base…
External link:
http://arxiv.org/abs/2406.19240
Author:
Deng, Chunyuan, Zhao, Yilun, Heng, Yuzhao, Li, Yitong, Cao, Jiannan, Tang, Xiangru, Cohan, Arman
Data contamination has garnered increased attention in the era of large language models (LLMs) due to the reliance on extensive internet-derived training corpora. The issue of training corpus overlap with evaluation benchmarks--referred to as contami…
External link:
http://arxiv.org/abs/2406.14644
Author:
Tang, Xiangru, Zhang, Xingyao, Shao, Yanjun, Wu, Jie, Zhao, Yilun, Cohan, Arman, Gong, Ming, Zhang, Dongmei, Gerstein, Mark
Large language models (LLMs) excel at a variety of natural language processing tasks, yet they struggle to generate personalized content for individuals, particularly in real-world scenarios like scientific writing. Addressing this challenge, we intr…
External link:
http://arxiv.org/abs/2406.14275
Multimodal Large Language Models (MLLMs) have seen growing adoption across various scientific disciplines. These advancements encourage the investigation of molecule-text modeling within synthetic chemistry, a field dedicated to designing and conduct…
External link:
http://arxiv.org/abs/2406.13193
Author:
Biderman, Stella, Schoelkopf, Hailey, Sutawika, Lintang, Gao, Leo, Tow, Jonathan, Abbasi, Baber, Aji, Alham Fikri, Ammanamanchi, Pawan Sasanka, Black, Sidney, Clive, Jordan, DiPofi, Anthony, Etxaniz, Julen, Fattori, Benjamin, Forde, Jessica Zosa, Foster, Charles, Hsu, Jeffrey, Jaiswal, Mimansa, Lee, Wilson Y., Li, Haonan, Lovering, Charles, Muennighoff, Niklas, Pavlick, Ellie, Phang, Jason, Skowron, Aviya, Tan, Samson, Tang, Xiangru, Wang, Kevin A., Winata, Genta Indra, Yvon, François, Zou, Andy
Effective evaluation of language models remains an open challenge in NLP. Researchers and engineers face methodological issues such as the sensitivity of models to evaluation setup, difficulty of proper comparisons across methods, and the lack of rep…
External link:
http://arxiv.org/abs/2405.14782
Author:
Deng, Chunyuan, Tang, Xiangru, Zhao, Yilun, Wang, Hanming, Wang, Haoran, Zhou, Wangchunshu, Cohan, Arman, Gerstein, Mark
Recently, large language models (LLMs) have evolved into interactive agents, proficient in planning, tool use, and task execution across a wide variety of tasks. However, without specific agent tuning, open-source models like LLaMA currently struggle…
External link:
http://arxiv.org/abs/2404.04285
Author:
Lozhkov, Anton, Li, Raymond, Allal, Loubna Ben, Cassano, Federico, Lamy-Poirier, Joel, Tazi, Nouamane, Tang, Ao, Pykhtar, Dmytro, Liu, Jiawei, Wei, Yuxiang, Liu, Tianyang, Tian, Max, Kocetkov, Denis, Zucker, Arthur, Belkada, Younes, Wang, Zijian, Liu, Qian, Abulkhanov, Dmitry, Paul, Indraneil, Li, Zhuang, Li, Wen-Ding, Risdal, Megan, Li, Jia, Zhu, Jian, Zhuo, Terry Yue, Zheltonozhskii, Evgenii, Dade, Nii Osae Osae, Yu, Wenhao, Krauß, Lucas, Jain, Naman, Su, Yixuan, He, Xuanli, Dey, Manan, Abati, Edoardo, Chai, Yekun, Muennighoff, Niklas, Tang, Xiangru, Oblokulov, Muhtasham, Akiki, Christopher, Marone, Marc, Mou, Chenghao, Mishra, Mayank, Gu, Alex, Hui, Binyuan, Dao, Tri, Zebaze, Armel, Dehaene, Olivier, Patry, Nicolas, Xu, Canwen, McAuley, Julian, Hu, Han, Scholak, Torsten, Paquet, Sebastien, Robinson, Jennifer, Anderson, Carolyn Jane, Chapados, Nicolas, Patwary, Mostofa, Tajbakhsh, Nima, Jernite, Yacine, Ferrandis, Carlos Muñoz, Zhang, Lingming, Hughes, Sean, Wolf, Thomas, Guha, Arjun, von Werra, Leandro, de Vries, Harm
The BigCode project, an open-scientific collaboration focused on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder2. In partnership with Software Heritage (SWH), we build The Stack v2 on top of the digita…
External link:
http://arxiv.org/abs/2402.19173
Author:
Hong, Sirui, Lin, Yizhang, Liu, Bang, Liu, Bangbang, Wu, Binhao, Li, Danyang, Chen, Jiaqi, Zhang, Jiayi, Wang, Jinlin, Zhang, Li, Zhang, Lingyao, Yang, Min, Zhuge, Mingchen, Guo, Taicheng, Zhou, Tuo, Tao, Wei, Wang, Wenyi, Tang, Xiangru, Lu, Xiangtao, Zheng, Xiawu, Liang, Xinbing, Fei, Yaying, Cheng, Yuheng, Xu, Zongze, Wu, Chenglin
Large Language Model (LLM)-based agents have demonstrated remarkable effectiveness. However, their performance can be compromised in data science scenarios that require real-time data adjustment, expertise in optimization due to complex dependencies…
External link:
http://arxiv.org/abs/2402.18679
Author:
Tang, Xiangru, Dai, Howard, Knight, Elizabeth, Wu, Fang, Li, Yunyang, Li, Tianxiao, Gerstein, Mark
Artificial intelligence (AI)-driven methods can vastly improve the historically costly drug design process, with various generative models already in widespread use. Generative models for de novo drug design, in particular, focus on the creation of n…
External link:
http://arxiv.org/abs/2402.08703
Author:
Fang, Yin, Liu, Kangwei, Zhang, Ningyu, Deng, Xinle, Yang, Penghui, Chen, Zhuo, Tang, Xiangru, Gerstein, Mark, Fan, Xiaohui, Chen, Huajun
As Large Language Models (LLMs) rapidly evolve, their influence in science is becoming increasingly prominent. The emerging capabilities of LLMs in task generalization and free-form dialogue can significantly advance fields like chemistry and biology…
External link:
http://arxiv.org/abs/2402.08303