Search results

Report

Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows

Authors: Lei, Fangyu; Chen, Jixuan; Ye, Yuxiao; Cao, Ruisheng; Shin, Dongchan; Su, Hongjin; Suo, Zhaoqing; Gao, Hongcheng; Hu, Wenjing; Yin, Pengcheng; Zhong, Victor; Xiong, Caiming; Sun, Ruoxi; Liu, Qian; Wang, Sida; Yu, Tao

Real-world enterprise text-to-SQL workflows often involve complex cloud or local data across various database systems, multiple SQL queries in various dialects, and diverse operations from data transformation to analytics. We introduce Spider 2.0, an

External link: http://arxiv.org/abs/2411.07763

Report

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

Authors: Yang, John; Jimenez, Carlos E.; Zhang, Alex L.; Lieret, Kilian; Yang, Joyce; Wu, Xindi; Press, Ori; Muennighoff, Niklas; Synnaeve, Gabriel; Narasimhan, Karthik R.; Yang, Diyi; Wang, Sida I.; Press, Ofir

Autonomous systems for software engineering are now capable of fixing bugs and developing features. These systems are commonly evaluated on SWE-bench (Jimenez et al., 2024a), which assesses their ability to solve software issues from GitHub repositor

External link: http://arxiv.org/abs/2410.03859

Report

Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?

Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based age

External link: http://arxiv.org/abs/2407.10956

Report

Chemical control of self-assembly by the electrosolvation force

Authors: Wang, Sida; Walker-Gibbons, Rowan; Watkins, Bethany; Lin, Binghui; Krishnan, Madhavi

Self-assembly of matter in solution generally relies on attractive interactions that overcome entropy and drive the formation of higher-order molecular and particulate structures. Such interactions play key roles in a variety of contexts, e.g., cryst

External link: http://arxiv.org/abs/2405.12099

Report

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

Authors: Jain, Naman; Han, King; Gu, Alex; Li, Wen-Ding; Yan, Fanjia; Zhang, Tianjun; Wang, Sida; Solar-Lezama, Armando; Sen, Koushik; Stoica, Ion

Large Language Models (LLMs) applied to code-related applications have emerged as a prominent field, attracting significant interest from both academia and industry. However, as new and improved LLMs are developed, existing evaluation benchmarks (e.g

External link: http://arxiv.org/abs/2403.07974

Report

Evaluation of LLMs on Syntax-Aware Code Fill-in-the-Middle Tasks

Authors: Gong, Linyuan; Wang, Sida; Elhoushi, Mostafa; Cheung, Alvin

We introduce Syntax-Aware Fill-In-the-Middle (SAFIM), a new benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task. This benchmark focuses on syntax-aware completions of program structures such as code blocks

External link: http://arxiv.org/abs/2403.04814

Report

CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution

Authors: Gu, Alex; Rozière, Baptiste; Leather, Hugh; Solar-Lezama, Armando; Synnaeve, Gabriel; Wang, Sida I.

We present CRUXEval (Code Reasoning, Understanding, and eXecution Evaluation), a benchmark consisting of 800 Python functions (3-13 lines). Each function comes with an input-output pair, leading to two natural tasks: input prediction and output predi

External link: http://arxiv.org/abs/2401.03065

Report

Accessing Higher Dimensions for Unsupervised Word Translation

Author: Wang, Sida I.

The striking ability of unsupervised word translation has been demonstrated with the help of word vectors / pretraining; however, these methods require large amounts of data and usually fail if the data come from different domains. We propose coocmap, a meth

External link: http://arxiv.org/abs/2305.14200

Report

Learning to Simulate Natural Language Feedback for Interactive Semantic Parsing

Authors: Yan, Hao; Srivastava, Saurabh; Tai, Yintao; Wang, Sida I.; Yih, Wen-tau; Yao, Ziyu

Interactive semantic parsing based on natural language (NL) feedback, where users provide feedback to correct the parser's mistakes, has emerged as a more practical scenario than the traditional one-shot semantic parsing. However, prior work has heavil

External link: http://arxiv.org/abs/2305.08195

Report

LEVER: Learning to Verify Language-to-Code Generation with Execution

Authors: Ni, Ansong; Iyer, Srini; Radev, Dragomir; Stoyanov, Ves; Yih, Wen-tau; Wang, Sida I.; Lin, Xi Victoria

The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine LLM decoding with sample pruning and reranking using test cases or heuris

External link: http://arxiv.org/abs/2302.08468
