Search results

Report

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

Authors: Yang, John; Jimenez, Carlos E.; Zhang, Alex L.; Lieret, Kilian; Yang, Joyce; Wu, Xindi; Press, Ori; Muennighoff, Niklas; Synnaeve, Gabriel; Narasimhan, Karthik R.; Yang, Diyi; Wang, Sida I.; Press, Ofir

Autonomous systems for software engineering are now capable of fixing bugs and developing features. These systems are commonly evaluated on SWE-bench (Jimenez et al., 2024a), which assesses their ability to solve software issues from GitHub repositor

External link: http://arxiv.org/abs/2410.03859

Report

EnIGMA: Enhanced Interactive Generative Model Agent for CTF Challenges

Authors: Abramovich, Talor; Udeshi, Meet; Shao, Minghao; Lieret, Kilian; Xi, Haoran; Milner, Kimberly; Jancheska, Sofija; Yang, John; Jimenez, Carlos E.; Khorrami, Farshad; Krishnamurthy, Prashanth; Dolan-Gavitt, Brendan; Shafique, Muhammad; Narasimhan, Karthik; Karri, Ramesh; Press, Ofir

Although language model (LM) agents are demonstrating growing potential in many domains, their success in cybersecurity has been limited due to simplistic design and the lack of fundamental features for this domain. We present EnIGMA, an LM agent for

External link: http://arxiv.org/abs/2409.16165

Report

CiteME: Can Language Models Accurately Cite Scientific Claims?

Authors: Press, Ori; Hochlehnert, Andreas; Prabhu, Ameya; Udandarao, Vishaal; Press, Ofir; Bethge, Matthias

Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research questi

External link: http://arxiv.org/abs/2407.12861

Report

AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?

Authors: Yoran, Ori; Amouyal, Samuel Joseph; Malaviya, Chaitanya; Bogin, Ben; Press, Ofir; Berant, Jonathan

Language agents, built on top of language models (LMs), are systems that can interact with complex environments, such as the open web. In this work, we examine whether such agents can perform realistic and time-consuming tasks on the web, e.g., monit

External link: http://arxiv.org/abs/2407.15711

Report

SciCode: A Research Coding Benchmark Curated by Scientists

Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generat

External link: http://arxiv.org/abs/2407.13168

Report

'Minus-One' Data Prediction Generates Synthetic Census Data with Good Crosstabulation Fidelity

Author: Press, William H.

We propose to capture relevant statistical associations in a dataset of categorical survey responses by a method, here termed MODP, that "learns" a probabilistic prediction function L. Specifically, L predicts each question's response based on the sa

External link: http://arxiv.org/abs/2406.05264

Report

The Entropy Enigma: Success and Failure of Entropy Minimization

Authors: Press, Ori; Shwartz-Ziv, Ravid; LeCun, Yann; Bethge, Matthias

Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. EM is a self-supervised learning method that optimizes classifiers to assign even higher probabilities to th

External link: http://arxiv.org/abs/2405.05012

Report

SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering

Authors: Yang, John; Jimenez, Carlos E.; Wettig, Alexander; Lieret, Kilian; Yao, Shunyu; Narasimhan, Karthik; Press, Ofir

Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software eng

External link: http://arxiv.org/abs/2405.15793

Academic article

Applying Universal Design for Learning to library peer professional development: a case study acknowledging adults as learners

Authors: Press, Meggan; Smith, James Henry

Published in: Reference Services Review, 2024, Vol. 52, Issue 3, pp. 385-396.

External link: http://www.emeraldinsight.com/doi/10.1108/RSR-02-2024-0005

Report

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Authors: Jimenez, Carlos E.; Yang, John; Wettig, Alexander; Yao, Shunyu; Pei, Kexin; Press, Ofir; Narasimhan, Karthik

Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging t

External link: http://arxiv.org/abs/2310.06770
