Showing 1 - 10 of 81,271 results for search: '"Press AT"'
Author:
Yang, John, Jimenez, Carlos E., Zhang, Alex L., Lieret, Kilian, Yang, Joyce, Wu, Xindi, Press, Ori, Muennighoff, Niklas, Synnaeve, Gabriel, Narasimhan, Karthik R., Yang, Diyi, Wang, Sida I., Press, Ofir
Autonomous systems for software engineering are now capable of fixing bugs and developing features. These systems are commonly evaluated on SWE-bench (Jimenez et al., 2024a), which assesses their ability to solve software issues from GitHub repositor
External link:
http://arxiv.org/abs/2410.03859
Author:
Abramovich, Talor, Udeshi, Meet, Shao, Minghao, Lieret, Kilian, Xi, Haoran, Milner, Kimberly, Jancheska, Sofija, Yang, John, Jimenez, Carlos E., Khorrami, Farshad, Krishnamurthy, Prashanth, Dolan-Gavitt, Brendan, Shafique, Muhammad, Narasimhan, Karthik, Karri, Ramesh, Press, Ofir
Although language model (LM) agents are demonstrating growing potential in many domains, their success in cybersecurity has been limited due to simplistic design and the lack of fundamental features for this domain. We present EnIGMA, an LM agent for
External link:
http://arxiv.org/abs/2409.16165
Author:
Press, Ori, Hochlehnert, Andreas, Prabhu, Ameya, Udandarao, Vishaal, Press, Ofir, Bethge, Matthias
Thousands of new scientific papers are published each month. Such information overload complicates researcher efforts to stay current with the state-of-the-art as well as to verify and correctly attribute claims. We pose the following research questi
External link:
http://arxiv.org/abs/2407.12861
Author:
Yoran, Ori, Amouyal, Samuel Joseph, Malaviya, Chaitanya, Bogin, Ben, Press, Ofir, Berant, Jonathan
Language agents, built on top of language models (LMs), are systems that can interact with complex environments, such as the open web. In this work, we examine whether such agents can perform realistic and time-consuming tasks on the web, e.g., monit
External link:
http://arxiv.org/abs/2407.15711
Author:
Tian, Minyang, Gao, Luyu, Zhang, Shizhuo Dylan, Chen, Xinan, Fan, Cunwei, Guo, Xuefei, Haas, Roland, Ji, Pan, Krongchon, Kittithat, Li, Yao, Liu, Shengyan, Luo, Di, Ma, Yutao, Tong, Hao, Trinh, Kha, Tian, Chenyu, Wang, Zihan, Wu, Bohao, Xiong, Yanyu, Yin, Shengzhu, Zhu, Minhui, Lieret, Kilian, Lu, Yanxin, Liu, Genglin, Du, Yufeng, Tao, Tianhua, Press, Ofir, Callan, Jamie, Huerta, Eliu, Peng, Hao
Since language models (LMs) now outperform average humans on many challenging tasks, it has become increasingly difficult to develop challenging, high-quality, and realistic evaluations. We address this issue by examining LMs' capabilities to generat
External link:
http://arxiv.org/abs/2407.13168
Author:
Press, William H.
We propose to capture relevant statistical associations in a dataset of categorical survey responses by a method, here termed MODP, that "learns" a probabilistic prediction function L. Specifically, L predicts each question's response based on the sa
External link:
http://arxiv.org/abs/2406.05264
Entropy minimization (EM) is frequently used to increase the accuracy of classification models when they're faced with new data at test time. EM is a self-supervised learning method that optimizes classifiers to assign even higher probabilities to th
External link:
http://arxiv.org/abs/2405.05012
Author:
Yang, John, Jimenez, Carlos E., Wettig, Alexander, Lieret, Kilian, Yao, Shunyu, Narasimhan, Karthik, Press, Ofir
Language model (LM) agents are increasingly being used to automate complicated tasks in digital environments. Just as humans benefit from powerful software applications, such as integrated development environments, for complex tasks like software eng
External link:
http://arxiv.org/abs/2405.15793
Author:
Press, Meggan, Smith, James Henry
Published in:
Reference Services Review, 2024, Vol. 52, Issue 3, pp. 385-396.
External link:
http://www.emeraldinsight.com/doi/10.1108/RSR-02-2024-0005
Author:
Jimenez, Carlos E., Yang, John, Wettig, Alexander, Yao, Shunyu, Pei, Kexin, Press, Ofir, Narasimhan, Karthik
Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging t
External link:
http://arxiv.org/abs/2310.06770