Showing 1 - 10 of 1,552 results for search: '"Saparov, A"'
Large language models (LLMs) can solve arithmetic word problems with high accuracy, but little is known about how well they generalize to problems that are more complex than the ones on which they have been trained. Empirical investigations of such q…
External link:
http://arxiv.org/abs/2410.13502
Phonon nonreciprocity, indicating different transport properties along opposite directions, has been observed in experiments under a magnetic field. We show that nonreciprocal acoustic phonons can also exist without a magnetic field or net magnetiza…
External link:
http://arxiv.org/abs/2407.09361
Mechanistic interpretability (MI) is an emerging sub-field of interpretability that seeks to understand a neural network model by reverse-engineering its internal computations. Recently, MI has garnered significant attention for interpreting transfor…
External link:
http://arxiv.org/abs/2407.02646
Recent work shows that causal facts can be effectively extracted from LLMs through prompting, facilitating the creation of causal graphs for causal inference tasks. However, it is unclear whether this success is limited to explicitly mentioned causal fact…
External link:
http://arxiv.org/abs/2406.12158
Author:
Kurmukov, Anvar, Chernina, Valeria, Gareeva, Regina, Dugova, Maria, Petrash, Ekaterina, Aleshina, Olga, Pisov, Maxim, Shirokikh, Boris, Samokhin, Valentin, Proskurov, Vladislav, Shimovolos, Stanislav, Basova, Maria, Goncharov, Mikhail, Soboleva, Eugenia, Donskova, Maria, Yaushev, Farukh, Shevtsov, Alexey, Zakharov, Alexey, Saparov, Talgat, Gombolevskiy, Victor, Belyaev, Mikhail
Interpretation of chest computed tomography (CT) is time-consuming. Previous studies have measured the time-saving effect of using a deep-learning-based aid (DLA) for CT interpretation. We evaluated the joint impact of a multi-pathology DLA on the ti…
External link:
http://arxiv.org/abs/2406.08137
Author:
Anwar, Usman, Saparov, Abulhair, Rando, Javier, Paleka, Daniel, Turpin, Miles, Hase, Peter, Lubana, Ekdeep Singh, Jenner, Erik, Casper, Stephen, Sourbut, Oliver, Edelman, Benjamin L., Zhang, Zhaowei, Günther, Mario, Korinek, Anton, Hernandez-Orallo, Jose, Hammond, Lewis, Bigelow, Eric, Pan, Alexander, Langosco, Lauro, Korbak, Tomasz, Zhang, Heidi, Zhong, Ruiqi, hÉigeartaigh, Seán Ó, Recchia, Gabriel, Corsi, Giulio, Chan, Alan, Anderljung, Markus, Edwards, Lilian, Petrov, Aleksandar, de Witt, Christian Schroeder, Motwani, Sumeet Ramesh, Bengio, Yoshua, Chen, Danqi, Torr, Philip H. S., Albanie, Samuel, Maharaj, Tegan, Foerster, Jakob, Tramer, Florian, He, He, Kasirzadeh, Atoosa, Choi, Yejin, Krueger, David
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs). These challenges are organized into three different categories: scientific understanding of LLMs, development and deployment methods…
External link:
http://arxiv.org/abs/2404.09932
Author:
Opedal, Andreas, Stolfo, Alessandro, Shirakami, Haruki, Jiao, Ying, Cotterell, Ryan, Schölkopf, Bernhard, Saparov, Abulhair, Sachan, Mrinmaya
There is increasing interest in employing large language models (LLMs) as cognitive models. For such purposes, it is central to understand which properties of human cognition are well-modeled by LLMs, and which are not. In this work, we study the bia…
External link:
http://arxiv.org/abs/2401.18070
Author:
Sourabh, S., Afshari, H., Whiteside, V. R., Eperon, G. E., Scheidt, R. A., Creason, T. D., Furis, M., Kirmani, A., Saparov, B., Luther, J. M., Beard, M. C., Sellers, I. R.
The presence of hot carriers is demonstrated in the operational properties of an (FA,Cs)Pb(I, Br, Cl)3 solar cell at ambient temperatures and under practical solar concentration. At 100 K, clear evidence of hot carriers is observed in both the high ener…
External link:
http://arxiv.org/abs/2311.08294
Author:
Zheng, Hongyi, Saparov, Abulhair
Recent advances in prompt engineering enable large language models (LLMs) to solve multi-hop logical reasoning problems with impressive accuracy. However, there is little existing work investigating the robustness of LLMs with few-shot prompting tech…
External link:
http://arxiv.org/abs/2311.00258
Large language models (LLMs) are trained on vast amounts of text from the internet, which contains both factual and misleading information about the world. While unintuitive from a classic view of LMs, recent work has shown that the truth value of a…
External link:
http://arxiv.org/abs/2310.18168