Showing 1 - 10 of 27 for search: '"Choe, Yo Joong"'
The linear representation hypothesis is the informal idea that semantic concepts are encoded as linear directions in the representation spaces of large language models (LLMs). Previous work has shown how to make this notion precise for representing b…
External link:
http://arxiv.org/abs/2406.01506
Author:
Choe, Yo Joong, Ramdas, Aaditya
In anytime-valid sequential inference, it is known that any admissible procedure must be based on e-processes, which are composite generalizations of test martingales that quantify the accumulated evidence against a composite null hypothesis at any a…
External link:
http://arxiv.org/abs/2402.09698
Informally, the 'linear representation hypothesis' is the idea that high-level concepts are represented linearly as directions in some representation space. In this paper, we address two closely related questions: What does "linear representation" ac…
External link:
http://arxiv.org/abs/2311.03658
Abstaining classifiers have the option to abstain from making predictions on inputs that they are unsure about. These classifiers are becoming increasingly popular in high-stakes decision-making problems, as they can withhold uncertain predictions to…
External link:
http://arxiv.org/abs/2305.10564
Author:
Choe, Yo Joong, Ramdas, Aaditya
Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the fo…
External link:
http://arxiv.org/abs/2110.00115
Invariant risk minimization (IRM) (Arjovsky et al., 2019) is a recently proposed framework designed for learning predictors that are invariant to spurious correlations across different training environments. Yet, despite its theoretical justification…
External link:
http://arxiv.org/abs/2004.05007
Natural language inference (NLI) and semantic textual similarity (STS) are key tasks in natural language understanding (NLU). Although several benchmark datasets for those tasks have been released in English and a few other languages, there are no pu…
External link:
http://arxiv.org/abs/2004.03289
Jejueo was classified as critically endangered by UNESCO in 2010. Although diverse efforts to revitalize it have been made, there have been few computational approaches. Motivated by this, we construct two new Jejueo datasets: Jejueo Interview Transc…
External link:
http://arxiv.org/abs/1911.12071
We present word2word, a publicly available dataset and an open-source Python package for cross-lingual word translations extracted from sentence-level parallel corpora. Our dataset provides top-k word translations in 3,564 (directed) language pairs a…
External link:
http://arxiv.org/abs/1911.12019
Grammatical error correction can be viewed as a low-resource sequence-to-sequence task, because publicly available parallel corpora are limited. To tackle this challenge, we first generate erroneous versions of large unannotated corpora using a reali…
External link:
http://arxiv.org/abs/1907.01256