Search results - "Jacobs, Cassandra L."

Report

Large-scale cloze evaluation reveals that token prediction tasks are neither lexically nor semantically aligned

Authors: Jacobs, Cassandra L., Grobol, Loïc, Tsang, Alvin

In this work we compare the generative behavior at the next token prediction level in several language models by comparing them to human productions in the cloze task. We find that while large models trained for longer are typically better estimators

External link: http://arxiv.org/abs/2410.12057

Report

Incorporating Annotator Uncertainty into Representations of Discourse Relations

Authors: Cortez, S. Magalí López, Jacobs, Cassandra L.

Annotation of discourse relations is a known difficult task, especially for non-expert annotators. In this paper, we investigate novice annotators' uncertainty on the annotation of discourse relations on spoken conversational data. We find that dialo

External link: http://arxiv.org/abs/2308.07179

Report

The distribution of discourse relations within and across turns in spontaneous conversation

Authors: Cortez, S. Magalí López, Jacobs, Cassandra L.

Time pressure and topic negotiation may impose constraints on how people leverage discourse relations (DRs) in spontaneous conversational contexts. In this work, we adapt a system of DRs for written language to spontaneous dialogue using crowdsourced

External link: http://arxiv.org/abs/2307.03645

Report

Lost in Space Marking

Authors: Jacobs, Cassandra L., Pinter, Yuval

We look at a decision taken early in training a subword tokenizer, namely whether it should be the word-initial token that carries a special mark, or the word-final one. Based on surface-level considerations of efficiency and cohesion, as well as mor

External link: http://arxiv.org/abs/2208.01561

Academic article

Lexico-syntactic constraints influence verbal working memory in sentence-like lists.

Authors: Schwering, Steven C. (schwering@wisc.edu), Jacobs, Cassandra L., Montemayor, Janelle, MacDonald, Maryellen C.

Published in: Memory & Cognition, Nov. 2024, Vol. 52, Issue 8, pp. 1852-1870 (19 pages).

Report

Will it Unblend?

Authors: Pinter, Yuval, Jacobs, Cassandra L., Eisenstein, Jacob

Natural language processing systems often struggle with out-of-vocabulary (OOV) terms, which do not appear in training data. Blends, such as "innoventor", are one particularly challenging class of OOV, as they are formed by fusing together two or mor

External link: http://arxiv.org/abs/2009.09123

Report

NYTWIT: A Dataset of Novel Words in the New York Times

Authors: Pinter, Yuval, Jacobs, Cassandra L., Bittker, Max

We present the New York Times Word Innovation Types dataset, or NYTWIT, a collection of over 2,500 novel English words published in the New York Times between November 2017 and March 2019, manually annotated for their class of novelty (such as lexica

External link: http://arxiv.org/abs/2003.03444
