Showing 1 - 10 of 20 results for the search: '"Silcock, Emily"'
Author:
Sainz, Oscar, García-Ferrero, Iker, Jacovi, Alon, Campos, Jon Ander, Elazar, Yanai, Agirre, Eneko, Goldberg, Yoav, Chen, Wei-Lin, Chim, Jenny, Choshen, Leshem, D'Amico-Wong, Luca, Dell, Melissa, Fan, Run-Ze, Golchin, Shahriar, Li, Yucheng, Liu, Pengfei, Pahwa, Bhavish, Prabhu, Ameya, Sharma, Suryansh, Silcock, Emily, Solonko, Kateryna, Stap, David, Surdeanu, Mihai, Tseng, Yu-Min, Udandarao, Vishaal, Wang, Zengzhi, Xu, Ruijie, Yang, Jinglin
The 1st Workshop on Data Contamination (CONDA 2024) focuses on all relevant aspects of data contamination in natural language processing, where data contamination is understood as situations where evaluation data is included in pre-training corpora u
External link:
http://arxiv.org/abs/2407.21530
Social scientists and the general public often analyze contemporary events by drawing parallels with the past, a process complicated by the vast, noisy, and unstructured nature of historical texts. For example, hundreds of millions of page scans from
External link:
http://arxiv.org/abs/2406.15593
Massive-scale historical document collections are crucial for social science research. Despite increasing digitization, these documents typically lack unique cross-document identifiers for individuals mentioned within the texts, as well as individual
External link:
http://arxiv.org/abs/2406.15576
In the U.S. historically, local newspapers drew their content largely from newswires like the Associated Press. Historians argue that newswires played a pivotal role in creating a national identity and shared understanding of the world, but there is
External link:
http://arxiv.org/abs/2406.09490
Author:
Dell, Melissa, Carlson, Jacob, Bryan, Tom, Silcock, Emily, Arora, Abhishek, Shen, Zejiang, D'Amico-Wong, Luca, Le, Quan, Querubin, Pablo, Heldring, Leander
Existing full text datasets of U.S. public domain newspapers do not recognize the often complex layouts of newspaper scans, and as a result the digitized content scrambles texts from articles, headlines, captions, advertisements, and other layout reg
External link:
http://arxiv.org/abs/2308.12477
Author:
Silcock, Emily, Dell, Melissa
A diversity of tasks use language models trained on semantic similarity data. While there are a variety of datasets that capture semantic similarity, they are either constructed from modern web data or are relatively small datasets created in the pas
External link:
http://arxiv.org/abs/2306.17810
Identifying near duplicates within large, noisy text corpora has a myriad of applications that range from de-duplicating training datasets, reducing privacy risk, and evaluating test set leakage, to identifying reproduced news articles and literature
External link:
http://arxiv.org/abs/2210.04261
Author:
Silcock, Emily
Published in:
Contemporary Arab Affairs, 2020 Jun 01. 13(2), 141-145.
External link:
https://www.jstor.org/stable/48599850
Academic article
Author:
Silcock, Emily
Published in:
Contemporary Arab Affairs; Jun 2020, Vol. 13, Issue 2, pp. 141-145, 5 pp.