Search results - "DELL, MELISSA"

Report

Author: Dell, Melissa

Deep learning provides powerful methods to impute structured information from large-scale, unstructured text and image datasets. For example, economists might wish to detect the presence of economic activity in satellite images, or to measure the top

External link: http://arxiv.org/abs/2407.15339

Report

News Deja Vu: Connecting Past and Present with Semantic Search

Authors: Franklin, Brevin; Silcock, Emily; Arora, Abhishek; Bryan, Tom; Dell, Melissa

Social scientists and the general public often analyze contemporary events by drawing parallels with the past, a process complicated by the vast, noisy, and unstructured nature of historical texts. For example, hundreds of millions of page scans from

External link: http://arxiv.org/abs/2406.15593

Report

Contrastive Entity Coreference and Disambiguation for Historical Texts

Authors: Arora, Abhishek; Silcock, Emily; Heldring, Leander; Dell, Melissa

Massive-scale historical document collections are crucial for social science research. Despite increasing digitization, these documents typically lack unique cross-document identifiers for individuals mentioned within the texts, as well as individual

External link: http://arxiv.org/abs/2406.15576

Report

Newswire: A Large-Scale Structured Database of a Century of Historical News

Authors: Silcock, Emily; Arora, Abhishek; D'Amico-Wong, Luca; Dell, Melissa

In the U.S. historically, local newspapers drew their content largely from newswires like the Associated Press. Historians argue that newswires played a pivotal role in creating a national identity and shared understanding of the world, but there is

External link: http://arxiv.org/abs/2406.09490

Report

EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge

Authors: Bryan, Tom; Carlson, Jacob; Arora, Abhishek; Dell, Melissa

Billions of public domain documents remain trapped in hard copy or lack an accurate digitization. Modern natural language processing methods cannot be used to index, retrieve, and summarize their texts; conduct computational textual analyses; or extr

External link: http://arxiv.org/abs/2310.10050

Report

LinkTransformer: A Unified Package for Record Linkage with Transformer Language Models

Authors: Arora, Abhishek; Dell, Melissa

Linking information across sources is fundamental to a variety of analyses in social science, business, and government. While large language models (LLMs) offer enormous promise for improving record linkage in noisy datasets, in many domains approxim

External link: http://arxiv.org/abs/2309.00789

Report

American Stories: A Large-Scale Structured Text Dataset of Historical U.S. Newspapers

Authors: Dell, Melissa; Carlson, Jacob; Bryan, Tom; Silcock, Emily; Arora, Abhishek; Shen, Zejiang; D'Amico-Wong, Luca; Le, Quan; Querubin, Pablo; Heldring, Leander

Existing full text datasets of U.S. public domain newspapers do not recognize the often complex layouts of newspaper scans, and as a result the digitized content scrambles texts from articles, headlines, captions, advertisements, and other layout reg

External link: http://arxiv.org/abs/2308.12477

Report

A Massive Scale Semantic Similarity Dataset of Historical English

Authors: Silcock, Emily; Dell, Melissa

A diversity of tasks use language models trained on semantic similarity data. While there are a variety of datasets that capture semantic similarity, they are either constructed from modern web data or are relatively small datasets created in the pas

External link: http://arxiv.org/abs/2306.17810

Report

Quantifying Character Similarity with Vision Transformers

Authors: Yang, Xinmei; Arora, Abhishek; Jheng, Shao-Yu; Dell, Melissa

Record linkage is a bedrock of quantitative social science, as analyses often require linking data from multiple, noisy sources. Off-the-shelf string matching methods are widely used, as they are straightforward and cheap to implement and scale. Not

External link: http://arxiv.org/abs/2305.14672
