Search results - "Dalsgaard, Jacob Aarup"

Report

The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks

Authors: Møller, Anders Giovanni; Dalsgaard, Jacob Aarup; Pera, Arianna; Aiello, Luca Maria

In the realm of Computational Social Science (CSS), practitioners often navigate complex, low-resource domains and face the costly and time-intensive challenges of acquiring and annotating data. We aim to establish a set of guidelines to address such

External link: http://arxiv.org/abs/2304.13861

Report

The Danish Gigaword Project

Authors: Strømberg-Derczynski, Leon; Ciosici, Manuel R.; Baglini, Rebekah; Christiansen, Morten H.; Dalsgaard, Jacob Aarup; Fusaroli, Riccardo; Henrichsen, Peter Juel; Hvingelby, Rasmus; Kirkedal, Andreas; Kjeldsen, Alex Speed; Ladefoged, Claus; Nielsen, Finn Årup; Petersen, Malte Lau; Rystrøm, Jonathan Hvithamar; Varab, Daniel

Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely-available one billion wo

External link: http://arxiv.org/abs/2005.03521

Is a prompt and a few samples all you need? Using GPT-4 for data augmentation in low-resource classification tasks

Authors: Møller, Anders Giovanni; Dalsgaard, Jacob Aarup; Pera, Arianna; Aiello, Luca Maria

Obtaining and annotating data can be expensive and time-consuming, especially in complex, low-resource domains. We use GPT-4 and ChatGPT to augment small labeled datasets with synthetic data via simple prompts, in three different classification tasks

External links: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::44e655f7ce6a0ccdd0b0ce964b53ae7d
http://arxiv.org/abs/2304.13861

The Danish Gigaword Corpus

Authors: Strømberg-Derczynski, Leon; Ciosici, Manuel Rafael; Christiansen, Morten H.; Baglini, Rebekah Brita; Dalsgaard, Jacob Aarup; Fusaroli, Riccardo; Henrichsen, Peter Juel; Hvingelby, Rasmus; Kirkedal, Andreas; Kjeldsen, Alex Speed; Ladefoged, Claus; Nielsen, Finn Årup; Madsen, Jens; Petersen, Malte Lau; Rystrøm, Jonathan Hvithamar; Varab, Daniel

Published in: Strømberg-Derczynski, L, Ciosici, M R, Christiansen, M H, Baglini, R B, Dalsgaard, J A, Fusaroli, R, Henrichsen, P J, Hvingelby, R, Kirkedal, A, Kjeldsen, A S, Ladefoged, C, Nielsen, F A, Madsen, J, Petersen, M L, Rystrøm, J H & Varab, D 2021, The Danish Gigaword Corpus. In Proceedings of the 23rd Nordic Conference on Computational Linguistics (NoDaLiDa). Linköping University Electronic Press, pp. 413-421. <https://www.aclweb.org/anthology/2021.nodalida-main.46>

External links: https://explore.openaire.eu/search/publication?articleId=od______2751::40d0d6ef0a8ba840b1ac2f6cc3e8a1eb
https://curis.ku.dk/ws/files/305020079/2021.nodalida_main.46v2.pdf
