Zobrazeno 1 - 10
of 848
pro vyhledávání: '"van Miltenburg, A."'
Autor:
van Miltenburg, Emiel
This short position paper provides a manually curated list of non-English image captioning datasets (as of May 2024). Through this list, we can observe the dearth of datasets in different languages: only 23 different languages are represented. With t
Externí odkaz:
http://arxiv.org/abs/2407.09495
This review gives an extensive overview of evaluation methods for task-oriented dialogue systems, paying special attention to practical applications of dialogue systems, for example for customer service. The review (1) provides an overview of the use
Externí odkaz:
http://arxiv.org/abs/2312.13871
Autor:
Belz, Anya, Thomson, Craig, Reiter, Ehud, Abercrombie, Gavin, Alonso-Moral, Jose M., Arvan, Mohammad, Braggaar, Anouck, Cieliebak, Mark, Clark, Elizabeth, van Deemter, Kees, Dinkar, Tanvi, Dušek, Ondřej, Eger, Steffen, Fang, Qixiang, Gao, Mingqi, Gatt, Albert, Gkatzia, Dimitra, González-Corbelle, Javier, Hovy, Dirk, Hürlimann, Manuela, Ito, Takumi, Kelleher, John D., Klubicka, Filip, Krahmer, Emiel, Lai, Huiyuan, van der Lee, Chris, Li, Yiru, Mahamood, Saad, Mieskes, Margot, van Miltenburg, Emiel, Mosteiro, Pablo, Nissim, Malvina, Parde, Natalie, Plátek, Ondřej, Rieser, Verena, Ruan, Jie, Tetreault, Joel, Toral, Antonio, Wan, Xiaojun, Wanner, Leo, Watson, Lewis, Yang, Diyi
We report our efforts in identifying a set of previous human evaluations in NLP that would be suitable for a coordinated study examining what makes human evaluations in NLP more/less reproducible. We present our results and findings, which include th
Externí odkaz:
http://arxiv.org/abs/2305.01633
Autor:
van Miltenburg, Emiel
This year the International Conference on Natural Language Generation (INLG) will feature an award for the paper with the best evaluation. The purpose of this award is to provide an incentive for NLG researchers to pay more attention to the way they
Externí odkaz:
http://arxiv.org/abs/2303.16742
This case study investigates the extent to which a language model (GPT-2) is able to capture native speakers' intuitions about implicit causality in a sentence completion task. We first reproduce earlier results (showing lower surprisal values for pr
Externí odkaz:
http://arxiv.org/abs/2212.04348
Autor:
van Miltenburg, Emiel, Clinciu, Miruna-Adriana, Dušek, Ondřej, Gkatzia, Dimitra, Inglis, Stephanie, Leppänen, Leo, Mahamood, Saad, Manning, Emma, Schoch, Stephanie, Thomson, Craig, Wen, Luou
We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overa
Externí odkaz:
http://arxiv.org/abs/2108.01182
Autor:
Mille, Simon, Dhole, Kaustubh D., Mahamood, Saad, Perez-Beltrachini, Laura, Gangal, Varun, Kale, Mihir, van Miltenburg, Emiel, Gehrmann, Sebastian
Machine learning approaches applied to NLP are often evaluated by summarizing their performance in a single number, for example accuracy. Since most test sets are constructed as an i.i.d. sample from the overall data, this approach overly simplifies
Externí odkaz:
http://arxiv.org/abs/2106.09069
Preregistration refers to the practice of specifying what you are going to do, and what you expect to find in your study, before carrying out the study. This practice is increasingly common in medicine and psychology, but is rarely discussed in NLP.
Externí odkaz:
http://arxiv.org/abs/2103.06944
Autor:
Gehrmann, Sebastian, Adewumi, Tosin, Aggarwal, Karmanya, Ammanamanchi, Pawan Sasanka, Anuoluwapo, Aremu, Bosselut, Antoine, Chandu, Khyathi Raghavi, Clinciu, Miruna, Das, Dipanjan, Dhole, Kaustubh D., Du, Wanyu, Durmus, Esin, Dušek, Ondřej, Emezue, Chris, Gangal, Varun, Garbacea, Cristina, Hashimoto, Tatsunori, Hou, Yufang, Jernite, Yacine, Jhamtani, Harsh, Ji, Yangfeng, Jolly, Shailza, Kale, Mihir, Kumar, Dhruv, Ladhak, Faisal, Madaan, Aman, Maddela, Mounica, Mahajan, Khyati, Mahamood, Saad, Majumder, Bodhisattwa Prasad, Martins, Pedro Henrique, McMillan-Major, Angelina, Mille, Simon, van Miltenburg, Emiel, Nadeem, Moin, Narayan, Shashi, Nikolaev, Vitaly, Niyongabo, Rubungo Andre, Osei, Salomey, Parikh, Ankur, Perez-Beltrachini, Laura, Rao, Niranjan Ramesh, Raunak, Vikas, Rodriguez, Juan Diego, Santhanam, Sashank, Sedoc, João, Sellam, Thibault, Shaikh, Samira, Shimorina, Anastasia, Cabezudo, Marco Antonio Sobrevilla, Strobelt, Hendrik, Subramani, Nishant, Xu, Wei, Yang, Diyi, Yerukola, Akhila, Zhou, Jiawei
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this m
Externí odkaz:
http://arxiv.org/abs/2102.01672
Autor:
van Miltenburg, Emiel
Automatic image description systems are commonly trained and evaluated using crowdsourced, human-generated image descriptions. The best-performing system is then determined using some measure of similarity to the reference data (BLEU, Meteor, CIDER,
Externí odkaz:
http://arxiv.org/abs/2006.08792