Showing 1 - 10 of 54 results for query: '"Kocmi, Tom"'
Author:
Kocmi, Tom, Avramidis, Eleftherios, Bawden, Rachel, Bojar, Ondrej, Dvorkovich, Anton, Federmann, Christian, Fishel, Mark, Freitag, Markus, Gowda, Thamme, Grundkiewicz, Roman, Haddow, Barry, Karpinska, Marzena, Koehn, Philipp, Marie, Benjamin, Murray, Kenton, Nagata, Masaaki, Popel, Martin, Popovic, Maja, Shmatova, Mariya, Steingrímsson, Steinþór, Zouhar, Vilém
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics. The official ranking will be a human evaluation, which is superior to the automatic ranking and supersedes it. The purpose of this report is not to interpret any …
External link:
http://arxiv.org/abs/2407.19884
Annually, research teams spend large amounts of money to evaluate the quality of machine translation systems (WMT, inter alia). This is expensive because it requires a lot of expert human labor. The recently adopted annotation protocol, Error Span Annotation …
External link:
http://arxiv.org/abs/2406.12419
Author:
Kocmi, Tom, Zouhar, Vilém, Avramidis, Eleftherios, Grundkiewicz, Roman, Karpinska, Marzena, Popović, Maja, Sachan, Mrinmaya, Shmatova, Mariya
High-quality Machine Translation (MT) evaluation relies heavily on human judgments. Comprehensive error classification methods, such as Multidimensional Quality Metrics (MQM), are expensive as they are time-consuming and can only be done by experts, …
External link:
http://arxiv.org/abs/2406.11580
Author:
Moghe, Nikita, Fazla, Arnisa, Amrhein, Chantal, Kocmi, Tom, Steedman, Mark, Birch, Alexandra, Sennrich, Rico, Guillou, Liane
Recent machine translation (MT) metrics calibrate their effectiveness by correlating with human judgement but without any insights about their behaviour across different error types. Challenge sets are used to probe specific dimensions of metric behaviour …
External link:
http://arxiv.org/abs/2401.16313
Ten years ago a single metric, BLEU, governed progress in machine translation research. For better or worse, there is no such consensus today, and consequently it is difficult for researchers to develop and retain the kinds of heuristic intuitions about …
External link:
http://arxiv.org/abs/2401.06760
Author:
Kocmi, Tom, Federmann, Christian
This paper introduces GEMBA-MQM, a GPT-based evaluation metric designed to detect translation quality errors, specifically for the quality estimation setting without the need for human reference translations. Based on the power of large language models …
External link:
http://arxiv.org/abs/2310.13988
Reference-based metrics that operate at the sentence level typically outperform quality estimation metrics, which have access only to the source and system output. This is unsurprising, since references resolve ambiguities that may be present in the …
External link:
http://arxiv.org/abs/2309.08832
Author:
Tang, Tianyi, Lu, Hongyuan, Jiang, Yuchen Eleanor, Huang, Haoyang, Zhang, Dongdong, Zhao, Wayne Xin, Kocmi, Tom, Wei, Furu
Most research about natural language generation (NLG) relies on evaluation benchmarks with limited references for a sample, which may result in poor correlations with human judgements. The underlying reason is that one semantic meaning can actually be …
External link:
http://arxiv.org/abs/2305.15067
Generative large language models (LLMs), e.g., ChatGPT, have demonstrated remarkable proficiency across several NLP tasks, such as machine translation and text summarization. Recent research (Kocmi and Federmann, 2023) has shown that utilizing LLMs for …
External link:
http://arxiv.org/abs/2303.13809
Author:
Kocmi, Tom, Federmann, Christian
We describe GEMBA, a GPT-based metric for assessment of translation quality, which works both with a reference translation and without. In our evaluation, we focus on zero-shot prompting, comparing four prompt variants in two modes, based on the availability …
External link:
http://arxiv.org/abs/2302.14520