Showing 1 - 2 of 2 for search: '"Paul, Debalina Ghosh"'
With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist with programming tasks, including the generation of program code from natural language input. However, how to evaluate such …
External link: http://arxiv.org/abs/2406.12655
In the scenario-based evaluation of machine learning models, a key problem is how to construct test datasets that represent various scenarios. The methodology proposed in this paper is to construct a benchmark and attach metadata to each test case. …
External link: http://arxiv.org/abs/2406.12635