Search results

Report

Enterprise Benchmarks for Large Language Model Evaluation

Authors: Zhang, Bing; Takeuchi, Mikio; Kawahara, Ryo; Asthana, Shubhi; Hossain, Md. Maruf; Ren, Guang-Jie; Soule, Kate; Zhu, Yada

The advancement of large language models (LLMs) has led to a greater challenge of having a rigorous and systematic evaluation of complex tasks performed, especially in enterprise applications. Therefore, LLMs need to be able to benchmark enterprise d

External link: http://arxiv.org/abs/2410.12857

Report

Large Language Model Routing with Benchmark Datasets

Authors: Shnitzer, Tal; Ou, Anthony; Silva, Mírian; Soule, Kate; Sun, Yuekai; Solomon, Justin; Thompson, Neil; Yurochkin, Mikhail

There is a rapidly growing number of open-source Large Language Models (LLMs) and benchmark datasets to compare them. While some models dominate these benchmarks, no single model typically achieves the best accuracy in all tasks and use cases. In thi

External link: http://arxiv.org/abs/2309.15789
