Showing 1 - 1 of 1 for search: '"Khani, Aliasgahr"'
Existing benchmarks for large language models (LLMs) increasingly struggle to differentiate between top-performing models, underscoring the need for more challenging evaluation frameworks. We introduce MMLU-Pro+, an enhanced benchmark building upon MMLU-Pro …
External link:
http://arxiv.org/abs/2409.02257