Showing 1 - 10 of 5,679 for search: '"data contamination"'
Author:
Wu, Xiaobao, Pan, Liangming, Xie, Yuxi, Zhou, Ruiwen, Zhao, Shuai, Ma, Yubo, Du, Mingzhe, Mao, Rui, Luu, Anh Tuan, Wang, William Yang
Data contamination hinders fair LLM evaluation by introducing test data into newer models' training sets. Existing studies address this challenge by updating benchmarks with newly collected data. However, they fail to guarantee contamination-free evaluation…
External link:
http://arxiv.org/abs/2412.13670
Data contamination presents a critical barrier preventing widespread industrial adoption of advanced software engineering techniques that leverage code language models (CLMs). This phenomenon occurs when evaluation data inadvertently overlaps with the…
External link:
http://arxiv.org/abs/2411.10842
Visual anomaly detection aims to detect images that deviate notably from normal patterns, and it has found extensive application in identifying defective parts in the manufacturing industry. These anomaly detection paradigms predominantly focus…
External link:
http://arxiv.org/abs/2411.09558
Author:
Singh, Aaditya K., Kocyigit, Muhammed Yusuf, Poulton, Andrew, Esiobu, David, Lomeli, Maria, Szilvasy, Gergely, Hupkes, Dieuwke
Hampering the interpretation of benchmark scores, evaluation data contamination has become a growing concern in the evaluation of LLMs, and an active area of research studies its effects. While evaluation data contamination is easily understood intuitively…
External link:
http://arxiv.org/abs/2411.03923
The rapid progression of multimodal large language models (MLLMs) has demonstrated superior performance on various multimodal benchmarks. However, the issue of data contamination during training creates challenges in performance evaluation and comparison…
External link:
http://arxiv.org/abs/2411.03823
Image generation has shown remarkable results in generating high-fidelity realistic images, in particular with the advancement of diffusion-based models. However, the prevalence of AI-generated images may have side effects for the machine learning community…
External link:
http://arxiv.org/abs/2411.13852
Large language models (LLMs) have demonstrated great performance across various benchmarks, showing potential as general-purpose task solvers. However, as LLMs are typically trained on vast amounts of data, a significant concern in their evaluation is…
External link:
http://arxiv.org/abs/2410.18966
Large language models (LLMs) are widely used, but concerns about data contamination challenge the reliability of LLM evaluations. Existing contamination detection methods are often task-specific or require extra prerequisites, limiting practicality.
External link:
http://arxiv.org/abs/2410.15005
As large language models achieve increasingly impressive results, questions arise about whether such performance reflects generalization or mere data memorization. Thus, numerous data contamination detection methods have been proposed. However, these…
External link:
http://arxiv.org/abs/2409.09927
The leakage of benchmark data into the training data has emerged as a significant challenge for evaluating the capabilities of large language models (LLMs). In this work, we use experimental evidence and theoretical estimates to challenge the common…
External link:
http://arxiv.org/abs/2410.03249