Showing 1 - 10 of 22,165
for search: '"Saxon A"'
Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but developers nonetheless claim that their models have g
External link:
http://arxiv.org/abs/2407.16711
Author:
Wu, Qiucheng, Zhao, Handong, Saxon, Michael, Bui, Trung, Wang, William Yang, Zhang, Yang, Chang, Shiyu
Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warr
External link:
http://arxiv.org/abs/2407.01863
We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly lon
External link:
http://arxiv.org/abs/2406.16851
Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluati
External link:
http://arxiv.org/abs/2406.08656
Author:
Saxon, Michael, Jahara, Fatima, Khoshnoodi, Mahsa, Lu, Yujie, Sharma, Aditya, Wang, William Yang
With advances in the quality of text-to-image (T2I) models has come interest in benchmarking their prompt faithfulness -- the semantic coherence of generated images to the prompts they were conditioned on. A variety of T2I faithfulness metrics have b
External link:
http://arxiv.org/abs/2404.04251
Author:
Koehlenbeck, Sina M, Lee, Lance, Balcazar, Mario D, Chen, Ying, Esposito, Vincent, Hastings, Jerry, Hoffmann, Matthias C, Huang, Zhirong, Ng, May-Ling, Price, Saxon, Sato, Takahiro, Seaberg, Matthew, Sun, Yanwen, White, Adam, Zhang, Lin, Lantz, Brian, Zhu, Diling
The past decades have witnessed the development of new X-ray beam sources with brightness growing at a rate surpassing Moore's law. Current and upcoming diffraction limited and fully coherent X-ray beam sources, including multi-bend achromat based sy
External link:
http://arxiv.org/abs/2403.14090
Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroL
External link:
http://arxiv.org/abs/2403.11092
Published in:
Child Welfare, 2024 Jan 01. 102(1), 1-24.
External link:
https://www.jstor.org/stable/48783652
Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic content. A
External link:
http://arxiv.org/abs/2308.03188
Author:
Saxon, Michael, Wang, William Yang
We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns. For each model we c
External link:
http://arxiv.org/abs/2306.01735