Search results

Report

Benchmarks as Microscopes: A Call for Model Metrology

Authors: Saxon, Michael; Holtzman, Ari; West, Peter; Wang, William Yang; Saphra, Naomi

Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but developers nonetheless claim that their models have g

External link: http://arxiv.org/abs/2407.16711

Report

VSP: Assessing the dual challenges of perception and reasoning in spatial planning tasks for VLMs

Authors: Wu, Qiucheng; Zhao, Handong; Saxon, Michael; Bui, Trung; Wang, William Yang; Zhang, Yang; Chang, Shiyu

Vision language models (VLMs) are an exciting emerging class of language models (LMs) that have merged classic LM capabilities with those of image processing systems. However, the ways that these capabilities combine are not always intuitive and warr

External link: http://arxiv.org/abs/2407.01863

Report

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

Authors: Sharma, Aditya; Saxon, Michael; Wang, William Yang

We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly lon

External link: http://arxiv.org/abs/2406.16851

Report

TC-Bench: Benchmarking Temporal Compositionality in Text-to-Video and Image-to-Video Generation

Authors: Feng, Weixi; Li, Jiachen; Saxon, Michael; Fu, Tsu-jui; Chen, Wenhu; Wang, William Yang

Video generation has many unique challenges beyond those of image generation. The temporal dimension introduces extensive possible variations across frames, over which consistency and continuity may be violated. In this study, we move beyond evaluati

External link: http://arxiv.org/abs/2406.08656

Report

Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2)

Authors: Saxon, Michael; Jahara, Fatima; Khoshnoodi, Mahsa; Lu, Yujie; Sharma, Aditya; Wang, William Yang

With advances in the quality of text-to-image (T2I) models has come interest in benchmarking their prompt faithfulness -- the semantic coherence of generated images to the prompts they were conditioned on. A variety of T2I faithfulness metrics have b

External link: http://arxiv.org/abs/2404.04251

Report

Dynamic motion trajectory control with nanoradian accuracy for multi-element X-ray optical systems via laser interferometry

Authors: Koehlenbeck, Sina M; Lee, Lance; Balcazar, Mario D; Chen, Ying; Esposito, Vincent; Hastings, Jerry; Hoffmann, Matthias C; Huang, Zhirong; Ng, May-Ling; Price, Saxon; Sato, Takahiro; Seaberg, Matthew; Sun, Yanwen; White, Adam; Zhang, Lin; Lantz, Brian; Zhu, Diling

The past decades have witnessed the development of new X-ray beam sources with brightness growing at a rate surpassing Moore's law. Current and upcoming diffraction limited and fully coherent X-ray beam sources, including multi-bend achromat based sy

External link: http://arxiv.org/abs/2403.14090

Report

Lost in Translation? Translation Errors and Challenges for Fair Assessment of Text-to-Image Models on Multilingual Concepts

Authors: Saxon, Michael; Luo, Yiran; Levy, Sharon; Baral, Chitta; Yang, Yezhou; Wang, William Yang

Benchmarks of the multilingual capabilities of text-to-image (T2I) models compare generated images prompted in a test language to an expected image distribution over a concept set. One such benchmark, "Conceptual Coverage Across Languages" (CoCo-CroL

External link: http://arxiv.org/abs/2403.11092

Academic article

Cultivating Well-Being: A Theoretical Framework of Well-being, Well-becoming, and Resiliency

Authors: Saxon, Verletta; Fox, Heather L.; Williams, Cassandra; LeNoir, Terry

Published in: Child Welfare, 2024 Jan 01. 102(1), 1-24.

External link: https://www.jstor.org/stable/48783652

Report

Automatically Correcting Large Language Models: Surveying the landscape of diverse self-correction strategies

Authors: Pan, Liangming; Saxon, Michael; Xu, Wenda; Nathani, Deepak; Wang, Xinyi; Wang, William Yang

Large language models (LLMs) have demonstrated remarkable performance across a wide array of NLP tasks. However, their efficacy is undermined by undesired and inconsistent behaviors, including hallucination, unfaithful reasoning, and toxic content. A

External link: http://arxiv.org/abs/2308.03188

Report

Multilingual Conceptual Coverage in Text-to-Image Models

Authors: Saxon, Michael; Wang, William Yang

We propose "Conceptual Coverage Across Languages" (CoCo-CroLa), a technique for benchmarking the degree to which any generative text-to-image system provides multilingual parity to its training language in terms of tangible nouns. For each model we c

External link: http://arxiv.org/abs/2306.01735
