Showing 1 - 10 of 109 for search: '"Plummer, Bryan A."'
Mobile app user interfaces (UIs) are rich with action, text, structure, and image content that can be utilized to learn generic UI representations for tasks like automating user commands, summarizing content, and evaluating the accessibility of user
External link:
http://arxiv.org/abs/2406.07822
Online content is filled with logos, from ads and social media posts to website branding and product placements. Consequently, these logos are prevalent in the extensive web-scraped datasets used to pretrain Vision-Language Models, which are used for
External link:
http://arxiv.org/abs/2406.01449
Author:
Pham, Chau, Plummer, Bryan A.
Multi-Channel Imaging (MCI) contains an array of challenges for encoding useful feature representations not present in traditional images. For example, images from two different satellites may both contain RGB channels, but the remaining channels can
External link:
http://arxiv.org/abs/2405.16419
Author:
Tan, Reuben, Sun, Ximeng, Hu, Ping, Wang, Jui-hsien, Deilamsalehy, Hanieh, Plummer, Bryan A., Russell, Bryan, Saenko, Kate
Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships. State-of-the-art video Large Language Models (vLLMs) hold promise as a viable solution due to th
External link:
http://arxiv.org/abs/2404.04346
Machine-Generated Text (MGT) detection aims to identify a piece of text as machine or human written. Prior work has primarily formulated MGT detection as a binary classification task over an entire document, with limited work exploring cases where on
External link:
http://arxiv.org/abs/2402.11744
Typographic Attacks, which involve pasting misleading text onto an image, were noted to harm the performance of Vision-Language Models like CLIP. However, the susceptibility of recent Large Vision-Language Models to these attacks remains understudied
External link:
http://arxiv.org/abs/2402.00626
Author:
Li, Nannan, Liu, Qing, Singh, Krishna Kumar, Wang, Yilin, Zhang, Jianming, Plummer, Bryan A., Lin, Zhe
Human image editing includes tasks like changing a person's pose, their clothing, or editing the image according to a text prompt. However, prior work often tackles these tasks separately, overlooking the benefit of mutual reinforcement from learning
External link:
http://arxiv.org/abs/2312.14985
Large language models (LLMs) have emerged as powerful general-purpose interfaces for many machine learning problems. Recent work has adapted LLMs to generative visual tasks like image captioning, visual question answering, and visual chat, using a re
External link:
http://arxiv.org/abs/2312.01629
Author:
Teterwak, Piotr, Nelson, Soren, Dryden, Nikoli, Bashkirova, Dina, Saenko, Kate, Plummer, Bryan A.
Neural parameter allocation search (NPAS) automates parameter sharing by obtaining weights for a network given an arbitrary, fixed parameter budget. Prior work has two major drawbacks we aim to address. First, there is a disconnect in the sharing pat
External link:
http://arxiv.org/abs/2312.01274
Noisy labels can impair model performance, making the study of learning with noisy labels an important topic. Two conventional approaches are noise modeling and noise detection. However, these two methods are typically studied independently, and ther
External link:
http://arxiv.org/abs/2312.00827