Showing 1 - 10 of 761 results for search: '"A. Schulter"'
Leveraging multiple training datasets to scale up image segmentation models is beneficial for increasing robustness and semantic understanding. Individual datasets have well-defined ground truth with non-overlapping mask layouts and mutually exclusiv…
External link:
http://arxiv.org/abs/2409.09893
A powerful architecture for universal segmentation relies on transformers that encode multi-scale image features and decode object queries into mask predictions. With efficiency being a high priority for scaling such models, we observed that the stat…
External link:
http://arxiv.org/abs/2404.14657
Visual program synthesis is a promising approach to exploit the reasoning abilities of large language models for compositional computer vision tasks. Previous work has used few-shot prompting with frozen LLMs to synthesize visual programs. Training a…
External link:
http://arxiv.org/abs/2404.04627
Author:
Liang, Mingfu, Su, Jong-Chyi, Schulter, Samuel, Garg, Sparsh, Zhao, Shiyu, Wu, Ying, Chandraker, Manmohan
Autonomous vehicle (AV) systems rely on robust perception models as a cornerstone of safety assurance. However, objects encountered on the road exhibit a long-tailed distribution, with rare or unseen categories posing challenges to a deployed percept…
External link:
http://arxiv.org/abs/2403.17373
Author:
Zhao, Shiyu, Zhao, Long, G, Vijay Kumar B., Suh, Yumin, Metaxas, Dimitris N., Chandraker, Manmohan, Schulter, Samuel
The recent progress in language-based open-vocabulary object detection can be largely attributed to finding better ways of leveraging large-scale data with free-form text annotations. Training such models with a discriminative objective function has…
External link:
http://arxiv.org/abs/2401.00094
Visual question answering (VQA) has traditionally been treated as a single-step task where each question receives the same amount of effort, unlike natural human question-answering strategies. We explore a question decomposition strategy for VQA to o…
External link:
http://arxiv.org/abs/2310.17050
We aim to train a multi-task model such that users can adjust the desired compute budget and relative importance of task performances after deployment, without retraining. This enables optimizing performance for dynamically varying user needs, withou…
External link:
http://arxiv.org/abs/2308.11744
Author:
Zhao, Shiyu, Schulter, Samuel, Zhao, Long, Zhang, Zhixing, G, Vijay Kumar B., Suh, Yumin, Chandraker, Manmohan, Metaxas, Dimitris N.
Recent studies have shown promising performance in open-vocabulary object detection (OVD) by utilizing pseudo labels (PLs) from pretrained vision and language models (VLMs). However, teacher-student self-training, a powerful and widely used paradigm…
External link:
http://arxiv.org/abs/2308.06412
Finetuning a large vision-language model (VLM) on a target dataset after large-scale pretraining is a dominant paradigm in visual question answering (VQA). Datasets for specialized tasks such as knowledge-based VQA or VQA in non-natural-image domains…
External link:
http://arxiv.org/abs/2306.03932
Author:
Min, Zhixiang, Zhuang, Bingbing, Schulter, Samuel, Liu, Buyu, Dunn, Enrique, Chandraker, Manmohan
Monocular 3D object localization in driving scenes is a crucial task, but challenging due to its ill-posed nature. Estimating 3D coordinates for each pixel on the object surface holds great potential as it provides dense 2D-3D geometric constraints f…
External link:
http://arxiv.org/abs/2305.17763