Showing 1 - 10 of 88 results for search: "Uijlings, Jasper"
Large language models and large vision models are increasingly capable of solving compositional reasoning tasks, as measured by breakthroughs in visual question answering benchmarks. However, state-of-the-art solutions often involve careful construct
External link:
http://arxiv.org/abs/2405.19773
Authors:
Castrejon, Lluis, Mensink, Thomas, Zhou, Howard, Ferrari, Vittorio, Araujo, Andre, Uijlings, Jasper
Combining Large Language Models (LLMs) with external specialized tools (LLMs+tools) is a recent paradigm to solve multimodal tasks such as Visual Question Answering (VQA). While this approach was demonstrated to work well when optimized and evaluated
External link:
http://arxiv.org/abs/2404.05465
Authors:
Alazraki, Lisa, Castrejon, Lluis, Dehghani, Mostafa, Huot, Fantine, Uijlings, Jasper, Mensink, Thomas
This paper studies ensembling in the era of Large Vision-Language Models (LVLMs). Ensembling is a classical method of combining different models to obtain increased performance. In recent work on Encyclopedic-VQA, the authors examine a wide variety of
External link:
http://arxiv.org/abs/2310.06641
Authors:
Mensink, Thomas, Uijlings, Jasper, Castrejon, Lluis, Goel, Arushi, Cadar, Felipe, Zhou, Howard, Sha, Fei, Araujo, André, Ferrari, Vittorio
We propose Encyclopedic-VQA, a large scale visual question answering (VQA) dataset featuring visual questions about detailed properties of fine-grained categories and instances. It contains 221k unique question+answer pairs each matched with (up to)
External link:
http://arxiv.org/abs/2306.09224
Computer vision is driven by the many datasets available for training or evaluating novel methods. However, each dataset has a different set of class labels, visual definition of classes, images following a specific distribution, annotation protocols
External link:
http://arxiv.org/abs/2206.04453
Transferability metrics are a maturing field of increasing interest, aiming to provide heuristics for selecting the most suitable source models to transfer to a given target dataset without fine-tuning them all. However, existing works rely o
External link:
http://arxiv.org/abs/2204.01403
We address the problem of ensemble selection in transfer learning: Given a large pool of source models we want to select an ensemble of models which, after fine-tuning on the target training set, yields the best performance on the target test set. Si
External link:
http://arxiv.org/abs/2111.13011
Transfer learning has become a popular method for leveraging pre-trained models in computer vision. However, without performing computationally expensive fine-tuning, it is difficult to quantify which pre-trained source models are suitable for a spec
External link:
http://arxiv.org/abs/2111.12780
Transfer learning enables the reuse of knowledge learned on a source task to help learn a target task. A simple form of transfer learning is common in current state-of-the-art computer vision models, i.e., pre-training a model for image classification
External link:
http://arxiv.org/abs/2103.13318
This paper takes a first step towards compatible and hence reusable network components. Rather than training networks for different tasks independently, we adapt the training process to produce network components that are compatible across
External link:
http://arxiv.org/abs/2004.03898