Showing 1 - 10 of 56 results for search: '"Modolo, Davide"'
In this paper, we propose a novel concept of path consistency to learn robust object matching without using manual object identity supervision. Our key idea is that, to track an object through frames, we can obtain multiple different association resul
External link:
http://arxiv.org/abs/2404.05136
Open-world detection poses significant challenges, as it requires the detection of any object using either object class labels or free-form texts. Existing related works often use large-scale manually annotated caption datasets for training, which are
External link:
http://arxiv.org/abs/2404.05016
Early action recognition is an important and challenging problem that enables the recognition of an action from a partially observed video stream where the activity is potentially unfinished or has not even started. In this work, we propose a novel model
External link:
http://arxiv.org/abs/2312.06598
Author:
Lemkhenter, Abdelhak, Wang, Manchen, Zancato, Luca, Swaminathan, Gurumurthy, Favaro, Paolo, Modolo, Davide
In this paper we introduce SemiGPC, a distribution-aware label refinement strategy based on Gaussian Processes where the predictions of the model are derived from the labels posterior distribution. Differently from other buffer-based semi-supervised
External link:
http://arxiv.org/abs/2311.01646
We propose a new semi-supervised learning design for human pose estimation that revisits the popular dual-student framework and enhances it in two ways. First, we introduce a denoising scheme to generate reliable pseudo-heatmaps as targets for learning
External link:
http://arxiv.org/abs/2310.00099
Author:
Duan, Haodong, Xu, Mingze, Shuai, Bing, Modolo, Davide, Tu, Zhuowen, Tighe, Joseph, Bergamo, Alessandro
We present SkeleTR, a new framework for skeleton-based action recognition. In contrast to prior work, which focuses mainly on controlled environments, we target more general scenarios that typically involve a variable number of people and various for
External link:
http://arxiv.org/abs/2309.11445
Author:
Xu, Zhenlin, Zhu, Yi, Deng, Tiffany, Mittal, Abhay, Chen, Yanbei, Wang, Manchen, Favaro, Paolo, Tighe, Joseph, Modolo, Davide
This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs excel in tasks like image captioning, they face challenges in open-world settings. Our
External link:
http://arxiv.org/abs/2306.16048
Author:
Chen, Yanbei, Wang, Manchen, Mittal, Abhay, Xu, Zhenlin, Favaro, Paolo, Tighe, Joseph, Modolo, Davide
Multi-dataset training provides a viable solution for exploiting heterogeneous large-scale datasets without extra annotation cost. In this work, we propose a scalable multi-dataset detector (ScaleDet) that can scale up its generalization across datas
External link:
http://arxiv.org/abs/2306.04849
Author:
Zhang, Zhaoyang, Shen, Yantao, Shi, Kunyu, Cai, Zhaowei, Fang, Jun, Deng, Siqi, Yang, Hao, Modolo, Davide, Tu, Zhuowen, Soatto, Stefano
We present a vision-language model whose parameters are jointly trained on all tasks and fully shared among multiple heterogeneous tasks which may interfere with each other, resulting in a single model which we named Musketeer. The integration of kno
External link:
http://arxiv.org/abs/2305.07019
Author:
Cai, Zhaowei, Ravichandran, Avinash, Favaro, Paolo, Wang, Manchen, Modolo, Davide, Bhotika, Rahul, Tu, Zhuowen, Soatto, Stefano
We study semi-supervised learning (SSL) for vision transformers (ViT), an under-explored topic despite the wide adoption of the ViT architectures to different tasks. To tackle this problem, we propose a new SSL pipeline, consisting of first un/self-s
External link:
http://arxiv.org/abs/2208.05688