Showing 1 - 10 of 88 for search: '"Tighe, Joseph"'
Early action recognition is an important and challenging problem that enables the recognition of an action from a partially observed video stream in which the activity is potentially unfinished or has not even started. In this work, we propose a novel model
External link:
http://arxiv.org/abs/2312.06598
Author:
Duan, Haodong, Xu, Mingze, Shuai, Bing, Modolo, Davide, Tu, Zhuowen, Tighe, Joseph, Bergamo, Alessandro
We present SkeleTR, a new framework for skeleton-based action recognition. In contrast to prior work, which focuses mainly on controlled environments, we target more general scenarios that typically involve a variable number of people and various for
External link:
http://arxiv.org/abs/2309.11445
Author:
Xu, Zhenlin, Zhu, Yi, Deng, Tiffany, Mittal, Abhay, Chen, Yanbei, Wang, Manchen, Favaro, Paolo, Tighe, Joseph, Modolo, Davide
This paper presents novel benchmarks for evaluating vision-language models (VLMs) in zero-shot recognition, focusing on granularity and specificity. Although VLMs excel in tasks like image captioning, they face challenges in open-world settings. Our
External link:
http://arxiv.org/abs/2306.16048
Author:
Chen, Yanbei, Wang, Manchen, Mittal, Abhay, Xu, Zhenlin, Favaro, Paolo, Tighe, Joseph, Modolo, Davide
Multi-dataset training provides a viable solution for exploiting heterogeneous large-scale datasets without extra annotation cost. In this work, we propose a scalable multi-dataset detector (ScaleDet) that can scale up its generalization across datas
External link:
http://arxiv.org/abs/2306.04849
Author:
Zhang, Qin, An, Dongsheng, Xiao, Tianjun, He, Tong, Tang, Qingming, Wu, Ying Nian, Tighe, Joseph, Xing, Yifan, Soatto, Stefano
In deep metric learning for visual recognition, the calibration of distance thresholds is crucial for achieving desired model performance in the true positive rates (TPR) or true negative rates (TNR). However, calibrating this threshold presents chal
External link:
http://arxiv.org/abs/2305.12039
Author:
Shuai, Bing, Bergamo, Alessandro, Buechler, Uta, Berneshawi, Andrew, Boden, Alyssa, Tighe, Joseph
This paper presents a new large-scale multi-person tracking dataset, PersonPath22, which is over an order of magnitude larger than currently available high-quality multi-object tracking datasets such as MOT17, HiEve, and MOT20. Th
External link:
http://arxiv.org/abs/2211.02175
In this paper, we provide an in-depth study of Stochastic Backpropagation (SBP) when training deep neural networks for standard image classification and object detection tasks. During backward propagation, SBP calculates the gradients by only using a
External link:
http://arxiv.org/abs/2210.00129
We propose SCVRL, a novel contrastive-based framework for self-supervised learning for videos. Unlike previous contrastive-learning-based methods that mostly focus on learning visual semantics (e.g., CVRL), SCVRL is capable of learning both se
External link:
http://arxiv.org/abs/2205.11710
Most self-supervised video representation learning approaches focus on action recognition. In contrast, in this paper we focus on self-supervised video learning for movie understanding and propose a novel hierarchical self-supervised pretraining stra
External link:
http://arxiv.org/abs/2204.03101
We propose a novel one-stage Transformer-based semantic and spatial refined transformer (SSRT) to solve the Human-Object Interaction detection task, which requires localizing humans and objects and predicting their interactions. Differently from prev
External link:
http://arxiv.org/abs/2204.00746