Showing 1 - 10 of 187 for search: '"Sukthankar, Rahul"'
Author:
Marcu, Alina, Pirvu, Mihai, Costea, Dragos, Haller, Emanuela, Slusanschi, Emil, Belbachir, Ahmed Nabil, Sukthankar, Rahul, Leordeanu, Marius
We present a method for learning multiple scene representations given a small labeled set, by exploiting the relationships between such representations in the form of a multi-task hypergraph. We also show how we can use the hypergraph to improve a po
External link:
http://arxiv.org/abs/2308.07615
Author:
Bazavan, Eduard Gabriel, Zanfir, Andrei, Zanfir, Mihai, Freeman, William T., Sukthankar, Rahul, Sminchisescu, Cristian
Advances in the state of the art for 3d human sensing are currently limited by the lack of visual datasets with 3d ground truth, including multiple people, in motion, operating in real-world environments, with complex illumination or occlusion, and p
External link:
http://arxiv.org/abs/2112.12867
Vision Transformer (ViT) is emerging as the state-of-the-art architecture for image recognition. While recent studies suggest that ViTs are more robust than their convolutional counterparts, our experiments find that ViTs trained on ImageNet are over
External link:
http://arxiv.org/abs/2111.10493
Author:
Zanfir, Mihai, Zanfir, Andrei, Bazavan, Eduard Gabriel, Freeman, William T., Sukthankar, Rahul, Sminchisescu, Cristian
We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images. Key to our methodology is an intermediate 3d marker representation, where we aim to combine the predict
External link:
http://arxiv.org/abs/2106.09336
Author:
Leordeanu, Marius, Pirvu, Mihai, Costea, Dragos, Marcu, Alina, Slusanschi, Emil, Sukthankar, Rahul
We address the challenging problem of semi-supervised learning in the context of multiple visual interpretations of the world by finding consensus in a graph of neural networks. Each graph node is a scene interpretation layer, while each edge is a de
External link:
http://arxiv.org/abs/2010.01086
Author:
Zanfir, Andrei, Bazavan, Eduard Gabriel, Zanfir, Mihai, Freeman, William T., Sukthankar, Rahul, Sminchisescu, Cristian
We present deep neural network methodology to reconstruct the 3d pose and shape of people, given an input RGB image. We rely on a recently introduced, expressive full body statistical 3d human model, GHUM, trained end-to-end, and learn to reconstruct
External link:
http://arxiv.org/abs/2008.06910
Author:
Albanie, Samuel, Liu, Yang, Nagrani, Arsha, Miech, Antoine, Coto, Ernesto, Laptev, Ivan, Sukthankar, Rahul, Ghanem, Bernard, Zisserman, Andrew, Gabeur, Valentin, Sun, Chen, Alahari, Karteek, Schmid, Cordelia, Chen, Shizhe, Zhao, Yida, Jin, Qin, Cui, Kaixu, Liu, Hui, Wang, Chen, Jiang, Yudong, Hao, Xiaoshuai
We present a new video understanding pentathlon challenge, an open competition held in conjunction with the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020. The objective of the challenge was to explore and evaluate new methods
External link:
http://arxiv.org/abs/2008.00744
Author:
Stroud, Jonathan C., Lu, Zhichao, Sun, Chen, Deng, Jia, Sukthankar, Rahul, Schmid, Cordelia, Ross, David A.
Videos on the Internet are paired with pieces of text, such as titles and descriptions. This text typically describes the most important content in the video, such as the objects in the scene and the actions being performed. Based on this observation
External link:
http://arxiv.org/abs/2007.14937
Author:
Nagrani, Arsha, Sun, Chen, Ross, David, Sukthankar, Rahul, Schmid, Cordelia, Zisserman, Andrew
Is it possible to guess human action from dialogue alone? In this work we investigate the link between spoken words and actions in movies. We note that movie screenplays describe actions, as well as contain the speech of characters and hence can be u
External link:
http://arxiv.org/abs/2003.13594
Author:
Zanfir, Andrei, Bazavan, Eduard Gabriel, Xu, Hongyi, Freeman, Bill, Sukthankar, Rahul, Sminchisescu, Cristian
Published in:
ECCV 2020
Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and the difficulty to acquire training data for large-scale supervised learning in complex visual scenes. In this paper we present practic
External link:
http://arxiv.org/abs/2003.10350