Showing 1 - 10 of 1,639 for search: '"A. Tokmakov"'
We present REM, a framework for segmenting a wide range of concepts in video that can be described through natural language. Our method capitalizes on visual-language representations learned by video diffusion models on Internet-scale datasets. A key …
External link:
http://arxiv.org/abs/2410.23287
3D reconstruction from a single image is a long-standing problem in computer vision. Learning-based methods address its inherent scale ambiguity by leveraging increasingly large labeled and unlabeled datasets, to produce geometric priors capable of …
External link:
http://arxiv.org/abs/2409.09896
Author:
Liang, Junbang, Liu, Ruoshi, Ozguroglu, Ege, Sudhakar, Sruthi, Dave, Achal, Tokmakov, Pavel, Song, Shuran, Vondrick, Carl
A key challenge in manipulation is learning a policy that can robustly generalize to diverse visual environments. A promising mechanism for learning robust policies is to leverage video generative models, which are pretrained on large-scale datasets …
External link:
http://arxiv.org/abs/2406.16862
Author:
Van Hoorick, Basile, Wu, Rundi, Ozguroglu, Ege, Sargent, Kyle, Liu, Ruoshi, Tokmakov, Pavel, Dave, Achal, Zheng, Changxi, Vondrick, Carl
Accurate reconstruction of complex dynamic scenes from just a single viewpoint continues to be a challenging task in computer vision. Current dynamic novel view synthesis methods typically require videos from many different camera viewpoints, …
External link:
http://arxiv.org/abs/2405.14868
Author:
Ozguroglu, Ege, Liu, Ruoshi, Surís, Dídac, Chen, Dian, Dave, Achal, Tokmakov, Pavel, Vondrick, Carl
We introduce pix2gestalt, a framework for zero-shot amodal segmentation, which learns to estimate the shape and appearance of whole objects that are only partially visible behind occlusions. By capitalizing on large-scale diffusion models and …
External link:
http://arxiv.org/abs/2401.14398
Author:
Kowal, Matthew, Dave, Achal, Ambrus, Rares, Gaidon, Adrien, Derpanis, Konstantinos G., Tokmakov, Pavel
This paper studies the problem of concept-based interpretability of transformer representations for videos. Concretely, we seek to explain the decision-making process of video transformers based on high-level, spatiotemporal concepts that are …
External link:
http://arxiv.org/abs/2401.10831
Author:
Chu, Wen-Hsuan, Harley, Adam W., Tokmakov, Pavel, Dave, Achal, Guibas, Leonidas, Fragkiadaki, Katerina
Object tracking is central to robot perception and scene understanding. Tracking-by-detection has long been a dominant paradigm for object tracking of specific object categories. Recently, large-scale pre-trained models have shown promising advances …
External link:
http://arxiv.org/abs/2310.06992
Tracking objects with persistence in cluttered and dynamic environments remains a difficult challenge for computer vision systems. In this paper, we introduce $\textbf{TCOW}$, a new benchmark and model for visual tracking through heavy occlusion and …
External link:
http://arxiv.org/abs/2305.03052
Author:
B. G. Alekyan, A. A. Gritskevich, N. G. Karapetyan, D. V. Ruchkin, A. A. Pechetov, P. V. Markov, B. N. Gurmikov, N. L. Irodova, L. G. Gyoletsyan, E. V. Tokmakov, A. V. Galstyan, A. Sh. Revishvili
Published in:
Южно-Российский онкологический журнал, Vol 5, Iss 3, Pp 39-49 (2024)
Purpose of the study. To analyze the long-term results of various strategies of endovascular treatment for coronary artery disease (CAD) in patients with concomitant cancer. Patients and methods. 74 patients with both CAD and cancer were …
External link:
https://doaj.org/article/dba7c59e02574537a2503de4fed4f09e
Published in:
CVPR 2023
Object discovery -- separating objects from the background without manual labels -- is a fundamental open challenge in computer vision. Previous methods struggle to go beyond clustering of low-level cues, whether handcrafted (e.g., color, texture) or …
External link:
http://arxiv.org/abs/2303.15555