Showing 1 - 10 of 35 results for search: '"Ma, Wufei"'
Author:
Ma, Wufei, Li, Kai, Jiang, Zhongshi, Meshry, Moustafa, Liu, Qihao, Wang, Huiyu, Häne, Christian, Yuille, Alan
Recent video-text foundation models have demonstrated strong performance on a wide variety of downstream video understanding tasks. Can these video-text models genuinely understand the contents of natural videos? Standard video-text evaluations could…
External link:
http://arxiv.org/abs/2407.13094
Author:
Ma, Wufei, Zeng, Guanning, Zhang, Guofeng, Liu, Qihao, Zhang, Letian, Kortylewski, Adam, Liu, Yaoyao, Yuille, Alan
A vision model with general-purpose object-level 3D understanding should be capable of inferring both 2D (e.g., class name and bounding box) and 3D information (e.g., 3D location and 3D viewpoint) for arbitrary rigid objects in natural images. This is…
External link:
http://arxiv.org/abs/2406.09613
For vision-language models (VLMs), understanding the dynamic properties of objects and their interactions within 3D scenes from video is crucial for effective reasoning. In this work, we introduce a video question answering dataset SuperCLEVR-Physics…
External link:
http://arxiv.org/abs/2406.00622
Deep learning-based video compression is a challenging task, and many previous state-of-the-art learning-based video codecs use optical flows to exploit the temporal correlation between successive frames and then compress the residual error. Although…
External link:
http://arxiv.org/abs/2403.19158
Despite rapid progress in visual question answering (VQA), existing datasets and models mainly focus on testing reasoning in 2D. However, it is important that VQA models also understand the 3D structure of visual scenes, for example to support tasks…
External link:
http://arxiv.org/abs/2310.17914
Author:
Xu, Jiacong, Zhang, Yi, Peng, Jiawei, Ma, Wufei, Jesslen, Artur, Ji, Pengliang, Hu, Qixin, Zhang, Jiehua, Liu, Qihao, Wang, Jiahao, Ji, Wei, Wang, Chen, Yuan, Xiaoding, Kaushik, Prakhar, Zhang, Guofeng, Liu, Jie, Xie, Yushan, Cui, Yawen, Yuille, Alan, Kortylewski, Adam
Accurately estimating the 3D pose and shape is an essential step towards understanding animal behavior, and can potentially benefit many downstream applications, such as wildlife conservation. However, research in this area is held back by the lack of…
External link:
http://arxiv.org/abs/2308.11737
Author:
Ma, Wufei, Liu, Qihao, Wang, Jiahao, Wang, Angtian, Yuan, Xiaoding, Zhang, Yi, Xiao, Zihao, Zhang, Guofeng, Lu, Beijia, Duan, Ruxiao, Qi, Yongrui, Kortylewski, Adam, Liu, Yaoyao, Yuille, Alan
Diffusion models have emerged as a powerful generative method, capable of producing stunning photo-realistic images from natural language descriptions. However, these models lack explicit control over the 3D structure in the generated images. Consequently…
External link:
http://arxiv.org/abs/2306.08103
Human vision demonstrates higher robustness than current AI algorithms under out-of-distribution scenarios. It has been conjectured that such robustness benefits from performing analysis-by-synthesis. Our paper formulates triple vision tasks in a consistent…
External link:
http://arxiv.org/abs/2306.00118
Obtaining accurate 3D object poses is vital for numerous computer vision applications, such as 3D reconstruction and scene understanding. However, annotating real-world objects is time-consuming and challenging. While synthetically generated training…
External link:
http://arxiv.org/abs/2305.16124
Discriminative models for object classification typically learn image-based representations that do not capture the compositional and 3D nature of objects. In this work, we show that explicitly integrating 3D compositional object representations into…
External link:
http://arxiv.org/abs/2305.14668