Showing 1 - 10 of 75 for search: '"Jeni, László A."'
Author:
Xie, Liuyue, Guo, Jiancong, Jeni, Laszlo A., Jia, Zhiheng, Li, Mingyang, Zhou, Yunwen, Guo, Chao
Recent extended reality headsets and field robots have adopted covers to protect the front-facing cameras from environmental hazards and falls. The surface irregularities on the cover can lead to optical aberrations like blurring and non-parametric d…
External link:
http://arxiv.org/abs/2411.06365
Author:
Choudhury, Rohan, Zhu, Guanglei, Liu, Sihan, Niinuma, Koichiro, Kitani, Kris M., Jeni, László
Transformers are slow to train on videos due to extremely large numbers of input tokens, even though many video tokens are repeated over time. Existing methods to remove such uninformative tokens either have significant overhead, negating any speedup…
External link:
http://arxiv.org/abs/2411.05222
Reconstructing dynamic 3D scenes from 2D images and generating diverse views over time presents a significant challenge due to the inherent complexity and temporal dynamics involved. While recent advancements in neural implicit models and dynamic Gau…
External link:
http://arxiv.org/abs/2407.11309
Author:
Yu, Heng, Wang, Chaoyang, Zhuang, Peiye, Menapace, Willi, Siarohin, Aliaksandr, Cao, Junli, Jeni, Laszlo A, Tulyakov, Sergey, Lee, Hsin-Ying
Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack phot…
External link:
http://arxiv.org/abs/2406.07472
Author:
Milacski, Zoltán Á., Niinuma, Koichiro, Kawamura, Ryosuke, de la Torre, Fernando, Jeni, László A.
The connection between our 3D surroundings and the descriptive language that characterizes them would be well-suited for localizing and generating human motion in context but for one problem. The complexity introduced by multiple modalities makes cap…
External link:
http://arxiv.org/abs/2405.18438
The lifting of 3D structure and camera from 2D landmarks is a cornerstone of the entire discipline of computer vision. Traditional methods have been confined to specific rigid objects, such as those in Perspective-n-Point (PnP) problems, but dee…
External link:
http://arxiv.org/abs/2312.11894
Capturing and re-animating the 3D structure of articulated objects present significant barriers. On one hand, methods requiring extensively calibrated multi-view setups are prohibitively complex and resource-intensive, limiting their practical applic…
External link:
http://arxiv.org/abs/2312.05664
We propose to answer zero-shot questions about videos by generating short procedural programs that derive a final answer from solving a sequence of visual subtasks. We present Procedural Video Querying (ProViQ), which uses a large language model to g…
External link:
http://arxiv.org/abs/2312.00937
Author:
Gupta, Aarush, Cao, Junli, Wang, Chaoyang, Hu, Ju, Tulyakov, Sergey, Ren, Jian, Jeni, László A
Published in:
NeurIPS 2023
Real-time novel-view image synthesis on mobile devices is prohibitive due to the limited computational power and storage. Using volumetric rendering methods, such as NeRF and its derivatives, on mobile devices is not suitable due to the high computat…
External link:
http://arxiv.org/abs/2310.16832
Existing volumetric methods for 3D human pose estimation are accurate but computationally expensive and optimized for single time-step prediction. We present TEMPO, an efficient multi-view pose estimation model that learns a robust spatio…
External link:
http://arxiv.org/abs/2309.07910