Showing 1 - 10 of 27 results for search: '"Wee, Dongyoon"'
Author: Yu, Seonghoon, Jung, Ilchae, Han, Byeongju, Kim, Taeoh, Kim, Yunho, Wee, Dongyoon, Son, Jeany
Referring image segmentation (RIS) requires dense vision-language interactions between visual pixels and textual words to segment objects based on a given description. However, commonly adapted dual-encoders in RIS, e.g., Swin transformer and BERT, …
External link: http://arxiv.org/abs/2408.15521
Video action detection (VAD) aims to detect actors and classify their actions in a video. We find that VAD suffers more from classification than from localization of actors. Hence, we analyze how prevailing methods form features for classification…
External link: http://arxiv.org/abs/2407.19698
Author: Im, Woobin, Cha, Geonho, Lee, Sebin, Lee, Jumin, Seon, Juhyeong, Wee, Dongyoon, Yoon, Sung-Eui
This paper presents a novel approach for reconstructing dynamic radiance fields from monocular videos. We integrate kinematics with dynamic radiance fields, bridging the gap between the sparse nature of monocular videos and real-world physics. …
External link: http://arxiv.org/abs/2407.14059
This paper introduces Motion-oriented Compositional Neural Radiance Fields (MoCo-NeRF), a framework designed to perform free-viewpoint rendering of monocular human videos via a novel non-rigid motion modeling approach. In the context of dynamic clothed …
External link: http://arxiv.org/abs/2407.11962
Summarizing a video requires a diverse understanding of the video, ranging from recognizing scenes to evaluating whether each frame is essential enough to be selected for the summary. Self-supervised learning (SSL) is acknowledged for its robustness…
External link: http://arxiv.org/abs/2306.01395
Temporal action detection aims to predict the time intervals and the classes of action instances in the video. Despite their promising performance, existing two-stream models exhibit slow inference speed due to their reliance on computationally expensive …
External link: http://arxiv.org/abs/2303.17285
We introduce You Only Train Once (YOTO), a dynamic human generation framework which performs free-viewpoint rendering of different human identities with distinct motions via only one-time training from monocular videos. Most prior works for the task …
External link: http://arxiv.org/abs/2303.05835
Author: Monet, Nicolas, Wee, Dongyoon
This technical report introduces our solution, MEEV, proposed for the EgoBody Challenge at ECCV 2022. Captured with head-mounted devices, the dataset consists of the body shapes and motions of interacting people. The EgoBody dataset has challenges such as …
External link: http://arxiv.org/abs/2210.14165
Author: Kim, Taeoh, Kim, Jinhyung, Shim, Minho, Yun, Sangdoo, Kang, Myunggu, Wee, Dongyoon, Lee, Sangyoun
Data augmentation has recently emerged as an essential component of modern training recipes for visual recognition tasks. However, data augmentation for video recognition has rarely been explored despite its effectiveness. Few existing augmentation …
External link: http://arxiv.org/abs/2206.15015
To estimate the volume density and color of a 3D point in multi-view image-based rendering, a common approach is to inspect whether consensus exists among the given source image features, which is one of the informative cues for the estimation process…
External link: http://arxiv.org/abs/2206.04906