Showing 1 - 10 of 86 for search: '"Im, Woobin"'
Author:
Im, Woobin, Cha, Geonho, Lee, Sebin, Lee, Jumin, Seon, Juhyeong, Wee, Dongyoon, Yoon, Sung-Eui
This paper presents a novel approach for reconstructing dynamic radiance fields from monocular videos. We integrate kinematics with dynamic radiance fields, bridging the gap between the sparse nature of monocular videos and the real-world physics. Ou
External link:
http://arxiv.org/abs/2407.14059
Generating reliable pseudo masks from image-level labels is challenging in the weakly supervised semantic segmentation (WSSS) task due to the lack of spatial information. Prevalent class activation map (CAM)-based solutions are challenged to discrimi
External link:
http://arxiv.org/abs/2406.15755
Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation
Audio-visual segmentation (AVS) aims to segment sound sources in the video sequence, requiring a pixel-level understanding of audio-visual correspondence. As the Segment Anything Model (SAM) has strongly impacted extensive fields of dense prediction
External link:
http://arxiv.org/abs/2406.06163
We present "SemCity," a 3D diffusion model for semantic scene generation in real-world outdoor environments. Most 3D diffusion models focus on generating a single object, synthetic indoor scenes, or synthetic outdoor scenes, while the generation of r
External link:
http://arxiv.org/abs/2403.07773
In this paper, we learn a diffusion model to generate 3D data on a scene-scale. Specifically, our model crafts a 3D scene consisting of multiple objects, while recent diffusion research has focused on a single object. To realize our goal, we represen
External link:
http://arxiv.org/abs/2301.00527
A training pipeline for optical flow CNNs consists of a pretraining stage on a synthetic dataset followed by a fine tuning stage on a target dataset. However, obtaining ground truth flows from a target video requires a tremendous effort. This paper p
External link:
http://arxiv.org/abs/2207.10314
In computer vision, recovering spatial information by filling in masked regions, e.g., inpainting, has been widely investigated for its usability and wide applicability to other various applications: image inpainting, image extrapolation, and environ
External link:
http://arxiv.org/abs/2106.13953
Published in:
In Pattern Recognition Letters December 2023 176:215-222
Understanding the content of videos is one of the core techniques for developing various helpful applications in the real world, such as recognizing various human actions for surveillance systems or customer behavior analysis in an autonomous shop. H
External link:
http://arxiv.org/abs/1907.05006
Up to now, only limited research has been conducted on cross-modal retrieval of suitable music for a specified video or vice versa. Moreover, much of the existing research relies on metadata such as keywords, tags, or associated description that must
External link:
http://arxiv.org/abs/1704.06761