Showing 1 - 10 of 246 for search: '"LEE, JOONSEOK"'
Referring Image Segmentation is a comprehensive task of segmenting the object referred to by a textual query in an image. By nature, the difficulty of this task is affected by the existence of similar objects and the complexity of the referring e
External link:
http://arxiv.org/abs/2411.01494
Given a video with $T$ frames, frame sampling is the task of selecting $N \ll T$ frames so as to maximize the performance of a fixed video classifier. Not only brute-force search but also most existing methods suffer from the vast search space of $\binom{T}{N}$
External link:
http://arxiv.org/abs/2409.05260
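To give a sense of how large the search space of choosing $N$ out of $T$ frames is, the following minimal Python sketch counts the candidate frame subsets with the standard-library `math.comb`. The helper name `search_space_size` and the values T = 300 and N = 8, 16, 32 are hypothetical, chosen only for illustration; they are not taken from the paper, and the sketch does not implement any sampling method.

```python
from math import comb


def search_space_size(T: int, N: int) -> int:
    """Number of distinct ways to pick N of T frames, ignoring order (binomial coefficient)."""
    return comb(T, N)


if __name__ == "__main__":
    # Hypothetical example: a 300-frame video from which 8, 16 or 32 frames are kept.
    for n in (8, 16, 32):
        print(f"T=300, N={n}: {search_space_size(300, n)} candidate subsets")
```

Even for modest N the count grows far beyond what exhaustive evaluation with a classifier could cover, which is why the abstract singles out the size of this space as the central difficulty.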
Published in:
Forty-first International Conference on Machine Learning (ICML 2024)
The latent space of diffusion models remains largely unexplored, despite their great success and potential in the field of generative modeling. In fact, the latent space of existing diffusion models is entangled, with a distorted mapping from its
External link:
http://arxiv.org/abs/2407.11451
Authors:
Kim, Jooeun, Kim, Jinri, Yeo, Kwangeun, Kim, Eungi, On, Kyoung-Woon, Mun, Jonghwan, Lee, Joonseok
Cold-start item recommendation is a long-standing challenge in recommendation systems. A common remedy is to use a content-based approach, but the rich information in raw content of various forms has not been fully utilized. In this paper, we propose
External link:
http://arxiv.org/abs/2404.13808
Zero-shot learning offers an efficient way for a machine learning model to handle unseen categories without exhaustive data collection. Zero-shot Sketch-based Image Retrieval (ZS-SBIR) simulates real-world scenarios where it is hard and costly
External link:
http://arxiv.org/abs/2401.04860
Sequence prediction on temporal data requires the ability to understand compositional structures of multi-level semantics beyond individual and contextual properties. The task of temporal action segmentation, which aims at translating an untrimmed ac
External link:
http://arxiv.org/abs/2312.04266
3D pose estimation is an invaluable task in computer vision with various practical applications. In particular, multi-person 3D pose estimation from a monocular video (3DMPPE) is especially challenging and remains largely uncharted, far from appl
External link:
http://arxiv.org/abs/2309.08644
Authors:
Lee, Jiyoung, Kim, Seungho, Won, Seunghyun, Lee, Joonseok, Ghassemi, Marzyeh, Thorne, James, Choi, Jaeseok, Kwon, O-Kil, Choi, Edward
AI alignment refers to models acting towards human-intended goals, preferences, or ethical principles. Given that most large-scale deep learning models act as black boxes and cannot be manually controlled, analyzing the similarity between models and
External link:
http://arxiv.org/abs/2308.01525
Authors:
Su, Kun, Li, Judith Yue, Huang, Qingqing, Kuzmin, Dima, Lee, Joonseok, Donahue, Chris, Sha, Fei, Jansen, Aren, Wang, Yu, Verzetti, Mauro, Denk, Timo I.
Video-to-music generation demands both a temporally localized high-quality listening experience and globally aligned video-acoustic signatures. While recent music generation models excel at the former through advanced audio codecs, the exploration of
External link:
http://arxiv.org/abs/2305.06594
Authors:
Lee, Joonseok, Joe, Seongho, Park, Kyoungwon, Kim, Bogun, Kang, Hoyoung, Park, Jaeseon, Gwon, Youngjune
Published in:
2022 26th International Conference on Pattern Recognition (ICPR), Montreal, QC, Canada, 2022, pp. 2935-2941
We propose a self-supervised learning method for long text documents based on contrastive learning. A key to our method is Shuffle and Divide (SaD), a simple text augmentation algorithm that sets up a pretext task required for contrastive updates to
External link:
http://arxiv.org/abs/2304.09374