Showing 1 - 10 of 27,509 for search: '"Lee, Yong In"'
Author:
Park, Seong Hyeon, Choi, Gahyun, Kim, Eunjong, Park, Gwanyeol, Choi, Jisoo, Choi, Jiman, Chong, Yonuk, Lee, Yong-Ho, Hahn, Seungyong
Recent advances in quantum information processing with superconducting qubits have fueled a growing demand for scaling and miniaturizing circuit layouts. Despite significant progress, accurately predicting the Hamiltonian of complex circuits remains…
External link:
http://arxiv.org/abs/2410.24004
As latent diffusion models (LDMs) democratize image generation capabilities, there is a growing need to detect fake images. A good detector should focus on the generative models' fingerprints while ignoring image properties such as semantic content…
External link:
http://arxiv.org/abs/2410.11835
Author:
Cai, Mu, Tan, Reuben, Zhang, Jianrui, Zou, Bocheng, Zhang, Kai, Yao, Feng, Zhu, Fangrui, Gu, Jing, Zhong, Yiwu, Shang, Yuzhang, Dou, Yao, Park, Jaden, Gao, Jianfeng, Lee, Yong Jae, Yang, Jianwei
Understanding fine-grained temporal dynamics is crucial for multimodal video comprehension and generation. Due to the lack of fine-grained temporal annotations, existing video benchmarks mostly resemble static image benchmarks and are incompetent at…
External link:
http://arxiv.org/abs/2410.10818
There has been growing sentiment recently that modern large multimodal models (LMMs) have addressed most of the key challenges related to short video comprehension. As a result, both academia and industry are gradually shifting their attention toward…
External link:
http://arxiv.org/abs/2410.02763
Author:
Li, Yuheng, Liu, Haotian, Cai, Mu, Li, Yijun, Shechtman, Eli, Lin, Zhe, Lee, Yong Jae, Singh, Krishna Kumar
In this paper, we introduce a model designed to improve the prediction of image-text alignment, targeting the challenge of compositional understanding in current visual-language models. Our approach focuses on generating high-quality training dataset…
External link:
http://arxiv.org/abs/2410.00905
This work introduces a model-free reinforcement learning framework that enables various modes of motion (quadruped, tripod, or biped) and diverse tasks for legged robot locomotion. We employ a motion-style reward based on a relaxed logarithmic barrier…
External link:
http://arxiv.org/abs/2409.15780
Author:
Shang, Yuzhang, Xu, Bingxin, Kang, Weitai, Cai, Mu, Li, Yuheng, Wen, Zehao, Dong, Zhen, Keutzer, Kurt, Lee, Yong Jae, Yan, Yan
Advancements in Large Language Models (LLMs) inspire various strategies for integrating video modalities. A key approach is Video-LLMs, which incorporate an optimizable interface linking sophisticated video encoders to LLMs. However, due to computational…
External link:
http://arxiv.org/abs/2409.12963
3D perception in LiDAR point clouds is crucial for a self-driving vehicle to act properly in a 3D environment. However, manually labeling point clouds is hard and costly. There has been a growing interest in self-supervised pre-training of 3D perception…
External link:
http://arxiv.org/abs/2409.06827
In the realm of vision models, the primary mode of representation is using pixels to rasterize the visual world. Yet this is not always the best or unique way to represent visual content, especially for designers and artists who depict the world using…
External link:
http://arxiv.org/abs/2407.10972
Author:
Li, Xiang, Mata, Cristina, Park, Jongwoo, Kahatapitiya, Kumara, Jang, Yoo Sung, Shang, Jinghuan, Ranasinghe, Kanchana, Burgert, Ryan, Cai, Mu, Lee, Yong Jae, Ryoo, Michael S.
LLMs with visual inputs, i.e., Vision Language Models (VLMs), have the capacity to process state information as visual-textual prompts and respond with policy decisions in text. We propose LLaRA: Large Language and Robotics Assistant, a framework that…
External link:
http://arxiv.org/abs/2406.20095