Showing 1 - 10 of 126 for search: '"Kim, Hyunwoo J"'
Latent Bayesian optimization (LBO) approaches have successfully adopted Bayesian optimization over a continuous latent space by employing an encoder-decoder architecture to address the challenge of optimization in a high dimensional or discrete input
External link:
http://arxiv.org/abs/2411.05330
Rectified flow and reflow procedures have significantly advanced fast generation by progressively straightening ordinary differential equation (ODE) flows. They operate under the assumption that image and noise pairs, known as couplings, can be appro
External link:
http://arxiv.org/abs/2411.00322
Large Language Models (LLMs) have demonstrated remarkable generalization and instruction-following capabilities with instruction tuning. The advancements in LLMs and instruction tuning have led to the development of Large Vision-Language Models (LVLM
External link:
http://arxiv.org/abs/2411.00871
Author:
Shen, Xiaoqian, Xiong, Yunyang, Zhao, Changsheng, Wu, Lemeng, Chen, Jun, Zhu, Chenchen, Liu, Zechun, Xiao, Fanyi, Varadarajan, Balakrishnan, Bordes, Florian, Liu, Zhuang, Xu, Hu, Kim, Hyunwoo J., Soran, Bilge, Krishnamoorthi, Raghuraman, Elhoseiny, Mohamed, Chandra, Vikas
Multimodal Large Language Models (MLLMs) have shown promising progress in understanding and analyzing video content. However, processing long videos remains a significant challenge constrained by LLM's context size. To address this limitation, we pro
External link:
http://arxiv.org/abs/2410.17434
Knowledge graph-grounded dialog generation requires retrieving a dialog-relevant subgraph from the given knowledge base graph and integrating it with the dialog history. Previous works typically represent the graph using an external encoder, such as
External link:
http://arxiv.org/abs/2410.09350
Author:
Cioppa, Anthony, Giancola, Silvio, Somers, Vladimir, Joos, Victor, Magera, Floriane, Held, Jan, Ghasemzadeh, Seyed Abolfazl, Zhou, Xin, Seweryn, Karolina, Kowalczyk, Mateusz, Mróz, Zuzanna, Łukasik, Szymon, Hałoń, Michał, Mkhallati, Hassan, Deliège, Adrien, Hinojosa, Carlos, Sanchez, Karen, Mansourian, Amir M., Miralles, Pierre, Barnich, Olivier, De Vleeschouwer, Christophe, Alahi, Alexandre, Ghanem, Bernard, Van Droogenbroeck, Marc, Gorski, Adam, Clapés, Albert, Boiarov, Andrei, Afanasiev, Anton, Xarles, Artur, Scott, Atom, Lim, ByoungKwon, Yeung, Calvin, Gonzalez, Cristian, Rüfenacht, Dominic, Pacilio, Enzo, Deuser, Fabian, Altawijri, Faisal Sami, Cachón, Francisco, Kim, HanKyul, Wang, Haobo, Choe, Hyeonmin, Kim, Hyunwoo J, Kim, Il-Min, Kang, Jae-Mo, Tursunboev, Jamshid, Yang, Jian, Hong, Jihwan, Lee, Jimin, Zhang, Jing, Lee, Junseok, Zhang, Kexin, Habel, Konrad, Jiao, Licheng, Li, Linyi, Gutiérrez-Pérez, Marc, Ortega, Marcelo, Li, Menglong, Lopatto, Milosz, Kasatkin, Nikita, Nemtsev, Nikolay, Oswald, Norbert, Udin, Oleg, Kononov, Pavel, Geng, Pei, Alotaibi, Saad Ghazai, Kim, Sehyung, Ulasen, Sergei, Escalera, Sergio, Zhang, Shanshan, Yang, Shuyuan, Moon, Sunghwan, Moeslund, Thomas B., Shandyba, Vasyl, Golovkin, Vladimir, Dai, Wei, Chung, WonTaek, Liu, Xinyu, Zhu, Yongqiang, Kim, Youngseo, Li, Yuan, Yang, Yuting, Xiao, Yuxuan, Cheng, Zehua, Li, Zhihao
The SoccerNet 2024 challenges represent the fourth annual video understanding challenges organized by the SoccerNet team. These challenges aim to advance research across multiple themes in football, including broadcast video understanding, field unde
External link:
http://arxiv.org/abs/2409.10587
Recent advancements in 3D object detection have benefited from multi-modal information from the multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that existing multi
External link:
http://arxiv.org/abs/2407.19156
Recent studies on inverse problems have proposed posterior samplers that leverage the pre-trained diffusion models as powerful priors. These attempts have paved the way for using diffusion models in a wide range of inverse problems. However, the exis
External link:
http://arxiv.org/abs/2407.16125
Open-vocabulary object detection (OVD) has been studied with Vision-Language Models (VLMs) to detect novel objects beyond the pre-trained categories. Previous approaches improve the generalization ability to expand the knowledge of the detector, usin
External link:
http://arxiv.org/abs/2404.05687
Pre-trained vision-language models have shown impressive success on various computer vision tasks with their zero-shot generalizability. Recently, prompt learning approaches have been explored to efficiently and effectively adapt the vision-language
External link:
http://arxiv.org/abs/2404.00851