Showing 1 - 10 of 127 for search: '"Kang, Wooyoung"'
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities. Despite the importance of t
External link:
http://arxiv.org/abs/2312.06742
Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and c
External link:
http://arxiv.org/abs/2310.15747
Author:
Kim, Taehoon, Ahn, Pyunghwan, Kim, Sangyun, Lee, Sihaeng, Marsden, Mark, Sala, Alessandra, Kim, Seung Hwan, Han, Bohyung, Lee, Kyoung Mu, Lee, Honglak, Bae, Kyounghoon, Wu, Xiangyu, Gao, Yi, Zhang, Hailiang, Yang, Yang, Guo, Weili, Lu, Jianfeng, Oh, Youngtaek, Cho, Jae Won, Kim, Dong-jin, Kweon, In So, Kim, Junmo, Kang, Wooyoung, Jhoo, Won Young, Roh, Byungseok, Mun, Jonghwan, Oh, Solgil, Ak, Kenan Emir, Lee, Gwang-Gook, Xu, Yan, Shen, Mingwei, Hwang, Kyomin, Shin, Wonsik, Lee, Kamin, Park, Wonhark, Lee, Dongkwan, Kwak, Nojun, Wang, Yujin, Wang, Yimu, Gu, Tiancheng, Lv, Xingchang, Sun, Mingmao
In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image capt
External link:
http://arxiv.org/abs/2309.01961
Recent open-vocabulary detection methods aim to detect novel objects by distilling knowledge from vision-language models (VLMs) trained on a vast amount of image-text pairs. To improve the effectiveness of these methods, researchers have utilized dat
External link:
http://arxiv.org/abs/2303.13040
Image captioning is one of the straightforward tasks that can take advantage of large-scale web-crawled data which provides rich knowledge about the visual world for a captioning model. However, since web-crawled data contains image-text pairs that a
External link:
http://arxiv.org/abs/2212.13563
It is well known that most of the conventional video question answering (VideoQA) datasets consist of easy questions requiring simple reasoning processes. However, long videos inevitably contain complex and compositional semantic structures along wit
External link:
http://arxiv.org/abs/2210.10300
Published in:
In Journal of Affective Disorders 1 June 2023 330:16-23
Academic article
Author:
Kang, Wooyoung, Kang, Younbin, Kim, Aram, Tae, Woo-Suk, Kim, Kyeong Jin, Kim, Sin Gon, Ham, Byung-Joo, Han, Kyu-Man
Published in:
In Psychiatry Research: Neuroimaging October 2022 326
Published in:
In Psychiatry Research: Neuroimaging June 2022 322