Showing 1 - 7 of 7
for search: '"Ko, Dohwan"'
Large Language Models (LLMs) have demonstrated remarkable generalization and instruction-following capabilities with instruction tuning. The advancements in LLMs and instruction tuning have led to the development of Large Vision-Language Models (LVLM
External link:
http://arxiv.org/abs/2411.00871
Large Language Models (LLMs) have shown remarkable performances on a wide range of natural language understanding and generation tasks. We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and c
External link:
http://arxiv.org/abs/2310.15747
Video Question Answering (VideoQA) is a challenging task that entails complex multi-modal reasoning. In contrast to multiple-choice VideoQA which aims to predict the answer given several options, the goal of open-ended VideoQA is to answer questions
External link:
http://arxiv.org/abs/2308.09363
Author:
Ko, Dohwan, Choi, Joonmyung, Choi, Hyeong Kyu, On, Kyoung-Woon, Roh, Byungseok, Kim, Hyunwoo J.
Foundation models have shown outstanding performance and generalization capabilities across domains. Since most studies on foundation models mainly focus on the pretraining phase, a naive strategy to minimize a single task-specific loss is adopted fo
External link:
http://arxiv.org/abs/2303.13009
Author:
Ko, Dohwan, Choi, Joonmyung, Ko, Juyeon, Noh, Shinyeong, On, Kyoung-Woon, Kim, Eun-Sol, Kim, Hyunwoo J.
Learning generic joint representations for video and text by a supervised method requires a prohibitively substantial amount of manually annotated video datasets. As a practical alternative, a large-scale but uncurated and narrated video dataset, How
External link:
http://arxiv.org/abs/2203.16784
Published in:
In Information Sciences April 2023 623:206-219
Academic article
This result is not available to users who are not signed in.
You must sign in to view this result.