Showing 1 - 10 of 44 for search: '"Guo, Taian"'
Conventional multi-label recognition methods often focus on label confidence, frequently overlooking the pivotal role of partial order relations consistent with human preference. To resolve these issues, we introduce a novel method for multimodal lab…
External link:
http://arxiv.org/abs/2407.13221
Author:
Huang, Jinsheng, Chen, Liang, Guo, Taian, Zeng, Fu, Zhao, Yusheng, Wu, Bohan, Yuan, Ye, Zhao, Haozhe, Guo, Zhihui, Zhang, Yichi, Yuan, Jingyang, Ju, Wei, Liu, Luchen, Liu, Tianyu, Chang, Baobao, Zhang, Ming
Large Multimodal Models (LMMs) exhibit impressive cross-modal understanding and reasoning abilities, often assessed through multiple-choice questions (MCQs) that include an image, a question, and several options. However, many benchmarks used for suc…
External link:
http://arxiv.org/abs/2407.00468
Author:
Shu, Xiujun, Wen, Wei, Xu, Liangsheng, Qiao, Ruizhi, Guo, Taian, Li, Hanjun, Gan, Bei, Wang, Xiao, Sun, Xing
Video temporal character grouping locates appearing moments of major characters within a video according to their identities. To this end, recent works have evolved from unsupervised clustering to graph-based supervised clustering. However, graph met…
External link:
http://arxiv.org/abs/2308.14105
Temporal sentence grounding (TSG) aims to locate a specific moment in an untrimmed video from a given natural language query. Recently, weakly supervised methods still show a large performance gap compared to fully supervised ones, while the latter…
External link:
http://arxiv.org/abs/2308.04197
Image and language modeling is of crucial importance for vision-language pre-training (VLP), which aims to learn multi-modal representations from large-scale paired image-text data. However, we observe that most existing VLP methods focus on modeling…
External link:
http://arxiv.org/abs/2208.09374
This technical report presents the 3rd winning solution for MTVG, a new task introduced in the 4th Person in Context (PIC) Challenge at ACM MM 2022. MTVG aims at localizing the temporal boundary of the step in an untrimmed video based on a textual d…
External link:
http://arxiv.org/abs/2208.06179
Real-world recognition systems often encounter the challenge of unseen labels. To identify such unseen labels, multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge via a pre-trained textual label embedding (e.g., GloVe). However,…
External link:
http://arxiv.org/abs/2207.01887
Video super-resolution (VSR) aims to utilize multiple low-resolution frames to generate a high-resolution prediction for each frame. In this process, inter- and intra-frames are the key sources for exploiting temporal and spatial information. However…
External link:
http://arxiv.org/abs/2007.11803
Academic article
This result cannot be displayed to non-logged-in users.
You must log in to view this result.
Author:
Zhou, Zhiqiang, Xu, Xinyu, Qu, Xilong, Li, Shun
Published in:
International Journal of Engineering Business Management. 7/15/2020, Vol. 12, p1-13. 13p.