Showing 1 - 10 of 367 for search: '"Li, Yicong"'
Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is
External link:
http://arxiv.org/abs/2408.09064
In this article, we explore the challenges and evolution of two key technologies in the current field of AI: Vision Transformer model and Large Language Model (LLM). Vision Transformer captures global information by splitting images into small pieces
External link:
http://arxiv.org/abs/2408.08684
Author:
Xiao, Junbin, Huang, Nanxin, Qin, Hangyu, Li, Dongyang, Li, Yicong, Zhu, Fengbin, Tao, Zhulin, Yu, Jianxing, Lin, Liang, Chua, Tat-Seng, Yao, Angela
Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) plays a pivotal role in Video-LLM development. This work conducts a timely and comprehensive stu
External link:
http://arxiv.org/abs/2408.04223
Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic
External link:
http://arxiv.org/abs/2406.13201
Author:
Nguyen, Thong, Bin, Yi, Xiao, Junbin, Qu, Leigang, Li, Yicong, Wu, Jay Zhangjie, Nguyen, Cong-Duy, Ng, See-Kiong, Tuan, Luu Anh
Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in creating video
External link:
http://arxiv.org/abs/2406.05615
Common law courts need to refer to similar precedents' judgments to inform their current decisions. Generating high-quality summaries of court judgment documents can help legal practitioners efficiently review previous cases and assist the g
External link:
http://arxiv.org/abs/2403.04454
Attention Is Not the Only Choice: Counterfactual Reasoning for Path-Based Explainable Recommendation
Compared with only pursuing recommendation accuracy, the explainability of a recommendation model has drawn more attention in recent years. Many graph-based recommendations resort to informative paths with the attention mechanism for the explanation.
External link:
http://arxiv.org/abs/2401.05744
We study visually grounded VideoQA in response to the emerging trends of utilizing pretraining techniques for video-language understanding. Specifically, by forcing vision-language models (VLMs) to answer questions and simultaneously provide visual e
External link:
http://arxiv.org/abs/2309.01327
This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically, the current video encoders tend to holistically embed all video clues at different granularities in a hierarchical manner, which inevitably introduces \texti
External link:
http://arxiv.org/abs/2308.03267
This paper strives to solve complex video question answering (VideoQA), which features long videos containing multiple objects and events at different times. To tackle the challenge, we highlight the importance of identifying question-critical temporal
External link:
http://arxiv.org/abs/2307.12058