Search results

Report

MoRA: LoRA Guided Multi-Modal Disease Diagnosis with Missing Modality

Authors: Shi, Zhiyi, Kim, Junsik, Li, Wanhua, Li, Yicong, Pfister, Hanspeter

Multi-modal pre-trained models efficiently extract and fuse features from different modalities with low memory requirements for fine-tuning. Despite this efficiency, their application in disease diagnosis is under-explored. A significant challenge is …

External link: http://arxiv.org/abs/2408.09064

Report

Research on Personalized Compression Algorithm for Pre-trained Models Based on Homomorphic Entropy Increase

Authors: Li, Yicong, Guo, Xing, Du, Haohua

In this article, we explore the challenges and evolution of two key technologies in the current field of AI: the Vision Transformer model and the Large Language Model (LLM). The Vision Transformer captures global information by splitting images into small pieces …

External link: http://arxiv.org/abs/2408.08684

Report

VideoQA in the Era of LLMs: An Empirical Study

Authors: Xiao, Junbin, Huang, Nanxin, Qin, Hangyu, Li, Dongyang, Li, Yicong, Zhu, Fengbin, Tao, Zhulin, Yu, Jianxing, Lin, Liang, Chua, Tat-Seng, Yao, Angela

Video Large Language Models (Video-LLMs) are flourishing and have advanced many video-language tasks. As a golden testbed, Video Question Answering (VideoQA) plays a pivotal role in Video-LLM development. This work conducts a timely and comprehensive stu…

External link: http://arxiv.org/abs/2408.04223

Report

Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach

Authors: Li, Yicong, Yang, Yu, Cao, Jiannong, Liu, Shuaiqi, Tang, Haoran, Xu, Guandong

Recent studies have successfully learned structurally fair static graph embeddings by preventing effectiveness disparities between high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic …

External link: http://arxiv.org/abs/2406.13201

Report

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Authors: Nguyen, Thong, Bin, Yi, Xiao, Junbin, Qu, Leigang, Li, Yicong, Wu, Jay Zhangjie, Nguyen, Cong-Duy, Ng, See-Kiong, Tuan, Luu Anh

Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses, since they allow us to easily communicate our thoughts and perceive the world around us. There has been considerable interest in creating video …

External link: http://arxiv.org/abs/2406.05615

Report

Low-Resource Court Judgment Summarization for Common Law Systems

Authors: Liu, Shuaiqi, Cao, Jiannong, Li, Yicong, Yang, Ruosong, Wen, Zhiyuan

Common law courts need to refer to the judgments of similar precedents to inform their current decisions. Generating high-quality summaries of court judgment documents can help legal practitioners efficiently review previous cases and assist the g…

External link: http://arxiv.org/abs/2403.04454

Report

Attention Is Not the Only Choice: Counterfactual Reasoning for Path-Based Explainable Recommendation

Authors: Li, Yicong, Sun, Xiangguo, Chen, Hongxu, Zhang, Sixiao, Yang, Yu, Xu, Guandong

Beyond recommendation accuracy alone, the explainability of recommendation models has drawn increasing attention in recent years. Many graph-based recommendation models rely on informative paths combined with an attention mechanism to provide explanations.

External link: http://arxiv.org/abs/2401.05744

Report

Can I Trust Your Answer? Visually Grounded Video Question Answering

Authors: Xiao, Junbin, Yao, Angela, Li, Yicong, Chua, Tat Seng

We study visually grounded VideoQA in response to the emerging trend of using pretraining techniques for video-language understanding. Specifically, by forcing vision-language models (VLMs) to answer questions and simultaneously provide visual e…

External link: http://arxiv.org/abs/2309.01327

Report

Redundancy-aware Transformer for Video Question Answering

Authors: Li, Yicong, Yang, Xun, Zhang, An, Feng, Chun, Wang, Xiang, Chua, Tat-Seng

This paper identifies two kinds of redundancy in the current VideoQA paradigm. Specifically, current video encoders tend to holistically embed all video clues at different granularities in a hierarchical manner, which inevitably introduces …

External link: http://arxiv.org/abs/2308.03267

Report

Discovering Spatio-Temporal Rationales for Video Question Answering

Authors: Li, Yicong, Xiao, Junbin, Feng, Chun, Wang, Xiang, Chua, Tat-Seng

This paper addresses complex video question answering (VideoQA), which features long videos containing multiple objects and events at different times. To tackle this challenge, we highlight the importance of identifying question-critical temporal …

External link: http://arxiv.org/abs/2307.12058