Showing 1 - 10 of 348 for search: '"Liu Yuanxin"'
Author:
Li, Lei, Liu, Yuanxin, Yao, Linli, Zhang, Peiyuan, An, Chenxin, Wang, Lean, Sun, Xu, Kong, Lingpeng, Liu, Qi
Video Large Language Models (Video LLMs) have shown promising capabilities in video comprehension, yet they struggle with tracking temporal changes and reasoning about temporal relationships. While previous research attributed this limitation to the…
External link:
http://arxiv.org/abs/2410.06166
The visual projector, which bridges the vision and language modalities and facilitates cross-modal alignment, serves as a crucial component in MLLMs. However, measuring the effectiveness of projectors in vision-language alignment remains under-explored…
External link:
http://arxiv.org/abs/2405.20985
Author:
Chen, Sishuo, Li, Lei, Ren, Shuhuai, Gao, Rundong, Liu, Yuanxin, Bi, Xiaohan, Sun, Xu, Hou, Lu
Video paragraph captioning (VPC) involves generating detailed narratives for long videos, utilizing supportive modalities such as speech and event boundaries. However, the existing models are constrained by the assumption of constant availability of…
External link:
http://arxiv.org/abs/2403.19221
Author:
Liu, Yuanxin, Li, Shicheng, Liu, Yi, Wang, Yuxiang, Ren, Shuhuai, Li, Lei, Chen, Sishuo, Sun, Xu, Hou, Lu
Recently, there has been a surge of interest in video large language models (Video LLMs). However, existing benchmarks fail to provide comprehensive feedback on the temporal perception ability of Video LLMs. On the one hand, most of them are unable…
External link:
http://arxiv.org/abs/2403.00476
The ability to perceive how objects change over time is a crucial ingredient in human intelligence. However, current benchmarks cannot faithfully reflect the temporal understanding abilities of video-language models (VidLMs) due to the existence of s…
External link:
http://arxiv.org/abs/2311.17404
Author:
Liu, Yuanxin, Li, Lei, Ren, Shuhuai, Gao, Rundong, Li, Shicheng, Chen, Sishuo, Sun, Xu, Hou, Lu
Recently, open-domain text-to-video (T2V) generation models have made remarkable progress. However, the promising results are mainly demonstrated through qualitative cases of generated videos, while the quantitative evaluation of T2V models still faces two c…
External link:
http://arxiv.org/abs/2311.01813
Author:
Li, Xintong, Liu, Yuanxin, Gui, Jun, Gan, Lu (ganlu@wchscu.cn), Xue, Jianxin (jianxin-xue@wchscu.edu.cn)
Published in:
Advanced Science. 11/6/2024, Vol. 11, Issue 41, pp. 1-23.
Author:
Liu, Yuanxin, Yang, Xue, Wang, Yan, Zhou, Laiyan, Xue, Jianxin (jianxin-xue@wchscu.edu.cn)
Published in:
Oncologie (De Gruyter). Nov. 2024, Vol. 26, Issue 6, pp. 1003-1017.
Transformer-based pre-trained language models (PLMs) mostly suffer from excessive overhead despite their advanced capacity. For resource-constrained devices, there is an urgent need for a spatially and temporally efficient model which retains the maj…
External link:
http://arxiv.org/abs/2210.15523
Despite the excellent performance of vision-language pre-trained models (VLPs) on the conventional VQA task, they still suffer from two problems: First, VLPs tend to rely on language biases in datasets and fail to generalize to out-of-distribution (OOD)…
External link:
http://arxiv.org/abs/2210.14558