Showing 1 - 10 of 264 for search: '"Zhao, Tiancheng"'
Author:
Zhao, Tiancheng, Zhang, Qianqian, Lee, Kyusong, Liu, Peng, Zhang, Lu, Fang, Chunxin, Liao, Jiajia, Jiang, Kelei, Ma, Yibo, Xu, Ruochen
We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are processed, making it more efficient and adaptable. It uses a dynamic vision…
External link:
http://arxiv.org/abs/2407.04923
Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV footage or full-length films presents…
External link:
http://arxiv.org/abs/2406.16620
Author:
Zhang, Zilun, Sun, Yutao, Zhao, Tiancheng, Sha, Leigang, Xu, Ruochen, Lee, Kyusong, Yin, Jianwei
Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models…
External link:
http://arxiv.org/abs/2406.11354
Author:
Sun, Yinggang, Guo, Ziming, Yu, Haining, Liu, Chuanyi, Li, Xiang, Wang, Bingxuan, Yu, Xiangzhan, Zhao, Tiancheng
Fine-tuning large language models (LLMs) for domain-specific tasks has achieved great success in Text-to-SQL. However, these fine-tuned models often face challenges with multi-turn Text-to-SQL tasks caused by ambiguous or unanswerable questions…
External link:
http://arxiv.org/abs/2406.10593
Author:
Sun, Yutao, Chen, Mingshuai, Zhao, Tiancheng, Zhao, Kangjia, Li, He, Chen, Jintao, Lu, Liqiang, Zhao, Xinkui, Deng, Shuiguang, Yin, Jianwei
Artificial intelligence is rapidly encroaching on the field of service regulation. This work presents the design principles behind HORAE, a unified specification language to model multimodal regulation rules across a diverse set of domains. We show how…
External link:
http://arxiv.org/abs/2406.06600
End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities. However, their demanding computational requirements…
External link:
http://arxiv.org/abs/2403.06892
Visual grounding, a crucial vision-language task involving the understanding of the visual context based on the query expression, requires the model to capture the interactions between objects, as well as various spatial and attribute information…
External link:
http://arxiv.org/abs/2312.15043
Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning…
External link:
http://arxiv.org/abs/2310.13473
Author:
Yao, Yiyang, Liu, Peng, Zhao, Tiancheng, Zhang, Qianqian, Liao, Jiajia, Fang, Chunxin, Lee, Kyusong, Wang, Qing
Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods…
External link:
http://arxiv.org/abs/2308.13177
Published in:
TGRS-2023-06174
Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make…
External link:
http://arxiv.org/abs/2306.11300