Showing 1 - 10 of 264 for search: '"Zhao, Tiancheng"'
Author:
Zhao, Tiancheng, Zhang, Qianqian, Lee, Kyusong, Liu, Peng, Zhang, Lu, Fang, Chunxin, Liao, Jiajia, Jiang, Kelei, Ma, Yibo, Xu, Ruochen
We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are processed, making it more efficient and adaptable. It uses a dynamic vision…
External link:
http://arxiv.org/abs/2407.04923
Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV footage or full-length films presents…
External link:
http://arxiv.org/abs/2406.16620
Author:
Zhang, Zilun, Sun, Yutao, Zhao, Tiancheng, Sha, Leigang, Xu, Ruochen, Lee, Kyusong, Yin, Jianwei
Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Language Models…
External link:
http://arxiv.org/abs/2406.11354
Author:
Sun, Yinggang, Guo, Ziming, Yu, Haining, Liu, Chuanyi, Li, Xiang, Wang, Bingxuan, Yu, Xiangzhan, Zhao, Tiancheng
Fine-tuning large language models (LLMs) for domain-specific tasks has achieved great success in Text-to-SQL. However, these fine-tuned models often face challenges with multi-turn Text-to-SQL tasks caused by ambiguous or unanswerable questions…
External link:
http://arxiv.org/abs/2406.10593
Author:
Sun, Yutao, Chen, Mingshuai, Zhao, Tiancheng, Zhao, Kangjia, Li, He, Chen, Jintao, Lu, Liqiang, Zhao, Xinkui, Deng, Shuiguang, Yin, Jianwei
Artificial intelligence is rapidly encroaching on the field of service regulation. This work presents the design principles behind HORAE, a unified specification language to model multimodal regulation rules across a diverse set of domains. We show how…
External link:
http://arxiv.org/abs/2406.06600
End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities. However, their demanding computational requirements…
External link:
http://arxiv.org/abs/2403.06892
Visual grounding, a crucial vision-language task involving the understanding of the visual context based on the query expression, requires the model to capture the interactions between objects, as well as various spatial and attribute information…
External link:
http://arxiv.org/abs/2312.15043
Multimodal large language models (MLLMs) have shown great potential in perception and interpretation tasks, but their capabilities in predictive reasoning remain under-explored. To address this gap, we introduce a novel benchmark that assesses the predictive reasoning…
External link:
http://arxiv.org/abs/2310.13473
Author:
Yao, Yiyang, Liu, Peng, Zhao, Tiancheng, Zhang, Qianqian, Liao, Jiajia, Fang, Chunxin, Lee, Kyusong, Wang, Qing
Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation methods…
External link:
http://arxiv.org/abs/2308.13177
Published in:
TGRS-2023-06174
Pre-trained Vision-Language Models (VLMs) utilizing extensive image-text paired data have demonstrated unprecedented image-text association capabilities, achieving remarkable results across various downstream tasks. A critical challenge is how to make…
External link:
http://arxiv.org/abs/2306.11300