Showing 1 - 10 of 52 for search: '"Lee, Kyusong"'
Author:
Zhao, Tiancheng, Zhang, Qianqian, Lee, Kyusong, Liu, Peng, Zhang, Lu, Fang, Chunxin, Liao, Jiajia, Jiang, Kelei, Ma, Yibo, Xu, Ruochen
We introduce OmChat, a model designed to excel in handling long contexts and video understanding tasks. OmChat's new architecture standardizes how different visual inputs are processed, making it more efficient and adaptable. It uses a dynamic vision
External link:
http://arxiv.org/abs/2407.04923
Recent advancements in Large Language Models (LLMs) have expanded their capabilities to multimodal contexts, including comprehensive video understanding. However, processing extensive videos such as 24-hour CCTV footage or full-length films presents
External link:
http://arxiv.org/abs/2406.16620
Author:
Zhang, Zilun, Sun, Yutao, Zhao, Tiancheng, Sha, Leigang, Xu, Ruochen, Lee, Kyusong, Yin, Jianwei
Humans can retain old knowledge while learning new information, but Large Language Models (LLMs) often suffer from catastrophic forgetting when post-pretrained or supervised fine-tuned (SFT) on domain-specific data. Moreover, for Multimodal Large Lan…
External link:
http://arxiv.org/abs/2406.11354
End-to-end transformer-based detectors (DETRs) have shown exceptional performance in both closed-set and open-vocabulary object detection (OVD) tasks through the integration of language modalities. However, their demanding computational requirements
External link:
http://arxiv.org/abs/2403.06892
Author:
Yao, Yiyang, Liu, Peng, Zhao, Tiancheng, Zhang, Qianqian, Liao, Jiajia, Fang, Chunxin, Lee, Kyusong, Wang, Qing
Object detection (OD) in computer vision has made significant progress in recent years, transitioning from closed-set labels to open-vocabulary detection (OVD) based on large-scale vision-language pre-training (VLP). However, current evaluation metho
External link:
http://arxiv.org/abs/2308.13177
The advancement of object detection (OD) in open-vocabulary and open-world scenarios is a critical challenge in computer vision. This work introduces OmDet, a novel language-aware object detection architecture, and an innovative training mechanism th
External link:
http://arxiv.org/abs/2209.05946
Author:
Zhao, Tiancheng, Zhang, Tianqi, Zhu, Mingwei, Shen, Haozhan, Lee, Kyusong, Lu, Xiaopeng, Yin, Jianwei
Vision-Language Pretraining (VLP) models have recently successfully facilitated many cross-modal downstream tasks. Most existing works evaluated their systems by comparing the fine-tuned downstream task performance. However, only average downstream t
External link:
http://arxiv.org/abs/2207.00221
Conversational Artificial Intelligence (AI) used in industry settings can be trained to closely mimic human behaviors, including lying and deception. However, lying is often a necessary part of negotiation. To address this, we develop a normative fra
External link:
http://arxiv.org/abs/2103.05434
Although open-domain question answering (QA) draws great attention in recent years, it requires large amounts of resources for building the full system and is often difficult to reproduce previous results due to complex configurations. In this paper,
External link:
http://arxiv.org/abs/2101.01910
Text-to-image retrieval is an essential task in cross-modal information retrieval, i.e., retrieving relevant images from a large and unlabelled dataset given textual queries. In this paper, we propose VisualSparta, a novel (Visual-text Sparse Transfo
External link:
http://arxiv.org/abs/2101.00265