Showing 1 - 10 of 43 for search: '"Tao, Yunzhe"'
Author:
Wang, Xuwu, Cui, Qiwen, Tao, Yunzhe, Wang, Yiran, Chai, Ziwei, Han, Xiaotian, Liu, Boyi, Yuan, Jianbo, Su, Jing, Wang, Guoyin, Liu, Tingkai, Chen, Liyu, Liu, Tianyi, Sun, Tao, Zhang, Yufeng, Zheng, Sirui, You, Quanzeng, Yang, Yang, Yang, Hongxia
Large language models (LLMs) have become increasingly pivotal across various domains, especially in handling complex data types. This includes structured data processing, as exemplified by ChartQA and ChatGPT-Ada, and multimodal unstructured data pro…
External link:
http://arxiv.org/abs/2410.00773
Author:
Fan, Qihang, You, Quanzeng, Han, Xiaotian, Liu, Yongfei, Tao, Yunzhe, Huang, Huaibo, He, Ran, Yang, Hongxia
This paper tackles a significant challenge faced by Vision Transformers (ViTs): their constrained scalability across different image resolutions. Typically, ViTs experience a performance decline when processing resolutions different from those seen d…
External link:
http://arxiv.org/abs/2403.18361
Author:
Zhang, Yufeng, Chen, Liyu, Liu, Boyi, Yang, Yingxiang, Cui, Qiwen, Tao, Yunzhe, Yang, Hongxia
Recent advances in reinforcement learning (RL) algorithms aim to enhance the performance of language models at scale. Yet, there is a noticeable absence of a cost-effective and standardized testbed tailored to evaluating and comparing these algorithm…
External link:
http://arxiv.org/abs/2403.07191
Author:
Liu, Haogeng, You, Quanzeng, Han, Xiaotian, Wang, Yiqi, Zhai, Bohan, Liu, Yongfei, Tao, Yunzhe, Huang, Huaibo, He, Ran, Yang, Hongxia
Multimodal Large Language Models (MLLMs) have experienced significant advancements recently. Nevertheless, challenges persist in the accurate recognition and comprehension of intricate details within high-resolution images. Despite being indispensabl…
External link:
http://arxiv.org/abs/2403.01487
Author:
Liu, Tingkai, Tao, Yunzhe, Liu, Haogeng, Fan, Qihang, Zhou, Ding, Huang, Huaibo, He, Ran, Yang, Hongxia
We present a novel human-annotated dataset for evaluating the ability of visual-language models to generate both short and long descriptions for real-world video clips, termed DeVAn (Dense Video Annotation). The dataset contains 8.5K YouTube video c…
External link:
http://arxiv.org/abs/2310.05060
Author:
Liu, Haogeng, Fan, Qihang, Liu, Tingkai, Yang, Linjie, Tao, Yunzhe, Huang, Huaibo, He, Ran, Yang, Hongxia
This paper proposes Video-Teller, a video-language foundation model that leverages multi-modal fusion and fine-grained modality alignment to significantly enhance the video-to-text generation task. Video-Teller boosts the training efficiency by utili…
External link:
http://arxiv.org/abs/2310.04991
In this paper, we introduce $\text{EVL}_{\text{Gen}}$, a streamlined framework designed for the pre-training of visually conditioned language generation models with high computational demands, utilizing frozen pre-trained large language models (LLMs)…
External link:
http://arxiv.org/abs/2310.03291
Program synthesis aims to create accurate, executable programs from problem specifications, specifically from natural language descriptions in our context. Recent studies have leveraged the power of reinforcement learning (RL) in conjunction with lar…
External link:
http://arxiv.org/abs/2310.03173
Author:
Yang, Liangwei, Wang, Shengjie, Tao, Yunzhe, Sun, Jiankai, Liu, Xiaolong, Yu, Philip S., Wang, Taiqing
Graph Neural Network (GNN) based recommender systems have been attracting more and more attention in recent years due to their excellent accuracy. Representing user-item interactions as a bipartite graph, a GNN model generates user and…
External link:
http://arxiv.org/abs/2211.10486
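The snippet above describes the general bipartite-graph setup used by GNN recommenders. As a hypothetical illustration only (not the method of the linked paper), the degree-normalized message passing such models typically use can be sketched in a few lines of NumPy; `propagate` and `score` are made-up names for this sketch.

```python
import numpy as np

def propagate(R, user_emb, item_emb):
    """One round of message passing on a user-item bipartite graph.

    R is the binary interaction matrix (users x items). Each side's new
    embedding is the degree-normalized aggregate of its neighbors'
    embeddings, in the style of simplified graph-convolution recommenders.
    """
    d_u = R.sum(axis=1, keepdims=True)              # user degrees, (U, 1)
    d_i = R.sum(axis=0, keepdims=True)              # item degrees, (1, I)
    norm = R / np.sqrt(np.clip(d_u, 1, None) * np.clip(d_i, 1, None))
    new_user = norm @ item_emb                      # users aggregate items
    new_item = norm.T @ user_emb                    # items aggregate users
    return new_user, new_item

def score(user_emb, item_emb):
    """Recommendation scores as inner products of user and item embeddings."""
    return user_emb @ item_emb.T
```

Stacking several `propagate` rounds lets embeddings absorb multi-hop collaborative signals before scoring.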
Most existing neural-network-based task-oriented dialogue systems follow the encoder-decoder paradigm, where the decoder depends purely on the source texts to generate a sequence of words, usually suffering from instability and poor readability. Inspired…
External link:
http://arxiv.org/abs/2106.05830
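The encoder-decoder dependence that the snippet above criticizes can be sketched with a toy model. This is a hypothetical illustration, not the linked paper's system: the decoder conditions only on an encoded context vector and its own previous token, with no external scaffolding; all names (`encode`, `decode`, `E`, `W_out`) are invented for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, DIM = 16, 8
E = rng.normal(size=(VOCAB, DIM))       # shared token embedding table
W_out = rng.normal(size=(DIM, VOCAB))   # decoder projection to vocabulary

def encode(src_ids):
    """Compress the source token ids into a single context vector (mean pool)."""
    return E[src_ids].mean(axis=0)

def decode(context, start_id=0, max_len=5):
    """Greedy decoding: each step sees only the context and the previous token."""
    out, prev = [], start_id
    for _ in range(max_len):
        h = context + E[prev]                 # decoder state
        prev = int(np.argmax(h @ W_out))      # pick the highest-scoring token
        out.append(prev)
    return out
```

Because every generated token is a function of the source encoding alone, any noise in that encoding propagates through the whole output, which is one way to read the "instability" the abstract mentions.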