Showing 1 - 10 of 25 for search: '"Fei, Nanyi"'
Visual instruction tuning is a key training stage of large multimodal models (LMMs). Nevertheless, the common practice of indiscriminately mixing instruction-following data from various tasks may result in suboptimal overall performance due to …
External link:
http://arxiv.org/abs/2403.04343
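To make the concern concrete, here is a minimal Python sketch contrasting indiscriminate mixing with per-task weighted sampling; the task names and weights are hypothetical, and the paper's actual remedy may differ:

```python
import random

# Hypothetical instruction-following datasets from different tasks;
# names and contents are illustrative, not from the paper.
task_data = {
    "vqa": ["<vqa example 1>", "<vqa example 2>"],
    "captioning": ["<caption example 1>"],
    "ocr": ["<ocr example 1>"],
}

def naive_mix(task_data):
    """Indiscriminate mixing: concatenate all tasks and shuffle."""
    pool = [ex for examples in task_data.values() for ex in examples]
    random.shuffle(pool)
    return pool

def weighted_mix(task_data, weights, n):
    """Per-task weighted sampling, one simple alternative to naive mixing."""
    tasks = list(task_data)
    probs = [weights[t] for t in tasks]
    out = []
    for _ in range(n):
        t = random.choices(tasks, probs)[0]  # pick a task by weight
        out.append(random.choice(task_data[t]))
    return out
```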
In multi-task learning (MTL), gradient balancing has recently attracted more research interest than loss balancing since it often leads to better performance. However, loss balancing is much more efficient than gradient balancing, and thus it is still …
External link:
http://arxiv.org/abs/2307.15429
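As a rough illustration of the efficiency gap, the following PyTorch sketch contrasts a loss-balancing step (one backward pass over a weighted sum) with a generic gradient-balancing step (one backward pass per task); the gradient-combination rule shown is a placeholder, not the paper's method:

```python
import torch

def loss_balancing_step(losses, weights, optimizer):
    """Loss balancing: one weighted scalar loss, a single backward pass."""
    total = sum(w * l for w, l in zip(weights, losses))
    optimizer.zero_grad()
    total.backward()
    optimizer.step()

def gradient_balancing_step(model, losses, optimizer):
    """Gradient balancing (sketch): one backward pass per task to obtain
    per-task gradients before combining them, hence the extra cost."""
    params = [p for p in model.parameters() if p.requires_grad]
    per_task_grads = []
    for l in losses:
        grads = torch.autograd.grad(l, params, retain_graph=True)
        per_task_grads.append(grads)
    # Combine per-task gradients, here by simple averaging; this stands in
    # for methods such as PCGrad/CAGrad and is not the paper's rule.
    optimizer.zero_grad()
    for p, *gs in zip(params, *per_task_grads):
        p.grad = torch.stack(gs).mean(dim=0)
    optimizer.step()
```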
This work introduces Video Diffusion Transformer (VDT), which pioneers the use of transformers in diffusion-based video generation. It features transformer blocks with modularized temporal and spatial attention modules to leverage the rich spatial-temporal …
External link:
http://arxiv.org/abs/2305.13311
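A minimal PyTorch sketch of the modularized design described here, assuming video tokens shaped (batch, frames, patches, dim); the real VDT block (normalization, MLP, conditioning) is more elaborate:

```python
import torch.nn as nn

class SpatialTemporalBlock(nn.Module):
    """Transformer block with separate spatial and temporal attention
    over video tokens, in the spirit of VDT (simplified sketch)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, patches, dim)
        b, t, n, d = x.shape
        # Spatial attention: patches attend to each other within a frame.
        xs = x.reshape(b * t, n, d)
        xs = xs + self.spatial_attn(xs, xs, xs)[0]
        x = xs.reshape(b, t, n, d)
        # Temporal attention: each spatial location attends across frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        xt = xt + self.temporal_attn(xt, xt, xt)[0]
        return xt.reshape(b, n, t, d).permute(0, 2, 1, 3)
```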
Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and text description are semantically correlated, and focus on video-language modeling at the video level. However, …
External link:
http://arxiv.org/abs/2209.11388
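For reference, video-level matching of the kind criticized here can be sketched as pooling frame features into one vector before comparison, which averages weakly correlated (noisy) frames in rather than filtering them out (illustrative only):

```python
import torch.nn.functional as F

def video_level_score(frame_feats, text_feat):
    """Video-level matching: pool frame features into a single video
    vector, then compare with the text embedding as a whole.
    frame_feats: (num_frames, dim), text_feat: (dim,)."""
    video_feat = frame_feats.mean(dim=0)  # noisy frames are averaged in
    return F.cosine_similarity(video_feat, text_feat, dim=0)
```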
Author:
Lu, Haoyu; Zhou, Qiongyi; Fei, Nanyi; Lu, Zhiwu; Ding, Mingyu; Wen, Jingyuan; Du, Changde; Zhao, Xin; Sun, Hao; He, Huiguang; Wen, Ji-Rong
Multimodal learning, especially large-scale multimodal pre-training, has developed rapidly over the past few years and led to the greatest advances in artificial intelligence (AI). Despite its effectiveness, understanding the underlying mechanism of …
External link:
http://arxiv.org/abs/2208.08263
Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-stream methods like CLIP and ALIGN with high inference efficiency …
External link:
http://arxiv.org/abs/2204.07441
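The efficiency argument can be sketched in a few lines: with two independent encoders, gallery embeddings are precomputed once and retrieval reduces to a matrix multiply, whereas a single-stream model must run its heavy attention stack on every image-text pair (a CLIP-style sketch, not any specific paper's code):

```python
import torch
import torch.nn.functional as F

def two_stream_retrieval(image_feats, text_feats):
    """Two-stream retrieval in the CLIP/ALIGN style: images and texts are
    encoded independently, so image embeddings can be precomputed and
    ranking is a single matrix multiply.
    image_feats: (n_images, dim), text_feats: (n_texts, dim)."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = text_feats @ image_feats.t()           # (n_texts, n_images)
    return sims.argsort(dim=-1, descending=True)  # ranked images per text
```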
Author:
Yuan, Sha; Zhao, Hanyu; Zhao, Shuai; Leng, Jiahong; Liang, Yangxiao; Wang, Xiaozhi; Yu, Jifan; Lv, Xin; Shao, Zhou; He, Jiaao; Lin, Yankai; Han, Xu; Liu, Zhenghao; Ding, Ning; Rao, Yongming; Gao, Yizhao; Zhang, Liang; Ding, Ming; Fang, Cong; Wang, Yisen; Long, Mingsheng; Zhang, Jing; Dong, Yinpeng; Pang, Tianyu; Cui, Peng; Huang, Lingxiao; Liang, Zheng; Shen, Huawei; Zhang, Hui; Zhang, Quanshi; Dong, Qingxiu; Tan, Zhixing; Wang, Mingxuan; Wang, Shuo; Zhou, Long; Li, Haoran; Bao, Junwei; Pan, Yingwei; Zhang, Weinan; Yu, Zhou; Yan, Rui; Shi, Chence; Xu, Minghao; Zhang, Zuobai; Wang, Guoqiang; Pan, Xiang; Li, Mengjie; Chu, Xiaoyu; Yao, Zijun; Zhu, Fangwei; Cao, Shulin; Xue, Weicheng; Ma, Zixuan; Zhang, Zhengyan; Hu, Shengding; Qin, Yujia; Xiao, Chaojun; Zeng, Zheni; Cui, Ganqu; Chen, Weize; Zhao, Weilin; Yao, Yuan; Li, Peng; Zheng, Wenzhao; Zhao, Wenliang; Wang, Ziyi; Zhang, Borui; Fei, Nanyi; Hu, Anwen; Ling, Zenan; Li, Haoyang; Cao, Boxi; Han, Xianpei; Zhan, Weidong; Chang, Baobao; Sun, Hao; Deng, Jiawen; Zheng, Chujie; Li, Juanzi; Hou, Lei; Cao, Xigang; Zhai, Jidong; Liu, Zhiyuan; Sun, Maosong; Lu, Jiwen; Lu, Zhiwu; Jin, Qin; Song, Ruihua; Wen, Ji-Rong; Lin, Zhouchen; Wang, Liwei; Su, Hang; Zhu, Jun; Sui, Zhifang; Zhang, Jiajun; Liu, Yang; He, Xiaodong; Huang, Minlie; Tang, Jian; Tang, Jie
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in constructing BMs and applying them in many fields. At present, …
External link:
http://arxiv.org/abs/2203.14101
Author:
Fei, Nanyi; Lu, Zhiwu; Gao, Yizhao; Yang, Guoxing; Huo, Yuqi; Wen, Jingyuan; Lu, Haoyu; Song, Ruihua; Gao, Xin; Xiang, Tao; Sun, Hao; Wen, Ji-Rong
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take …
External link:
http://arxiv.org/abs/2110.14378
Most recent few-shot learning (FSL) methods are based on meta-learning with episodic training. In each meta-training episode, a discriminative feature embedding and/or classifier are first constructed from a support set in an inner loop, and then evaluated …
External link:
http://arxiv.org/abs/2101.09499
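A minimal sketch of the episodic setup described here, using ProtoNet-style prototypes as one concrete instance of the inner-loop construction and outer-loop evaluation (names and defaults are illustrative):

```python
import random
import torch

def sample_episode(data_by_class, n_way=5, k_shot=1, q_queries=15):
    """Sample one N-way K-shot meta-task: a labeled support set and a
    query set drawn from the same classes. data_by_class maps a class id
    to a tensor of feature embeddings."""
    classes = random.sample(list(data_by_class), n_way)
    support, query = [], []
    for label, c in enumerate(classes):
        idx = torch.randperm(len(data_by_class[c]))
        support.append((data_by_class[c][idx[:k_shot]], label))
        query.append((data_by_class[c][idx[k_shot:k_shot + q_queries]], label))
    return support, query

def prototype_accuracy(support, query):
    """Inner loop: build one prototype per class from the support set;
    outer evaluation: nearest-prototype classification on the queries."""
    protos = torch.stack([feats.mean(dim=0) for feats, _ in support])
    correct = total = 0
    for feats, label in query:
        dists = torch.cdist(feats, protos)  # (q_queries, n_way)
        correct += (dists.argmin(dim=-1) == label).sum().item()
        total += len(feats)
    return correct / total
```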
Existing meta-learning-based few-shot learning (FSL) methods typically adopt an episodic training strategy whereby each episode contains a meta-task. Across episodes, these tasks are sampled randomly and their relationships are ignored. In this paper, …
External link:
http://arxiv.org/abs/2002.04274