Showing 1 - 10 of 25 for search: '"Fei, Nanyi"'
Visual instruction tuning is a key training stage of large multimodal models (LMMs). Nevertheless, the common practice of indiscriminately mixing instruction-following data from various tasks may result in suboptimal overall performance due to …
External link:
http://arxiv.org/abs/2403.04343
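To make the concern concrete, here is a minimal Python sketch contrasting indiscriminate mixing with per-task weighted sampling; the task names and weights are hypothetical, and the paper's actual remedy may differ:

```python
import random

# Hypothetical instruction-following datasets from different tasks;
# names and contents are illustrative, not from the paper.
task_data = {
    "vqa": ["<vqa example 1>", "<vqa example 2>"],
    "captioning": ["<caption example 1>"],
    "ocr": ["<ocr example 1>"],
}

def naive_mix(task_data):
    """Indiscriminate mixing: concatenate all tasks and shuffle."""
    pool = [ex for examples in task_data.values() for ex in examples]
    random.shuffle(pool)
    return pool

def weighted_mix(task_data, weights, n):
    """Per-task weighted sampling, one simple alternative to naive mixing."""
    tasks = list(task_data)
    probs = [weights[t] for t in tasks]
    out = []
    for _ in range(n):
        t = random.choices(tasks, probs)[0]  # pick a task by weight
        out.append(random.choice(task_data[t]))
    return out
```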
In multi-task learning (MTL), gradient balancing has recently attracted more research interest than loss balancing since it often leads to better performance. However, loss balancing is much more efficient than gradient balancing, and thus it is still …
External link:
http://arxiv.org/abs/2307.15429
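As a rough illustration of the efficiency gap, the following PyTorch sketch contrasts a loss-balancing step (one backward pass over a weighted sum) with a generic gradient-balancing step (one backward pass per task); the gradient-combination rule shown is a placeholder, not the paper's method:

```python
import torch

def loss_balancing_step(losses, weights, optimizer):
    """Loss balancing: one weighted scalar loss, a single backward pass."""
    total = sum(w * l for w, l in zip(weights, losses))
    optimizer.zero_grad()
    total.backward()
    optimizer.step()

def gradient_balancing_step(model, losses, optimizer):
    """Gradient balancing (sketch): one backward pass per task to obtain
    per-task gradients before combining them, hence the extra cost."""
    params = [p for p in model.parameters() if p.requires_grad]
    per_task_grads = []
    for l in losses:
        grads = torch.autograd.grad(l, params, retain_graph=True)
        per_task_grads.append(grads)
    # Combine per-task gradients, here by simple averaging; this stands in
    # for methods such as PCGrad/CAGrad and is not the paper's rule.
    optimizer.zero_grad()
    for p, *gs in zip(params, *per_task_grads):
        p.grad = torch.stack(gs).mean(dim=0)
    optimizer.step()
```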
This work introduces Video Diffusion Transformer (VDT), which pioneers the use of transformers in diffusion-based video generation. It features transformer blocks with modularized temporal and spatial attention modules to leverage the rich spatial-temporal …
External link:
http://arxiv.org/abs/2305.13311
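A minimal PyTorch sketch of the modularized design described here, assuming video tokens shaped (batch, frames, patches, dim); the real VDT block (normalization, MLP, conditioning) is more elaborate:

```python
import torch.nn as nn

class SpatialTemporalBlock(nn.Module):
    """Transformer block with separate spatial and temporal attention
    over video tokens, in the spirit of VDT (simplified sketch)."""
    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (batch, frames, patches, dim)
        b, t, n, d = x.shape
        # Spatial attention: patches attend to each other within a frame.
        xs = x.reshape(b * t, n, d)
        xs = xs + self.spatial_attn(xs, xs, xs)[0]
        x = xs.reshape(b, t, n, d)
        # Temporal attention: each spatial location attends across frames.
        xt = x.permute(0, 2, 1, 3).reshape(b * n, t, d)
        xt = xt + self.temporal_attn(xt, xt, xt)[0]
        return xt.reshape(b, n, t, d).permute(0, 2, 1, 3)
```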
Video-language modeling has attracted much attention with the rapid growth of web videos. Most existing methods assume that the video frames and text description are semantically correlated, and focus on video-language modeling at the video level. However, …
External link:
http://arxiv.org/abs/2209.11388
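For reference, video-level matching of the kind criticized here can be sketched as pooling frame features into one vector before comparison, which averages weakly correlated (noisy) frames in rather than filtering them out (illustrative only):

```python
import torch.nn.functional as F

def video_level_score(frame_feats, text_feat):
    """Video-level matching: pool frame features into a single video
    vector, then compare with the text embedding as a whole.
    frame_feats: (num_frames, dim), text_feat: (dim,)."""
    video_feat = frame_feats.mean(dim=0)  # noisy frames are averaged in
    return F.cosine_similarity(video_feat, text_feat, dim=0)
```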
Author:
Lu, Haoyu; Zhou, Qiongyi; Fei, Nanyi; Lu, Zhiwu; Ding, Mingyu; Wen, Jingyuan; Du, Changde; Zhao, Xin; Sun, Hao; He, Huiguang; Wen, Ji-Rong
Multimodal learning, especially large-scale multimodal pre-training, has developed rapidly over the past few years and led to the greatest advances in artificial intelligence (AI). Despite its effectiveness, understanding the underlying mechanism of …
External link:
http://arxiv.org/abs/2208.08263
Large-scale single-stream pre-training has shown dramatic performance in image-text retrieval. Regrettably, it faces low inference efficiency due to heavy attention layers. Recently, two-stream methods like CLIP and ALIGN with high inference efficiency …
External link:
http://arxiv.org/abs/2204.07441
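The efficiency argument can be sketched in a few lines: with two independent encoders, gallery embeddings are precomputed once and retrieval reduces to a matrix multiply, whereas a single-stream model must run its heavy attention stack on every image-text pair (a CLIP-style sketch, not any specific paper's code):

```python
import torch
import torch.nn.functional as F

def two_stream_retrieval(image_feats, text_feats):
    """Two-stream retrieval in the CLIP/ALIGN style: images and texts are
    encoded independently, so image embeddings can be precomputed and
    ranking is a single matrix multiply.
    image_feats: (n_images, dim), text_feats: (n_texts, dim)."""
    image_feats = F.normalize(image_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    sims = text_feats @ image_feats.t()           # (n_texts, n_images)
    return sims.argsort(dim=-1, descending=True)  # ranked images per text
```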
Author:
Yuan, Sha; Zhao, Hanyu; Zhao, Shuai; Leng, Jiahong; Liang, Yangxiao; Wang, Xiaozhi; Yu, Jifan; Lv, Xin; Shao, Zhou; He, Jiaao; Lin, Yankai; Han, Xu; Liu, Zhenghao; Ding, Ning; Rao, Yongming; Gao, Yizhao; Zhang, Liang; Ding, Ming; Fang, Cong; Wang, Yisen; Long, Mingsheng; Zhang, Jing; Dong, Yinpeng; Pang, Tianyu; Cui, Peng; Huang, Lingxiao; Liang, Zheng; Shen, Huawei; Zhang, Hui; Zhang, Quanshi; Dong, Qingxiu; Tan, Zhixing; Wang, Mingxuan; Wang, Shuo; Zhou, Long; Li, Haoran; Bao, Junwei; Pan, Yingwei; Zhang, Weinan; Yu, Zhou; Yan, Rui; Shi, Chence; Xu, Minghao; Zhang, Zuobai; Wang, Guoqiang; Pan, Xiang; Li, Mengjie; Chu, Xiaoyu; Yao, Zijun; Zhu, Fangwei; Cao, Shulin; Xue, Weicheng; Ma, Zixuan; Zhang, Zhengyan; Hu, Shengding; Qin, Yujia; Xiao, Chaojun; Zeng, Zheni; Cui, Ganqu; Chen, Weize; Zhao, Weilin; Yao, Yuan; Li, Peng; Zheng, Wenzhao; Zhao, Wenliang; Wang, Ziyi; Zhang, Borui; Fei, Nanyi; Hu, Anwen; Ling, Zenan; Li, Haoyang; Cao, Boxi; Han, Xianpei; Zhan, Weidong; Chang, Baobao; Sun, Hao; Deng, Jiawen; Zheng, Chujie; Li, Juanzi; Hou, Lei; Cao, Xigang; Zhai, Jidong; Liu, Zhiyuan; Sun, Maosong; Lu, Jiwen; Lu, Zhiwu; Jin, Qin; Song, Ruihua; Wen, Ji-Rong; Lin, Zhouchen; Wang, Liwei; Su, Hang; Zhu, Jun; Sui, Zhifang; Zhang, Jiajun; Liu, Yang; He, Xiaodong; Huang, Minlie; Tang, Jian; Tang, Jie
With the rapid development of deep learning, training Big Models (BMs) for multiple downstream tasks has become a popular paradigm. Researchers have achieved various outcomes in constructing BMs and applying them in many fields. At present, …
External link:
http://arxiv.org/abs/2203.14101
Author:
Fei, Nanyi; Lu, Zhiwu; Gao, Yizhao; Yang, Guoxing; Huo, Yuqi; Wen, Jingyuan; Lu, Haoyu; Song, Ruihua; Gao, Xin; Xiang, Tao; Sun, Hao; Wen, Ji-Rong
The fundamental goal of artificial intelligence (AI) is to mimic the core cognitive activities of humans. Despite tremendous success in AI research, most existing methods have only a single cognitive ability. To overcome this limitation and take …
External link:
http://arxiv.org/abs/2110.14378
Most recent few-shot learning (FSL) methods are based on meta-learning with episodic training. In each meta-training episode, a discriminative feature embedding and/or classifier are first constructed from a support set in an inner loop, and then evaluated …
External link:
http://arxiv.org/abs/2101.09499
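A minimal sketch of the episodic setup described here, using ProtoNet-style prototypes as one concrete instance of the inner-loop construction and outer-loop evaluation (names and defaults are illustrative):

```python
import random
import torch

def sample_episode(data_by_class, n_way=5, k_shot=1, q_queries=15):
    """Sample one N-way K-shot meta-task: a labeled support set and a
    query set drawn from the same classes. data_by_class maps a class id
    to a tensor of feature embeddings."""
    classes = random.sample(list(data_by_class), n_way)
    support, query = [], []
    for label, c in enumerate(classes):
        idx = torch.randperm(len(data_by_class[c]))
        support.append((data_by_class[c][idx[:k_shot]], label))
        query.append((data_by_class[c][idx[k_shot:k_shot + q_queries]], label))
    return support, query

def prototype_accuracy(support, query):
    """Inner loop: build one prototype per class from the support set;
    outer evaluation: nearest-prototype classification on the queries."""
    protos = torch.stack([feats.mean(dim=0) for feats, _ in support])
    correct = total = 0
    for feats, label in query:
        dists = torch.cdist(feats, protos)  # (q_queries, n_way)
        correct += (dists.argmin(dim=-1) == label).sum().item()
        total += len(feats)
    return correct / total
```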
Existing meta-learning-based few-shot learning (FSL) methods typically adopt an episodic training strategy whereby each episode contains a meta-task. Across episodes, these tasks are sampled randomly and their relationships are ignored. In this paper, …
External link:
http://arxiv.org/abs/2002.04274