Showing 1 - 10 of 41 for search: '"Xiao, Shitao"'
Author:
Li, Chaofan, Qin, MingHao, Xiao, Shitao, Chen, Jianlyu, Luo, Kun, Shao, Yingxia, Lian, Defu, Liu, Zheng
Large language models (LLMs) with decoder-only architectures demonstrate remarkable in-context learning (ICL) capabilities. This feature enables them to effectively handle both familiar and novel tasks by utilizing examples provided within their input…
External link:
http://arxiv.org/abs/2409.15700
Existing Retrieval-Augmented Generation (RAG) systems face significant challenges in terms of cost and effectiveness. On one hand, they need to encode the lengthy retrieved contexts before responding to the input tasks, which imposes substantial…
External link:
http://arxiv.org/abs/2409.15699
Author:
Xiao, Shitao, Wang, Yueze, Zhou, Junjie, Yuan, Huaying, Xing, Xingrun, Yan, Ruiran, Wang, Shuting, Huang, Tiejun, Liu, Zheng
In this work, we introduce OmniGen, a new diffusion model for unified image generation. Unlike popular diffusion models (e.g., Stable Diffusion), OmniGen no longer requires additional modules such as ControlNet or IP-Adapter to process diverse control…
External link:
http://arxiv.org/abs/2409.11340
Pretrained language models like BERT and T5 serve as crucial backbone encoders for dense retrieval. However, these models often exhibit limited generalization capabilities and face challenges in improving in-domain accuracy. Recent research has explored…
External link:
http://arxiv.org/abs/2408.12194
Author:
Xing, Xingrun, Gao, Boyan, Zhang, Zheng, Clifton, David A., Xiao, Shitao, Du, Li, Li, Guoqi, Zhang, Jiajun
The recent advancements in large language models (LLMs) with billions of parameters have significantly boosted their performance across various real-world applications. However, the inference processes for these models require substantial energy and…
External link:
http://arxiv.org/abs/2407.04752
Multi-modal retrieval is becoming increasingly popular in practice. However, existing retrievers are mostly text-oriented and lack the capability to process visual information. Despite the presence of vision-language models like CLIP, the current…
External link:
http://arxiv.org/abs/2406.04292
Author:
Zhou, Junjie, Shu, Yan, Zhao, Bo, Wu, Boya, Xiao, Shitao, Yang, Xi, Xiong, Yongping, Zhang, Bo, Huang, Tiejun, Liu, Zheng
The evaluation of Long Video Understanding (LVU) performance poses an important but challenging research problem. Despite previous efforts, the existing video understanding benchmarks are severely constrained by several issues, especially the insufficient…
External link:
http://arxiv.org/abs/2406.04264
Author:
Xing, Xingrun, Zhang, Zheng, Ni, Ziyi, Xiao, Shitao, Ju, Yiming, Fan, Siqi, Wang, Yequan, Zhang, Jiajun, Li, Guoqi
Towards energy-efficient artificial intelligence similar to the human brain, the bio-inspired spiking neural networks (SNNs) have the advantages of biological plausibility, event-driven sparsity, and binary activation. Recently, large-scale language models…
External link:
http://arxiv.org/abs/2406.03287
Compressing lengthy context is a critical but technically challenging problem. In this paper, we propose a new method called UltraGist, which is distinguished for its high-quality compression of lengthy context due to the innovative design of the compression…
External link:
http://arxiv.org/abs/2405.16635
Author:
Zhang, Peitian, Shao, Ninglu, Liu, Zheng, Xiao, Shitao, Qian, Hongjin, Ye, Qiwei, Dou, Zhicheng
We extend the context length of Llama-3-8B-Instruct from 8K to 80K via QLoRA fine-tuning. The entire training cycle is highly efficient, taking 8 hours on one 8xA800 (80G) GPU machine. The resulting model exhibits superior performance across a broad…
External link:
http://arxiv.org/abs/2404.19553