Showing 1 - 10 of 749 results for the search: "Xuanzhe AN"
Audio data, often synchronized with video frames, plays a crucial role in guiding the audience's visual attention. Incorporating audio information into video saliency prediction tasks can enhance the prediction of human visual behavior. However, exis…
External link:
http://arxiv.org/abs/2411.11454
Authors:
Li, Xinyue, Chen, Zhenpeng, Zhang, Jie M., Lou, Yiling, Li, Tianlin, Sun, Weisong, Liu, Yang, Liu, Xuanzhe
Large Language Models (LLMs) have become foundational in modern language-driven applications, profoundly influencing daily life. A critical technique in leveraging their potential is role-playing, where LLMs simulate diverse roles to enhance their re…
External link:
http://arxiv.org/abs/2411.00585
On-device Large Language Models (LLMs) are revolutionizing mobile AI, enabling applications such as UI automation while addressing privacy concerns. Currently, the standard approach involves deploying a single, robust LLM as a universal solution for…
External link:
http://arxiv.org/abs/2409.09071
On-device inference for Large Language Models (LLMs), driven by increasing privacy concerns and advancements in mobile-sized models, has gained significant interest. However, even mobile-sized LLMs (e.g., Gemma-2B) encounter unacceptably high inferen…
External link:
http://arxiv.org/abs/2407.05858
Authors:
Gu, Diandian, Sun, Peng, Hu, Qinghao, Huang, Ting, Chen, Xun, Xiong, Yingtong, Wang, Guoteng, Chen, Qiaoling, Zhao, Shangchun, Fang, Jiarui, Wen, Yonggang, Zhang, Tianwei, Jin, Xin, Liu, Xuanzhe
Efficiently training LLMs with long sequences is important yet challenged by the massive computation and memory requirements. Sequence parallelism has been proposed to tackle these problems, but existing methods suffer from scalability or efficiency…
External link:
http://arxiv.org/abs/2406.18485
WebAssembly (abbreviated as Wasm) was initially introduced for the Web but quickly extended its reach into various domains beyond the Web. To create Wasm applications, developers can compile high-level programming languages into Wasm binaries or manu…
External link:
http://arxiv.org/abs/2404.12621
Retrieval-Augmented Generation (RAG) has shown significant improvements in various natural language processing tasks by integrating the strengths of large language models (LLMs) and external knowledge databases. However, RAG introduces long sequence…
External link:
http://arxiv.org/abs/2404.12457
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between different requests as well as between different phases of the same request. Restricted by static parallelism strategies, ex…
External link:
http://arxiv.org/abs/2404.09526
As LLMs become more powerful and more intrusive into user-device interactions, on-device execution is increasingly desirable to better preserve user privacy. In this work, we propose a new paradigm of mobile AI: LLM as a system service on mobile devices (LLMaaS). Unlike…
External link:
http://arxiv.org/abs/2403.11805
Authors:
Wang, Qipeng, Jiang, Shiqi, Chen, Zhenpeng, Cao, Xu, Li, Yuanchun, Li, Aoyu, Ma, Yun, Cao, Ting, Liu, Xuanzhe
Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference is performed directly within Web browsers. The actual performance of in-browser inference and its impacts on the quality of experience (QoE)…
External link:
http://arxiv.org/abs/2402.05981