Showing 1 - 10 of 100 results for the search: '"Zhou, Yufa"'
Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model sizes, making …
External link: http://arxiv.org/abs/2410.11261
Large Language Models (LLMs) have demonstrated remarkable capabilities in processing long-context information. However, the quadratic complexity of attention computation with respect to sequence length poses significant computational challenges, and …
External link: http://arxiv.org/abs/2410.09397
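For orientation on why the cost mentioned above is quadratic: in standard softmax attention the score matrix has one entry per pair of positions, so it has shape (n, n). A minimal NumPy sketch, not taken from the linked paper, that makes this explicit:

```python
# Minimal sketch (illustration only, not the linked paper's method) of standard
# softmax attention; the (n, n) score matrix is what makes the cost quadratic
# in the sequence length n.
import numpy as np

def softmax_attention(Q, K, V):
    """Q, K, V: arrays of shape (n, d). Returns an (n, d) output."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                          # (n, n): quadratic in n
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                     # another O(n^2 d) product

n, d = 1024, 64
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(softmax_attention(Q, K, V).shape)  # (1024, 64)
```

At a long-context length of, say, 128k tokens, this (n, n) matrix alone has roughly 1.6 × 10^10 entries per head, which is the bottleneck the abstract refers to.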
Previous work has demonstrated that attention mechanisms are Turing complete. More recently, it has been shown that a looped 13-layer Transformer can function as a universal programmable computer. In contrast, the multi-layer perceptrons with …
External link: http://arxiv.org/abs/2410.09375
The computational complexity of the self-attention mechanism in popular transformer architectures poses significant challenges for training and inference, and becomes the bottleneck for long inputs. Is it possible to significantly reduce the quadratic …
External link: http://arxiv.org/abs/2408.13233
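One generic family of answers to the question posed above replaces the softmax with a non-negative feature map, so that all keys and values can be summarized in a small d × d matrix instead of an n × n one. This is a hedged illustration of that general "linear attention" idea, not the specific algorithm of the linked paper:

```python
# Hedged sketch of kernel-feature ("linear") attention; the feature map phi is
# an illustrative choice, not the linked paper's construction. Cost is
# O(n * d^2) instead of O(n^2 * d) because the (n, n) score matrix is never formed.
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0) + 1e-6):
    """Q, K, V: arrays of shape (n, d); phi must map to non-negative features."""
    Qf, Kf = phi(Q), phi(K)                 # (n, d) feature-mapped queries/keys
    KV = Kf.T @ V                           # (d, d) summary of all keys and values
    Z = Qf @ Kf.sum(axis=0)                 # (n,) per-query normalizer
    return (Qf @ KV) / Z[:, None]           # (n, d) output

n, d = 4096, 64
rng = np.random.default_rng(1)
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
print(linear_attention(Q, K, V).shape)  # (4096, 64)
```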
Cross-attention has become a fundamental module nowadays in many important artificial intelligence applications, e.g., retrieval-augmented generation (RAG), system prompt, guided stable diffusion, and many more. Ensuring cross-attention privacy is cr…
External link: http://arxiv.org/abs/2407.14717
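For readers unfamiliar with the module named above: cross-attention lets queries from one sequence (e.g., the tokens being generated) attend to keys and values computed from another sequence (e.g., retrieved documents or a system prompt). A minimal sketch of the module itself, not of any privacy mechanism from the linked paper:

```python
# Minimal cross-attention sketch: queries come from X_query, keys/values from a
# separate context X_context (e.g., retrieved passages). Illustration only.
import numpy as np

def cross_attention(X_query, X_context, Wq, Wk, Wv):
    """X_query: (m, d_model), X_context: (n, d_model); returns (m, d_head)."""
    Q, K, V = X_query @ Wq, X_context @ Wk, X_context @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])   # (m, n): each query attends to context
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

m, n, d_model, d_head = 16, 128, 64, 32
rng = np.random.default_rng(3)
X_query = rng.standard_normal((m, d_model))
X_context = rng.standard_normal((n, d_model))
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
print(cross_attention(X_query, X_context, Wq, Wk, Wv).shape)  # (16, 32)
```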
Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper …
External link: http://arxiv.org/abs/2405.16418
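As background for the two properties named above, a standard DDPM-style forward process and the corresponding second moment can be written down as follows; the exact process and quantities analyzed in the linked paper may differ:

```latex
% Standard DDPM-style forward corruption of data $x_0$ with schedule
% $\bar{\alpha}_t \in (0,1)$ (given only for orientation):
\[
  x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
  \qquad \epsilon \sim \mathcal{N}(0, I_d),
\]
% so, assuming $x_0$ and $\epsilon$ are independent, the second moment of the
% noisy iterate is
\[
  \mathbb{E}\,\|x_t\|^2 \;=\; \bar{\alpha}_t\,\mathbb{E}\,\|x_0\|^2 \;+\; (1-\bar{\alpha}_t)\, d,
\]
% and a Lipschitz condition on the score function asks for a constant $L$ with
\[
  \|\nabla_x \log p_t(x) - \nabla_x \log p_t(y)\| \;\le\; L\,\|x-y\|.
\]
```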
Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention. However, the $O(n^3)$ time complexity of tensor attention …
External link: http://arxiv.org/abs/2405.16411
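To see where the $O(n^3)$ figure comes from: a trilinear attention score has one entry per triple of positions, giving an n × n × n score tensor. A hedged toy sketch of such a trilinear score follows; the particular pairing of keys and values here is an illustrative assumption, not the construction or the acceleration proposed in the linked paper:

```python
# Toy sketch of trilinear "tensor attention" over three views; illustration of
# why the naive cost is cubic in n, not the linked paper's exact construction.
import numpy as np

def tensor_attention(Q, K1, K2, V):
    """Q, K1, K2, V: arrays of shape (n, d). Naive cost is O(n^3 * d)."""
    n, d = Q.shape
    # scores[i, j, k] = <Q[i], K1[j] * K2[k]> (elementwise product of the two keys)
    scores = np.einsum('id,jd,kd->ijk', Q, K1, K2) / d     # (n, n, n) score tensor
    flat = scores.reshape(n, n * n)
    w = np.exp(flat - flat.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    # combine the two value views; an elementwise outer pairing is used here
    V2 = np.einsum('jd,kd->jkd', V, V).reshape(n * n, d)
    return w @ V2                                          # (n, d) output

n, d = 32, 8   # keep n tiny: memory already scales as n^3
rng = np.random.default_rng(2)
Q, K1, K2, V = (rng.standard_normal((n, d)) for _ in range(4))
print(tensor_attention(Q, K1, K2, V).shape)  # (32, 8)
```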
Large language models (LLMs), especially those based on the Transformer architecture, have had a profound impact on various aspects of daily life, such as natural language processing, content generation, research methodologies, and more. Nevertheless, …
External link: http://arxiv.org/abs/2305.04701
Author: Liu, Yong*, Tan, Jie, Ngwayi, James Reeves Mbori, Zhuang, Xiaolin, Ding, Zhaohan, Chen, Yujie, Zhou, Yufa, Porter, Daniel Edward
Published in: Journal of Surgical Education, January 2024, 81(1):76-83
Academic article