Showing 1 - 10 of 271 for the search: '"Liang, Yingyu"'
The quadratic computational complexity in the self-attention mechanism of popular transformer architectures poses significant challenges for training and inference, particularly in terms of efficiency and memory requirements. Towards addressing these challenges …
External link:
http://arxiv.org/abs/2408.13233
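For orientation, a minimal NumPy sketch (not the linked paper's method) showing where the quadratic cost comes from: standard attention materializes a full n x n score matrix.

import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full n x n score matrix:
    # O(n^2 d) time and O(n^2) memory, the bottleneck for long sequences.
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 1024, 64
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
out = naive_attention(Q, K, V)  # cost dominated by the n x n weight matrix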
In this work, we improve the analysis of the running time of SparseGPT [Frantar, Alistarh ICML 2023] from $O(d^{3})$ to $O(d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a})$ for any $a \in [0, 1]$, where $\omega$ is the exponent of matrix multiplication.
External link:
http://arxiv.org/abs/2408.12151
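To make the improvement concrete, a hedged numerical instantiation; the values $\omega \le 2.372$ and $\omega(1,1,0.5) \le 2.045$ are standard bounds from the fast matrix multiplication literature, not claims from this snippet. Taking $a = 0.5$,

$$ d^{\omega} + d^{2+a+o(1)} + d^{1+\omega(1,1,a)-a} \le d^{2.372} + d^{2.5+o(1)} + d^{1+2.045-0.5} = O(d^{2.545+o(1)}), $$

which is strictly below the previous $O(d^{3})$ bound.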
Determining the John ellipsoid - the largest-volume ellipsoid contained within a convex polytope - is a fundamental problem with applications in machine learning, optimization, and data analytics. Recent work has developed fast algorithms for approximating the John ellipsoid …
External link:
http://arxiv.org/abs/2408.06395
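For orientation, the standard convex formulation of the John ellipsoid problem (as in Boyd and Vandenberghe; the notation is generic, not taken from the linked paper): for a polytope $P = \{x : a_i^{\top} x \le b_i,\ i = 1, \dots, m\}$ and the ellipsoid $E = \{Bu + c : \|u\|_2 \le 1\}$, the largest-volume inscribed ellipsoid solves

$$ \max_{B \succeq 0,\ c} \ \log\det B \quad \text{subject to} \quad \|B a_i\|_2 + a_i^{\top} c \le b_i, \quad i = 1, \dots, m. $$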
Large language models (LLMs) have emerged as powerful tools for many AI problems and exhibit remarkable in-context learning (ICL) capabilities. Compositional ability, i.e., solving unseen complex tasks that combine two or more simple tasks, is an essential …
External link:
http://arxiv.org/abs/2407.15720
Cross-attention has become a fundamental module nowadays in many important artificial intelligence applications, e.g., retrieval-augmented generation (RAG), system prompts, guided stable diffusion, and so on. Ensuring cross-attention privacy is critical …
External link:
http://arxiv.org/abs/2407.14717
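A minimal NumPy sketch of plain cross-attention, to show which inputs a privacy mechanism would need to protect; this illustrates the setting only, not the linked paper's privacy technique.

import numpy as np

def cross_attention(X_q, X_ctx, W_q, W_k, W_v):
    # Queries come from one sequence, keys/values from another: the
    # context X_ctx (e.g., retrieved documents in RAG, or a system
    # prompt) is the potentially sensitive input.
    Q, K, V = X_q @ W_q, X_ctx @ W_k, X_ctx @ W_v
    S = Q @ K.T / np.sqrt(Q.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ V

rng = np.random.default_rng(1)
X_q = rng.standard_normal((8, 32))     # query sequence (e.g., user turn)
X_ctx = rng.standard_normal((128, 32)) # context (e.g., retrieved documents)
W_q, W_k, W_v = (rng.standard_normal((32, 32)) for _ in range(3))
out = cross_attention(X_q, X_ctx, W_q, W_k, W_v)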
Training data privacy is a fundamental problem in modern Artificial Intelligence (AI) applications, such as face recognition, recommendation systems, language generation, and many others, as it may contain sensitive user information related to legal …
External link:
http://arxiv.org/abs/2407.13621
Prompting and context-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks and can match full-parameter fine-tuning. There remains a limited theoretical …
External link:
http://arxiv.org/abs/2406.14036
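One common concrete form of prefix learning, sketched in NumPy as an illustration of the setting rather than the linked paper's framework: trainable prefix vectors are prepended to the keys and values while the base model stays frozen.

import numpy as np

def attention_with_prefix(Q, K, V, P_k, P_v):
    # P_k, P_v are the trainable prefix vectors; K, V come from the
    # frozen base model. Only the prefix is optimized during tuning.
    K_aug = np.concatenate([P_k, K], axis=0)  # shape (p + n, d)
    V_aug = np.concatenate([P_v, V], axis=0)
    S = Q @ K_aug.T / np.sqrt(Q.shape[1])
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    return A @ V_aug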
Large language models (LLMs) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL), where they can perform well on unseen tasks based on a brief series of task examples without necessitating any adjustments to the model parameters …
External link:
http://arxiv.org/abs/2405.19592
Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper …
External link:
http://arxiv.org/abs/2405.16418
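For orientation, the two quantities named in the snippet, written in generic diffusion notation (an assumption for readability, not the paper's exact definitions): the score $\nabla \log p_t$ of the marginal density $p_t$ at time $t$ is $L$-Lipschitz if

$$ \|\nabla \log p_t(x) - \nabla \log p_t(y)\|_2 \le L\,\|x - y\|_2 \quad \text{for all } x, y, $$

and the second momentum of the process is $m_2(t) = \mathbb{E}_{x_t \sim p_t}\big[\|x_t\|_2^2\big]$.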
Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention. However, the $\Omega(n^3)$ time complexity of tensor attention …
External link:
http://arxiv.org/abs/2405.16411
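A NumPy sketch of one plausible formalization of tensor attention, to show where the cubic cost arises; the elementwise key products, paired values, and $1/d$ scaling are assumptions of this sketch, not necessarily the construction in the linked paper.

import numpy as np

def naive_tensor_attention(Q, K1, K2, V1, V2):
    # Each query attends to *pairs* of positions (j, k): the score
    # tensor has n^3 entries, which is where the cubic cost comes from.
    n, d = Q.shape
    scores = np.einsum('id,jd,kd->ijk', Q, K1, K2) / d  # O(n^3 d) time
    S = scores.reshape(n, n * n)
    A = np.exp(S - S.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)
    # one value vector per (j, k) pair, via elementwise products
    V_pairs = np.einsum('jd,kd->jkd', V1, V2).reshape(n * n, d)
    return A @ V_pairs

rng = np.random.default_rng(2)
n, d = 16, 8  # kept tiny: the n^3 tensor is exactly the obstacle
Q, K1, K2, V1, V2 = (rng.standard_normal((n, d)) for _ in range(5))
out = naive_tensor_attention(Q, K1, K2, V1, V2)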