Showing 1 - 10 of 679 results for the search: '"Hu, Jerry"'
We investigate the approximation and estimation rates of conditional diffusion transformers (DiTs) with classifier-free guidance. We present a comprehensive analysis for ``in-context'' conditional DiTs under four common data assumptions. We show that … (see the note after the link)
External link:
http://arxiv.org/abs/2411.17522
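For orientation only (a standard formulation, not taken from this paper's excerpt): classifier-free guidance combines conditional and unconditional score estimates as $\hat{s}_w(x_t, c) = (1+w)\, s_\theta(x_t, c) - w\, s_\theta(x_t)$, where $w \ge 0$ is the guidance strength.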
We investigate the transformer's capability for in-context learning (ICL) to simulate the training process of deep models. Our key contribution is providing a positive example of using a transformer to train a deep neural network by gradient descent
External link:
http://arxiv.org/abs/2411.16549
We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions are prompt tuning on \textit{single-head} transformers with only a \textit{single} self-attention layer: (i) is un
External link:
http://arxiv.org/abs/2411.16525
Given a database of bit strings $A_1,\ldots,A_m\in \{0,1\}^n$, a fundamental data structure task is to estimate the distances between a given query $B\in \{0,1\}^n$ and all the strings in the database. In addition, one might further want to ensure t … (see the note after the link)
External link:
http://arxiv.org/abs/2411.05750
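As background (the standard definition, not a claim from the paper): the Hamming distance between the query and the $i$-th string is $d_H(B, A_i) = \sum_{j=1}^{n} \mathbf{1}[B_j \ne A_{i,j}]$, and answering a query by computing all $m$ distances exactly takes $O(mn)$ time, the naive baseline such a data structure is meant to improve upon or approximate.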
We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories. We present a tight analysis by establishing a connection between the memory conf … (see the note after the link)
External link:
http://arxiv.org/abs/2410.23126
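As background (a standard formulation, not from the excerpt): a modern Hopfield model with stored patterns $X = [x_1, \ldots, x_M]$ retrieves from a query $\xi$ via the attention-like update $\xi^{\mathrm{new}} = X\,\mathrm{softmax}(\beta X^{\top} \xi)$; KHMs, roughly, perform this retrieval after mapping patterns and queries through a kernel-induced feature map.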
We introduce a refined differentially private (DP) data structure for kernel density estimation (KDE), offering not only improved privacy-utility tradeoff but also better efficiency over prior results. Specifically, we study the mathematical problem: … (see the note after the link)
External link:
http://arxiv.org/abs/2409.01688
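As background (the standard non-private query, not the paper's exact problem statement, which is cut off above): a KDE query over data $x_1, \ldots, x_m$ with kernel $k$ returns $\widehat{f}(y) = \frac{1}{m} \sum_{i=1}^{m} k(y, x_i)$; a differentially private data structure must answer such queries while limiting what any answer reveals about an individual $x_i$.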
We investigate the statistical and computational limits of latent Diffusion Transformers (DiTs) under the low-dimensional linear latent space assumption. Statistically, we study the universal approximation and sample complexity of the DiTs score func
External link:
http://arxiv.org/abs/2407.01079
We study the computational limits of Low-Rank Adaptation (LoRA) update for finetuning transformer-based models using fine-grained complexity theory. Our key observation is that the existence of low-rank decompositions within the gradient computation … (see the note after the link)
External link:
http://arxiv.org/abs/2406.03136
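As background (the standard LoRA parameterization, not a claim from the paper): the weight update of a pretrained matrix $W_0 \in \mathbb{R}^{d \times k}$ is constrained to a low-rank product, $W = W_0 + \tfrac{\alpha}{r} B A$ with $B \in \mathbb{R}^{d \times r}$, $A \in \mathbb{R}^{r \times k}$, and $r \ll \min(d, k)$, so only $A$ and $B$ are trained.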
Author:
Luo, Haozheng, Yu, Jiahao, Zhang, Wenxin, Li, Jialong, Hu, Jerry Yao-Chieh, Xing, Xinyu, Liu, Han
We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). Our main idea is to exploit knowledge distillation
External link:
http://arxiv.org/abs/2406.01514
Along with the remarkable successes of large language models (LLMs), recent research has also started to explore the security threats of LLMs, including jailbreaking attacks. Attackers carefully craft jailbreaking prompts such that a target LLM will respond
External link:
http://arxiv.org/abs/2405.20653