Showing 1 - 10
of 2,802
for search: '"Mei, Song"'
The typical training of neural networks using large stepsize gradient descent (GD) under the logistic loss often involves two distinct phases: the empirical risk oscillates in the first phase but decreases monotonically in the second. We…
External link:
http://arxiv.org/abs/2406.08654
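For intuition, a minimal sketch (an illustrative setup assumed here, not the paper's experiments) of large-stepsize GD on the logistic loss over toy separable data, where the risk typically oscillates before settling into monotone decrease:

import numpy as np

# Toy logistic regression on linearly separable 2-D data (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sign(X @ np.array([1.0, -1.0]))

def risk(w):
    # Empirical logistic risk, computed stably via logaddexp.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w):
    # d/dw of the logistic loss: -mean_i y_i * sigmoid(-y_i x_i.w) * x_i.
    p = np.exp(-np.logaddexp(0.0, y * (X @ w)))   # stable sigmoid of -margin
    return -(X * (y * p)[:, None]).mean(axis=0)

w, eta = np.zeros(2), 20.0                        # deliberately large stepsize
losses = [risk(w)]
for _ in range(200):
    w = w - eta * grad(w)
    losses.append(risk(w))
# Plotting `losses` typically shows an oscillatory first phase followed by a
# monotone-decrease phase, matching the two-phase behavior described above.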
Author:
Mei, Song
U-Nets are among the most widely used architectures in computer vision, renowned for their exceptional performance in applications such as image segmentation, denoising, and diffusion modeling. However, a theoretical explanation of the U-Net architecture…
External link:
http://arxiv.org/abs/2404.18444
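For readers unfamiliar with the architecture, a minimal generic U-Net skeleton in PyTorch (a sketch of the general design, not the model analyzed in the paper), showing the encoder-decoder structure with a skip connection:

import torch
import torch.nn as nn

def block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(),
    )

class TinyUNet(nn.Module):
    def __init__(self, c=16):
        super().__init__()
        self.enc1, self.enc2 = block(1, c), block(c, 2 * c)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(2 * c, c, 2, stride=2)
        self.dec = block(2 * c, c)            # input: upsampled features concat skip
        self.head = nn.Conv2d(c, 1, 1)

    def forward(self, x):
        s1 = self.enc1(x)                     # encoder, full resolution
        x2 = self.enc2(self.pool(s1))         # encoder, half resolution
        u = self.up(x2)                       # decoder, back to full resolution
        return self.head(self.dec(torch.cat([u, s1], dim=1)))

y = TinyUNet()(torch.randn(1, 1, 32, 32))     # output shape: (1, 1, 32, 32)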
An Overview of Diffusion Models: Applications, Guided Generation, Statistical Rates and Optimization
Diffusion models, a powerful and universal generative AI technology, have achieved tremendous success in computer vision, audio, reinforcement learning, and computational biology. In these applications, diffusion models provide flexible high-dimensional…
External link:
http://arxiv.org/abs/2404.07771
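For context, a standard variance-preserving formulation (generic background, not specific to this survey) writes the forward noising process and its score-based time reversal as

\[
\mathrm{d}x_t = -\tfrac{1}{2}\beta(t)\,x_t\,\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}W_t,
\qquad
\mathrm{d}x_t = \Big[-\tfrac{1}{2}\beta(t)\,x_t - \beta(t)\,\nabla_x \log p_t(x_t)\Big]\mathrm{d}t + \sqrt{\beta(t)}\,\mathrm{d}\bar{W}_t,
\]

where the second SDE runs backward in time and the unknown score \(\nabla_x \log p_t\) is replaced by a learned network.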
Large Language Models (LLMs) often memorize sensitive, private, or copyrighted data during pre-training. LLM unlearning aims to eliminate the influence of undesirable data from the pre-trained model while preserving the model's utilities on other tasks…
External link:
http://arxiv.org/abs/2404.05868
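One common way to formalize this trade-off (a generic formulation, not necessarily the one studied in the paper) is

\[
\min_{\theta}\; -\,\mathbb{E}_{z \sim \mathcal{D}_{\mathrm{forget}}}\big[\ell(z;\theta)\big] \;+\; \lambda\,\mathbb{E}_{z \sim \mathcal{D}_{\mathrm{retain}}}\big[\ell(z;\theta)\big],
\]

i.e., push the loss up on the data to be forgotten while a retain term, weighted by \(\lambda\), preserves utility elsewhere.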
Statistical Estimation in the Spiked Tensor Model via the Quantum Approximate Optimization Algorithm
The quantum approximate optimization algorithm (QAOA) is a general-purpose algorithm for combinatorial optimization. In this paper, we analyze the performance of the QAOA on a statistical estimation problem, namely, the spiked tensor model, which exhibits…
External link:
http://arxiv.org/abs/2402.19456
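For reference, the spiked tensor model posits an observed tensor

\[
\mathbf{T} \;=\; \lambda\, v^{\otimes k} \;+\; \mathbf{W},
\]

where \(v\) is a unit-norm signal vector, \(\lambda\) the signal-to-noise ratio, and \(\mathbf{W}\) a symmetric Gaussian noise tensor (normalization conventions vary); the estimation task is to recover \(v\) from \(\mathbf{T}\).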
We study mean-field variational inference in a Bayesian linear model when the sample size n is comparable to the dimension p. In high dimensions, the common approach of minimizing a Kullback-Leibler divergence from the posterior distribution, or maximizing an evidence lower bound…
External link:
http://arxiv.org/abs/2311.08442
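Concretely, mean-field variational inference restricts attention to product distributions and solves

\[
\hat q \;=\; \operatorname*{arg\,min}_{q = \prod_{j=1}^{p} q_j} \; \mathrm{KL}\big(q \,\big\|\, p(\theta \mid y)\big),
\]

which is equivalent to maximizing the evidence lower bound \(\mathbb{E}_q[\log p(y,\theta)] - \mathbb{E}_q[\log q(\theta)]\) over the same family.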
While large language models based on the transformer architecture have demonstrated remarkable in-context learning (ICL) capabilities, understanding of such capabilities is still in an early stage, where existing theory and mechanistic understanding…
External link:
http://arxiv.org/abs/2310.10616
Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments…
External link:
http://arxiv.org/abs/2310.08566
Authors:
Mei, Song; Wu, Yuchen
We investigate the approximation efficiency of score functions by deep neural networks in diffusion-based generative modeling. While existing approximation theories utilize the smoothness of score functions, they suffer from the curse of dimensionality…
External link:
http://arxiv.org/abs/2309.11420
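For background, score networks in diffusion models are typically trained with the denoising score-matching objective (generic notation, assumed here)

\[
\min_{s_\theta}\; \mathbb{E}_{t}\,\mathbb{E}_{x_0,\,x_t \mid x_0}\Big[\big\| s_\theta(x_t, t) - \nabla_{x_t}\log p_t(x_t \mid x_0)\big\|_2^2\Big],
\]

whose population minimizer is the true score \(\nabla_{x}\log p_t(x)\); the question above is how efficiently a deep network can represent this minimizer.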
Inference for prediction errors is critical in time series forecasting pipelines. However, providing statistically meaningful uncertainty intervals for prediction errors remains relatively under-explored. Practitioners often resort to forward cross-validation…
External link:
http://arxiv.org/abs/2309.07435
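A minimal sketch of forward (rolling-origin) cross-validation, the generic procedure named above; the persistence forecaster and the synthetic series are illustrative placeholders:

import numpy as np

def forward_cv_errors(series, min_train=50, horizon=10):
    # At each fold, fit on series[:t] and score on the next `horizon` points,
    # so every evaluation respects time order (no leakage from the future).
    errors, t = [], min_train
    while t + horizon <= len(series):
        train, test = series[:t], series[t:t + horizon]
        pred = np.full(horizon, train[-1])   # placeholder: lag-1 persistence
        errors.append(np.mean((test - pred) ** 2))
        t += horizon
    return np.array(errors)                  # one out-of-sample MSE per fold

rng = np.random.default_rng(1)
x = np.cumsum(rng.normal(size=500))          # synthetic random-walk series
print(forward_cv_errors(x).mean())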