Showing 1 - 10 of 167 for the search: '"Li, Mufan"'
We study the complexity of sampling from the stationary distribution of a mean-field SDE, or equivalently, the complexity of minimizing a functional over the space of probability measures which includes an interaction term. Our main insight is to dec…
External link:
http://arxiv.org/abs/2402.07355
Author:
Li, Mufan Bill, Nica, Mihai
Recent analyses of neural networks with shaped activations (i.e. the activation function is scaled as the network size grows) have led to scaling limits described by differential equations. However, these results do not a priori tell us anything abou…
External link:
http://arxiv.org/abs/2310.12079
The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperpa…
External link:
http://arxiv.org/abs/2309.16620
Author:
Noci, Lorenzo, Li, Chuning, Li, Mufan Bill, He, Bobby, Hofmann, Thomas, Maddison, Chris, Roy, Daniel M.
In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with s…
External link:
http://arxiv.org/abs/2306.17759
Author:
Zhang, Matthew, Chewi, Sinho, Li, Mufan Bill, Balasubramanian, Krishnakumar, Erdogdu, Murat A.
Underdamped Langevin Monte Carlo (ULMC) is an algorithm used to sample from unnormalized densities by leveraging the momentum of a particle moving in a potential well. We provide a novel analysis of ULMC, motivated by two central questions: (1) Can w…
External link:
http://arxiv.org/abs/2302.08049
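The ULMC dynamics summarized in this abstract can be sketched as follows. This is a minimal illustration only: the quadratic potential (standard Gaussian target), step size, and friction coefficient are assumptions chosen for simplicity, not the settings analyzed in the paper.

```python
import numpy as np

def grad_potential(x):
    # Potential U(x) = ||x||^2 / 2, so grad U(x) = x (standard Gaussian target).
    return x

def ulmc_step(x, v, step=0.1, gamma=1.0, rng=None):
    # One Euler-Maruyama step of the underdamped Langevin SDE:
    #   dx = v dt
    #   dv = -grad U(x) dt - gamma * v dt + sqrt(2 * gamma) dB
    # The position update uses momentum; the velocity update combines the
    # potential gradient, friction, and injected Gaussian noise.
    rng = np.random.default_rng() if rng is None else rng
    x_new = x + step * v
    v_new = (v - step * grad_potential(x) - step * gamma * v
             + np.sqrt(2.0 * gamma * step) * rng.standard_normal(x.shape))
    return x_new, v_new

rng = np.random.default_rng(0)
x, v = np.ones(2), np.zeros(2)
samples = []
for _ in range(5000):
    x, v = ulmc_step(x, v, rng=rng)
    samples.append(x.copy())
samples = np.array(samples[1000:])  # discard burn-in
print(samples.mean(axis=0))  # roughly zero for the standard Gaussian target
```

With a fixed step size this discretization has a stationary bias; the paper's analysis concerns how such discretization error trades off against convergence speed.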
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian, given a random covariance matrix defined by the penultimate layer. In this work, we study the distribution of this random matrix. Recent work has shown th…
External link:
http://arxiv.org/abs/2206.02768
Author:
Berthier, Raphaël, Li, Mufan
Gossip algorithms and their accelerated versions have been studied exclusively in discrete time on graphs. In this work, we take a different approach, and consider the scaling limit of gossip algorithms in both large graphs and large number of iterat…
External link:
http://arxiv.org/abs/2202.10742
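A basic discrete-time gossip iteration, the starting point this abstract contrasts with, can be sketched as follows. The cycle graph and synchronous closed-neighborhood averaging are illustrative assumptions; the paper studies scaling limits and accelerated variants, not this elementary scheme.

```python
import numpy as np

def gossip_round(values, adjacency):
    # One synchronous gossip round: each node replaces its value with the
    # average over its closed neighborhood (itself plus its neighbors).
    n = len(values)
    new = np.empty(n)
    for i in range(n):
        neighborhood = np.append(np.nonzero(adjacency[i])[0], i)
        new[i] = values[neighborhood].mean()
    return new

n = 8
# Adjacency matrix of a cycle graph on n nodes.
A = np.zeros((n, n), dtype=int)
for i in range(n):
    A[i, (i + 1) % n] = A[i, (i - 1) % n] = 1

x = np.arange(n, dtype=float)
target = x.mean()
for _ in range(200):
    x = gossip_round(x, A)
print(np.allclose(x, target))  # all values converge to the global mean
```

Because every node here has the same degree, the averaging matrix is doubly stochastic, so the global mean is preserved and the iterates reach consensus at it.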
Classically, the continuous-time Langevin diffusion converges exponentially fast to its stationary distribution $\pi$ under the sole assumption that $\pi$ satisfies a Poincaré inequality. Using this fact to provide guarantees for the discrete-time…
External link:
http://arxiv.org/abs/2112.12662
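The discrete-time counterpart of the Langevin diffusion mentioned in this abstract, the unadjusted Langevin algorithm, can be sketched as follows. The standard Gaussian target and step size are illustrative assumptions, not a rate-optimal choice from the paper.

```python
import numpy as np

def grad_log_pi(x):
    # For the standard Gaussian target, log pi(x) = -||x||^2 / 2 + const,
    # so grad log pi(x) = -x.
    return -x

def ula(x0, n_steps=5000, step=0.05, seed=0):
    # Unadjusted Langevin algorithm: Euler-Maruyama discretization of
    #   dX = grad log pi(X) dt + sqrt(2) dB.
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    out = []
    for _ in range(n_steps):
        x = x + step * grad_log_pi(x) + np.sqrt(2.0 * step) * rng.standard_normal(x.shape)
        out.append(x.copy())
    return np.array(out)

samples = ula(np.ones(2))[1000:]  # discard burn-in
print(samples.mean(axis=0), samples.var(axis=0))  # near 0 and near 1, respectively
```

The discretization introduces a step-size-dependent bias in the stationary distribution, which is exactly the gap between the clean continuous-time guarantee and the discrete-time analysis the abstract refers to.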
Published in:
In AJIC: American Journal of Infection Control May 2024 52(5):533-540
Theoretical results show that neural networks can be approximated by Gaussian processes in the infinite-width limit. However, for fully connected networks, it has been previously shown that for any fixed network width, $n$, the Gaussian approximation…
External link:
http://arxiv.org/abs/2106.04013