Showing 1 - 10 of 1,011 results for search: '"A Nichani"'
Large language models have demonstrated an impressive ability to perform factual recall. Prior work has found that transformers trained on factual recall tasks can store information at a rate proportional to their parameter count. In our work, we show …
External link:
http://arxiv.org/abs/2412.06538
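
The snippet above (arXiv:2412.06538) concerns how much factual information a model can store relative to its parameter count. As a loose illustration of that scaling, and not the paper's construction, the following Python sketch builds a linear associative memory: a d x d matrix formed from outer products of key and value vectors recalls stored pairs up to cross-talk, so the number of stored scalars grows with the number of parameters in the matrix.

import numpy as np

# Illustrative linear associative memory (an assumption for this sketch, not
# the paper's model): W = sum_i v_i u_i^T stores key-value pairs and recalls
# them with one matrix-vector product. With random near-orthogonal keys in
# d dimensions, roughly O(d) pairs of d-dimensional values can be recovered,
# so the stored scalar count scales with the d*d parameters of W.
rng = np.random.default_rng(0)
d, n_pairs = 1024, 32

keys = rng.standard_normal((n_pairs, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)   # unit-norm keys
values = rng.standard_normal((n_pairs, d))

W = values.T @ keys                     # sum of outer products v_i u_i^T
recalled = (W @ keys.T).T               # W u_j is approximately v_j plus cross-talk
rel_err = np.linalg.norm(recalled - values) / np.linalg.norm(values)
print(f"relative recall error: {rel_err:.2f}")        # modest when n_pairs << d
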
In deep learning theory, a critical question is to understand how neural networks learn hierarchical features. In this work, we study the learning of hierarchical polynomials of \textit{multiple nonlinear features} using three-layer neural networks.
External link:
http://arxiv.org/abs/2411.17201
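
A toy example of the target class described above (arXiv:2411.17201): a polynomial g applied to several nonlinear features of a Gaussian input. The specific quadratic features and the choice of g below are assumptions for illustration only, not the paper's setting.

import numpy as np

# Toy target of the general shape "polynomial of multiple nonlinear features":
# two quadratic features p1, p2 of a Gaussian input, combined by g(z1, z2).
rng = np.random.default_rng(1)
d, n = 20, 1000
x = rng.standard_normal((n, d))                       # Gaussian inputs

A1 = rng.standard_normal((d, d)); A1 = (A1 + A1.T) / 2
A2 = rng.standard_normal((d, d)); A2 = (A2 + A2.T) / 2
p1 = np.einsum("ni,ij,nj->n", x, A1, x) / d           # nonlinear feature 1
p2 = np.einsum("ni,ij,nj->n", x, A2, x) / d           # nonlinear feature 2
y = p1 * p2 + p1 ** 2                                 # g(z1, z2) = z1*z2 + z1^2

A three-layer network would then be fit to the pairs (x, y).
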
The incredible success of transformers on sequence modeling tasks can be largely attributed to the self-attention mechanism, which allows information to be transferred between different parts of a sequence. Self-attention allows transformers to encode …
External link:
http://arxiv.org/abs/2402.14735
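
Since the snippet above (arXiv:2402.14735) turns on the self-attention mechanism itself, here is a minimal single-head scaled dot-product self-attention in numpy; this is a generic textbook sketch (no masking, no multiple heads), not the paper's construction.

import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention (no masking).

    X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head).
    Each output position is a weighted average of value vectors, with weights
    given by a softmax over query-key dot products, which is how information
    moves between positions of the sequence.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # softmax over keys
    return attn @ V                                  # (seq_len, d_head)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))                      # toy sequence of length 5
Wq, Wk, Wv = (rng.standard_normal((8, 4)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (5, 4)
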
We study the problem of learning hierarchical polynomials over the standard Gaussian distribution with three-layer neural networks. We specifically consider target functions of the form $h = g \circ p$ where $p : \mathbb{R}^d \rightarrow \mathbb{R}$ …
External link:
http://arxiv.org/abs/2311.13774
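
As a concrete (assumed) instance of the target class $h = g \circ p$ from the snippet above (arXiv:2311.13774), the sketch below draws standard Gaussian inputs, applies a quadratic polynomial p, and composes it with a simple link g; both choices are illustrative rather than the paper's.

import numpy as np

# Toy instance of h = g o p over the standard Gaussian: p is a degree-2
# polynomial feature and g is applied on top. Choices are illustrative only.
rng = np.random.default_rng(2)
d, n = 50, 5000
x = rng.standard_normal((n, d))            # x ~ N(0, I_d)

A = rng.standard_normal((d, d)); A = (A + A.T) / 2
p = np.einsum("ni,ij,nj->n", x, A, x) / d  # p : R^d -> R, quadratic
y = np.tanh(p)                             # h(x) = g(p(x)) with g = tanh
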
Author:
Malladi, Sadhika, Gao, Tianyu, Nichani, Eshaan, Damian, Alex, Lee, Jason D., Chen, Danqi, Arora, Sanjeev
Fine-tuning language models (LMs) has yielded success on diverse downstream tasks, but as LMs grow in size, backpropagation requires a prohibitively large amount of memory. Zeroth-order (ZO) methods can in principle estimate gradients using only two …
External link:
http://arxiv.org/abs/2305.17333
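
The snippet above (arXiv:2305.17333) describes zeroth-order methods that estimate gradients from loss evaluations alone. Below is a classical two-point (SPSA-style) estimator along a random direction, sketched on a toy quadratic loss; it illustrates the general idea and is not the specific algorithm of the paper above.

import numpy as np

def zo_gradient(loss, theta, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate.

    Uses only two loss evaluations: the finite difference
    (loss(theta + eps*z) - loss(theta - eps*z)) / (2*eps) scales a random
    Gaussian direction z, giving an unbiased estimate of the gradient
    in expectation (up to O(eps^2) smoothing error).
    """
    if rng is None:
        rng = np.random.default_rng()
    z = rng.standard_normal(theta.shape)
    scale = (loss(theta + eps * z) - loss(theta - eps * z)) / (2 * eps)
    return scale * z

# Toy check on a quadratic loss whose true gradient is 2*theta.
theta = np.array([1.0, -2.0, 0.5])
loss = lambda t: float(np.sum(t ** 2))
est = np.mean([zo_gradient(loss, theta, rng=np.random.default_rng(s))
               for s in range(2000)], axis=0)
print(est, 2 * theta)   # the averaged estimate approaches the true gradient
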
We focus on the task of learning a single index model $\sigma(w^\star \cdot x)$ with respect to the isotropic Gaussian distribution in $d$ dimensions. Prior work has shown that the sample complexity of learning $w^\star$ is governed by the information …
External link:
http://arxiv.org/abs/2305.10633
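
For the single index model $\sigma(w^\star \cdot x)$ defined in the snippet above (arXiv:2305.10633), here is a short data-generation sketch. The unit-norm direction and the link function (the third Hermite polynomial, $z^3 - 3z$) are illustrative choices, not the paper's specific setting.

import numpy as np

rng = np.random.default_rng(3)
d, n = 100, 10000
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)           # unit-norm hidden direction

x = rng.standard_normal((n, d))            # isotropic Gaussian inputs
sigma = lambda z: z ** 3 - 3 * z           # illustrative link function
y = sigma(x @ w_star)                      # labels from the single index model
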
One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the …
External link:
http://arxiv.org/abs/2305.06986
Author:
Rachel Isba, Darren M Ashcroft, Alastair D Hay, Elliot Heward, Iain A Bruce, Judith Lunn, John Molloy, Jaya R Nichani, James Birkenshaw-Dempsey
Published in:
BMJ Paediatrics Open, Vol 8, Iss 1 (2024)
Background: Acute otitis media with discharge (AOMd) results from a tympanic membrane perforation secondary to a middle ear infection. Currently, the impact of AOMd on children and young people (CYP) and their families is not well understood. There is …
External link:
https://doaj.org/article/ab4dada39532471385eca54019b88d45
Author:
Anurag Satpathy, Vishakha Grover, Ashish Kumar, Ashish Jain, Dharmarajan Gopalakrishnan, Harpreet Singh Grover, Abhay Kolte, Anil Melath, Manish Khatri, Nitin Dani, Roshani Thakur, Vaibhav Tiwari, Vikender Singh Yadav, Biju Thomas, Gurparkash Singh Chahal, Meenu Taneja Bhasin, Nymphea Pandit, Sandeep Anant Lawande, R. G. Shiva Manjunath, Surinder Sachdeva, Amit Bhardwaj, Avni Raju Pradeep, Ashish Sham Nichani, Baljeet Singh, P. R. Ganesh, Neeraj Chandrahas Deshpande, Saravanan Sampoornam Pape Reddy, Subash Chandra Raj
Published in:
Journal of Indian Society of Periodontology, Vol 28, Iss 1, Pp 6-31 (2024)
Current implant therapy is a frequently employed treatment for individuals who have lost teeth, as it offers functional and biological advantages over old prostheses. Concurrently, active exploration of intervention strategies aims to prevent the …
External link:
https://doaj.org/article/2f0e4b4616304980b18ca4375c6c8c0e
Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness $S(\theta)$, is bounded by $2/\eta$, training is "stable" and the training loss decreases monotonically. Recent works, however, …
External link:
http://arxiv.org/abs/2209.15594
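
The stability claim quoted above (arXiv:2209.15594) can be checked on a one-dimensional quadratic, where the sharpness is constant: gradient descent contracts exactly when $S < 2/\eta$ and diverges above that threshold. The sketch below is this classical calculation, not the paper's analysis.

import numpy as np

def gd_on_quadratic(sharpness, lr, steps=50, theta0=1.0):
    """Gradient descent on L(theta) = 0.5 * sharpness * theta^2.

    The Hessian (hence the sharpness) is the constant `sharpness`, and the
    update is theta <- (1 - lr * sharpness) * theta, which contracts exactly
    when sharpness < 2 / lr, the classical stability threshold in the snippet.
    """
    theta = theta0
    for _ in range(steps):
        theta -= lr * sharpness * theta
    return theta

lr = 0.1
print(gd_on_quadratic(sharpness=15.0, lr=lr))   # 15 < 2/0.1 = 20: decays to ~0
print(gd_on_quadratic(sharpness=25.0, lr=lr))   # 25 > 20: |theta| blows up
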