Zobrazeno 1 - 10
of 1 137
pro vyhledávání: '"WARD, RACHEL"'
We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem. Specifically, given a rank-$r$ matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$, we prove that gradient descen
Externí odkaz:
http://arxiv.org/abs/2410.09640
Autor:
Abdin, Marah, Aneja, Jyoti, Awadalla, Hany, Awadallah, Ahmed, Awan, Ammar Ahmad, Bach, Nguyen, Bahree, Amit, Bakhtiari, Arash, Bao, Jianmin, Behl, Harkirat, Benhaim, Alon, Bilenko, Misha, Bjorck, Johan, Bubeck, Sébastien, Cai, Martin, Cai, Qin, Chaudhary, Vishrav, Chen, Dong, Chen, Dongdong, Chen, Weizhu, Chen, Yen-Chun, Chen, Yi-Ling, Cheng, Hao, Chopra, Parul, Dai, Xiyang, Dixon, Matthew, Eldan, Ronen, Fragoso, Victor, Gao, Jianfeng, Gao, Mei, Gao, Min, Garg, Amit, Del Giorno, Allie, Goswami, Abhishek, Gunasekar, Suriya, Haider, Emman, Hao, Junheng, Hewett, Russell J., Hu, Wenxiang, Huynh, Jamie, Iter, Dan, Jacobs, Sam Ade, Javaheripi, Mojan, Jin, Xin, Karampatziakis, Nikos, Kauffmann, Piero, Khademi, Mahoud, Kim, Dongwoo, Kim, Young Jin, Kurilenko, Lev, Lee, James R., Lee, Yin Tat, Li, Yuanzhi, Li, Yunsheng, Liang, Chen, Liden, Lars, Lin, Xihui, Lin, Zeqi, Liu, Ce, Liu, Liyuan, Liu, Mengchen, Liu, Weishung, Liu, Xiaodong, Luo, Chong, Madan, Piyush, Mahmoudzadeh, Ali, Majercak, David, Mazzola, Matt, Mendes, Caio César Teodoro, Mitra, Arindam, Modi, Hardik, Nguyen, Anh, Norick, Brandon, Patra, Barun, Perez-Becker, Daniel, Portet, Thomas, Pryzant, Reid, Qin, Heyang, Radmilac, Marko, Ren, Liliang, de Rosa, Gustavo, Rosset, Corby, Roy, Sambudha, Ruwase, Olatunji, Saarikivi, Olli, Saied, Amin, Salim, Adil, Santacroce, Michael, Shah, Shital, Shang, Ning, Sharma, Hiteshi, Shen, Yelong, Shukla, Swadheen, Song, Xia, Tanaka, Masahiro, Tupini, Andrea, Vaddamanu, Praneetha, Wang, Chunyu, Wang, Guanhua, Wang, Lijuan, Wang, Shuohang, Wang, Xin, Wang, Yu, Ward, Rachel, Wen, Wen, Witte, Philipp, Wu, Haiping, Wu, Xiaoxia, Wyatt, Michael, Xiao, Bin, Xu, Can, Xu, Jiahang, Xu, Weijian, Xue, Jilong, Yadav, Sonali, Yang, Fan, Yang, Jianwei, Yang, Yifan, Yang, Ziyi, Yu, Donghan, Yuan, Lu, Zhang, Chenruidong, Zhang, Cyril, Zhang, Jianwen, Zhang, Li Lyna, Zhang, Yi, Zhang, Yue, Zhang, Yunan, Zhou, Xiren
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi
Externí odkaz:
http://arxiv.org/abs/2404.14219
Autor:
Hasani, Erisa, Ward, Rachel A.
Numerous developments in the recent literature show the promising potential of deep learning in obtaining numerical solutions to partial differential equations (PDEs) beyond the reach of current numerical solvers. However, data-driven neural operator
Externí odkaz:
http://arxiv.org/abs/2401.02398
Autor:
Liu, Bingbin, Bubeck, Sebastien, Eldan, Ronen, Kulkarni, Janardhan, Li, Yuanzhi, Nguyen, Anh, Ward, Rachel, Zhang, Yi
Small-scale models offer various computational advantages, and yet to which extent size is critical for problem-solving abilities remains an open question. Specifically for solving grade school math, the smallest model size so far required to break t
Externí odkaz:
http://arxiv.org/abs/2312.09241
Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Despite the empirical success and practical significance of (relational) knowledge distillation that matches (the relations of) features between teacher and student models, the corresponding theoretical interpretations remain limited for various know
Externí odkaz:
http://arxiv.org/abs/2307.11030
Autor:
Ward, Rachel, Kolda, Tamara G.
We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, $T = C (\frac{\sigma_1(\mathbf{A})}{\sigma_r(\
Externí odkaz:
http://arxiv.org/abs/2305.06927
Publikováno v:
Information and Inference: A Journal of the IMA, Volume 13, Issue 3, September 2024, iaae022
Overparameterized models may have many interpolating solutions; implicit regularization refers to the hidden preference of a particular optimization method towards a certain interpolating solution among the many. A by now established line of work has
Externí odkaz:
http://arxiv.org/abs/2305.05448
We give Hoeffding and Bernstein-type concentration inequalities for the largest eigenvalue of sums of random matrices arising from a Markov chain. We consider time-dependent matrix-valued functions on a general state space, generalizing previous that
Externí odkaz:
http://arxiv.org/abs/2303.02150