Showing 1 - 7 of 7
for search: '"Vakilian, Vala"'
Author:
Deng, Wenlong, Zhao, Yize, Vakilian, Vala, Chen, Minghui, Li, Xiaoxiao, Thrampoulidis, Christos
Storing open-source fine-tuned models separately introduces redundancy and increases response times in applications utilizing multiple models. Delta-parameter pruning (DPP), particularly the random drop and rescale (DARE) method proposed by Yu et al. …
External link:
http://arxiv.org/abs/2410.09344
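For context on the abstract above: DARE prunes the delta parameters (fine-tuned weights minus pre-trained weights) by dropping each entry with a fixed probability and rescaling the survivors so the expected update is preserved. A minimal sketch of that idea, assuming PyTorch; the function name `dare_prune` and its signature are illustrative, not the authors' implementation:

```python
import torch

def dare_prune(base_param: torch.Tensor, finetuned_param: torch.Tensor,
               drop_rate: float = 0.9) -> torch.Tensor:
    """Random drop-and-rescale (DARE) of delta parameters.

    Drops each entry of the delta with probability `drop_rate` and rescales
    the kept entries by 1 / (1 - drop_rate), keeping the expected delta unchanged.
    """
    delta = finetuned_param - base_param                       # delta parameters
    keep = (torch.rand_like(delta) >= drop_rate).to(delta.dtype)
    pruned_delta = delta * keep / (1.0 - drop_rate)
    return base_param + pruned_delta                           # reconstructed weight
```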
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Next-token prediction (NTP) over large text corpora has become the go-to paradigm to train large language models. Yet, it remains unclear how NTP influences the mapping of linguistic patterns to geometric properties of the resulting model representations …
External link:
http://arxiv.org/abs/2408.15417
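As background, the next-token-prediction objective the abstract refers to is the standard cross-entropy between the model's logits at each position and the token that follows. A minimal sketch, assuming PyTorch; `next_token_loss` is an illustrative name, not from the paper:

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
    """Cross-entropy between predictions at position t and the token at t + 1.

    logits:    (batch, seq_len, vocab_size) model outputs
    token_ids: (batch, seq_len) ground-truth token indices
    """
    shifted_logits = logits[:, :-1, :]        # predictions for positions 0 .. T-2
    shifted_targets = token_ids[:, 1:]        # the tokens those positions should predict
    return F.cross_entropy(
        shifted_logits.reshape(-1, shifted_logits.size(-1)),
        shifted_targets.reshape(-1),
    )
```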
Supervised-contrastive loss (SCL) is an alternative to cross-entropy (CE) for classification tasks that makes use of similarities in the embedding space to allow for richer representations. In this work, we propose methods to engineer the geometry of …
External link:
http://arxiv.org/abs/2310.00893
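For reference, the supervised contrastive loss discussed in this and the following entries pulls together embeddings that share a label and pushes apart the rest of the batch. A minimal sketch of the generic SCL objective (in the style of Khosla et al.), assuming PyTorch; it illustrates the baseline loss, not the geometry-engineering method of the paper itself:

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features: torch.Tensor, labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """Generic supervised contrastive loss over one batch.

    features: (batch, dim) embeddings (L2-normalized below)
    labels:   (batch,) integer class labels
    """
    z = F.normalize(features, dim=1)
    sim = z @ z.T / temperature                               # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    # log-softmax over all other samples in the batch (self excluded via -inf)
    sim = sim.masked_fill(self_mask, float("-inf"))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    # mean log-probability of positives per anchor, averaged over anchors
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss.mean()
```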
Author:
Kini, Ganesh Ramachandra, Vakilian, Vala, Behnia, Tina, Gill, Jaidev, Thrampoulidis, Christos
Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification. While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this …
External link:
http://arxiv.org/abs/2306.07960
Various logit-adjusted parameterizations of the cross-entropy (CE) loss have been proposed as alternatives to weighted CE for training large models on label-imbalanced data far beyond the zero train error regime. The driving force behind those designs …
External link:
http://arxiv.org/abs/2303.07608
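One common member of the family of logit-adjusted CE losses mentioned above additively shifts each class logit by a multiple of the log class prior before the softmax (in the spirit of Menon et al.). A minimal sketch, assuming PyTorch; the abstract concerns a broader family of parameterizations, and the names below are illustrative:

```python
import torch
import torch.nn.functional as F

def logit_adjusted_ce(logits: torch.Tensor, targets: torch.Tensor,
                      class_priors: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Additive logit adjustment: shift each logit by tau * log(class prior).

    logits:       (batch, num_classes) model outputs
    targets:      (batch,) integer labels
    class_priors: (num_classes,) empirical class frequencies summing to 1
    """
    adjusted = logits + tau * torch.log(class_priors).unsqueeze(0)
    return F.cross_entropy(adjusted, targets)
```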
Neural Collapse refers to the remarkable structural properties characterizing the geometry of class embeddings and classifier weights, found by deep nets when trained beyond zero training error. However, this characterization only holds for balanced …
External link:
http://arxiv.org/abs/2208.05512
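For reference, the balanced-data geometry that Neural Collapse describes (Papyan et al.) is the simplex equiangular tight frame: writing $\mu_c$ for the globally centered mean embedding of class $c$ among $K$ classes, the class means have equal norms and equal pairwise angles,

$$\frac{\langle \mu_c, \mu_{c'} \rangle}{\lVert \mu_c \rVert\, \lVert \mu_{c'} \rVert} = -\frac{1}{K-1}, \qquad c \neq c',$$

with the classifier weights aligned to the class means; as the abstract notes, this characterization only holds in the balanced case.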
Author:
Kini, Ganesh Ramachandra, Vakilian, Vala, Behnia, Tina, Gill, Jaidev, Thrampoulidis, Christos
Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy (CE) loss for classification. In this paper we ask: what differences in the learning process occur when the two different loss functions are being …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4fedae2c8689b16f31fc955f8c36573f