Showing 1 - 10 of 94,436 results for search: '"Nikhil, A."'
While scaling laws provide a reliable methodology for predicting train loss across compute scales for a single data distribution, less is known about how these predictions should change as we change the distribution. In this paper, we derive a strate
External link:
http://arxiv.org/abs/2411.12925
Authors:
Jelassi, Samy, Mohri, Clara, Brandfonbrener, David, Gu, Alex, Vyas, Nikhil, Anand, Nikhil, Alvarez-Melis, David, Li, Yuanzhi, Kakade, Sham M., Malach, Eran
The Mixture-of-Experts (MoE) architecture enables a significant increase in the total number of model parameters with minimal computational overhead. However, it is not clear what performance tradeoffs, if any, exist between MoEs and standard dense t
External link:
http://arxiv.org/abs/2410.19034
The influence of contextual input on the behavior of large language models (LLMs) has prompted the development of context attribution methods that aim to quantify each context span's effect on an LLM's generations. The leave-one-out (LOO) error, whic
External link:
http://arxiv.org/abs/2411.15102
Author:
Mishra, Nikhil
This work addresses a trust-based enhancement to the Multipath Ad hoc On-Demand Distance Vector (AOMDV) routing protocol. While AODV and its multipath variant AOMDV have been fundamental in mobile ad hoc networks, they lack mechanisms to account for
External link:
http://arxiv.org/abs/2411.13227
Galaxies grow and evolve in dark matter halos. Because dark matter is not visible, galaxies' halo masses ($\rm{M}_{\rm{halo}}$) must be inferred indirectly. We present a graph neural network (GNN) model for predicting $\rm{M}_{\rm{halo}}$ from stella
External link:
http://arxiv.org/abs/2411.12629
Over an extensive duration, administrators and clinicians have endeavoured to predict Emergency Department (ED) visits with precision, aiming to optimise resource distribution. Despite the proliferation of diverse AI-driven models tailored for precis
External link:
http://arxiv.org/abs/2411.11275
Authors:
Ghanathe, Nikhil P, Wilton, Steven J E
TinyML models often operate in remote, dynamic environments without cloud connectivity, making them prone to failures. Ensuring reliability in such scenarios requires not only detecting model failures but also identifying their root causes. However,
External link:
http://arxiv.org/abs/2411.10692
Task-oriented dialogue systems rely on predefined conversation schemes (dialogue flows) often represented as directed acyclic graphs. These flows can be manually designed or automatically generated from previously recorded conversations. Due to varia
External link:
http://arxiv.org/abs/2411.10416
Intent discovery is crucial for both building new conversational agents and improving existing ones. While several approaches have been proposed for intent discovery, most rely on clustering to group similar utterances together. Traditional evaluatio
External link:
http://arxiv.org/abs/2411.09853
Small, highly trained, open-source large language models are widely used due to their inference efficiency, but further improving their quality remains a challenge. Sparse upcycling is a promising approach that transforms a pretrained dense model int
External link:
http://arxiv.org/abs/2411.08968