Showing 1 - 10 of 141 for search: '"Bui, Thang"'
Accurately quantifying uncertainty in large language models (LLMs) is crucial for their reliable deployment, especially in high-stakes applications. Current state-of-the-art methods for measuring semantic uncertainty in LLMs rely on strict bidirectional…
External link:
http://arxiv.org/abs/2410.22685
Author:
Bui, Thang D.
Non-Gaussian likelihoods are essential for modelling complex real-world observations but pose significant computational challenges in learning and inference. Even with Gaussian priors, non-Gaussian likelihoods often lead to analytically intractable p…
External link:
http://arxiv.org/abs/2410.20754
Author:
O'Neill, Charles, Bui, Thang
This paper introduces an efficient and robust method for discovering interpretable circuits in large language models using discrete sparse autoencoders. Our approach addresses key limitations of existing techniques, namely computational complexity and…
External link:
http://arxiv.org/abs/2405.12522
In today's rapidly evolving landscape of Artificial Intelligence, large language models (LLMs) have emerged as a vibrant research topic. LLMs find applications in various fields and contribute significantly. Despite their powerful language capabilities…
External link:
http://arxiv.org/abs/2404.09296
Neural networks sometimes exhibit grokking, a phenomenon where perfect or near-perfect performance is achieved on a validation set well after the same performance has been obtained on the corresponding training set. In this workshop paper, we introduce…
External link:
http://arxiv.org/abs/2402.08946
In some settings neural networks exhibit a phenomenon known as \textit{grokking}, where they achieve perfect or near-perfect accuracy on the validation set long after the same performance has been achieved on the training set. In this paper, we disco…
External link:
http://arxiv.org/abs/2310.17247
Author:
Nguyen, Tuan Dung, Ting, Yuan-Sen, Ciucă, Ioana, O'Neill, Charlie, Sun, Ze-Chang, Jabłońska, Maja, Kruk, Sandor, Perkowski, Ernest, Miller, Jack, Li, Jason, Peek, Josh, Iyer, Kartheik, Różański, Tomasz, Khetarpal, Pranav, Zaman, Sharaf, Brodrick, David, Méndez, Sergio J. Rodríguez, Bui, Thang, Goodman, Alyssa, Accomazzi, Alberto, Naiman, Jill, Cranney, Jesse, Schawinski, Kevin, UniverseTBD
Large language models excel in many human-language tasks but often falter in highly specialized domains like scholarly astronomy. To bridge this gap, we introduce AstroLLaMA, a 7-billion-parameter model fine-tuned from LLaMA-2 using over 300,000 astr…
External link:
http://arxiv.org/abs/2309.06126
In this paper, we tackle the emerging challenge of unintended harmful content generation in Large Language Models (LLMs) with a novel dual-stage optimisation technique using adversarial fine-tuning. Our two-pronged approach employs an adversarial mod…
External link:
http://arxiv.org/abs/2308.13768
Large Language Models (LLMs) hold immense potential to generate synthetic data of high quality and utility, which has numerous applications from downstream model training to practical data utilisation. However, contemporary models, despite their impr…
External link:
http://arxiv.org/abs/2308.07645
Author:
Bui, Thang Dinh
Neural networks (NNs) are widely used to investigate the relationship among variables in complex multivariate problems. In cases of limited data, the network behavior strongly depends on factors such as the choice of network activation function and n…
External link:
http://hdl.handle.net/1969.1/2593