Showing 1 - 10 of 38
for search query: "Neill, James O'"
Author:
Neill, James O', Dutta, Sourav
Fine-tuning pretrained self-supervised language models is widely adopted for transfer learning to downstream tasks. Fine-tuning can be achieved by freezing gradients of the pretrained network and only updating gradients of a newly added classification… (a minimal sketch of this freeze-and-fine-tune setup follows below)
External link:
http://arxiv.org/abs/2307.10098
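Below is a minimal sketch of the freeze-and-fine-tune setup described in the entry above, written in PyTorch. The toy MLP encoder, layer sizes, and optimizer settings are illustrative assumptions standing in for a pretrained language model; only the newly added classification head receives gradient updates.

```python
import torch
import torch.nn as nn

pretrained_encoder = nn.Sequential(        # stand-in for a pretrained language model
    nn.Linear(128, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
)
classifier_head = nn.Linear(256, 4)        # newly added task-specific head

# Freeze the pretrained weights: no gradients are computed or applied to them.
for p in pretrained_encoder.parameters():
    p.requires_grad = False

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(classifier_head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)                   # dummy batch of input features
y = torch.randint(0, 4, (32,))

with torch.no_grad():                      # encoder is frozen, so skip its graph
    features = pretrained_encoder(x)
logits = classifier_head(features)
loss = loss_fn(logits, y)
loss.backward()
optimizer.step()
```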
Author:
Neill, James O', Dutta, Sourav
We investigate the effects of post-training quantization and quantization-aware training on the generalization of Transformer language models. We present a new method called self-distilled quantization (SDQ) that minimizes accumulative quantization error… (a rough sketch of the idea follows below)
External link:
http://arxiv.org/abs/2307.05972
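The following is a rough, schematic reading of how quantization-aware training can be paired with self-distillation: the network is trained with fake-quantized weights while matching the predictions of its own full-precision forward pass. The 8-bit uniform quantizer, the KL-divergence distillation term, and the equal loss weighting are assumptions for illustration, not the paper's exact SDQ objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fake_quant(w, num_bits=8):
    """Uniform symmetric fake quantization with a straight-through estimator."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max().clamp(min=1e-8) / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (w_q - w).detach()          # forward: quantized values; backward: identity

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 64)
        self.fc2 = nn.Linear(64, 10)
        self.quantize = True               # toggle fake quantization on/off

    def forward(self, x):
        q = fake_quant if self.quantize else (lambda w: w)
        x = F.relu(F.linear(x, q(self.fc1.weight), self.fc1.bias))
        return F.linear(x, q(self.fc2.weight), self.fc2.bias)

model = TinyNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(32, 64)
y = torch.randint(0, 10, (32,))

# "Teacher" pass: the same weights run at full precision, without gradients.
model.quantize = False
with torch.no_grad():
    teacher_logits = model(x)

# "Student" pass: the quantized forward of the very same network.
model.quantize = True
student_logits = model(x)

task_loss = F.cross_entropy(student_logits, y)
distill_loss = F.kl_div(F.log_softmax(student_logits, dim=-1),
                        F.softmax(teacher_logits, dim=-1),
                        reduction="batchmean")
loss = task_loss + distill_loss            # equal weighting is an assumption
loss.backward()
optimizer.step()
```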
While various avenues of research have been explored for iterative pruning, little is known about what effect pruning has on zero-shot test performance and its potential implications for the choice of pruning criteria. This pruning setup is particularly… (an iterative magnitude-pruning sketch follows below)
External link:
http://arxiv.org/abs/2204.01385
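A minimal sketch of iterative magnitude pruning using PyTorch's built-in pruning utilities. The toy model, the 20%-per-round schedule, and the L1 criterion are illustrative assumptions; other pruning criteria, whose effect on zero-shot performance the paper studies, would slot in at the same point.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

for round_idx in range(5):                    # five pruning rounds
    for module in model.modules():
        if isinstance(module, nn.Linear):
            # Remove the 20% of remaining weights with the smallest magnitude
            # (an L1 criterion; other pruning criteria would slot in here).
            prune.l1_unstructured(module, name="weight", amount=0.2)
    # ... a fine-tuning pass on the task data would normally go here ...

# Make the pruning masks permanent and report overall sparsity.
total, zeros = 0, 0
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")
        total += module.weight.numel()
        zeros += (module.weight == 0).sum().item()
print(f"sparsity: {zeros / total:.2%}")
```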
Pruning aims to reduce the number of parameters while maintaining performance close to the original network. This work proposes a novel \emph{self-distillation} based pruning strategy, whereby the representational similarity between the pruned and unpruned… (a sketch of similarity-guided pruning follows below)
External link:
http://arxiv.org/abs/2109.15014
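A rough sketch of pruning guided by self-distillation: a frozen copy of the unpruned network serves as its own teacher, and the pruned network is trained so that its hidden representation stays close to the teacher's. Cosine similarity, magnitude pruning, and the toy encoder are stand-ins; the paper's exact similarity objective and schedule may differ.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

encoder = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 128))
teacher = copy.deepcopy(encoder).eval()       # frozen, unpruned reference copy
for p in teacher.parameters():
    p.requires_grad = False

for module in encoder.modules():              # prune the student network
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
x = torch.randn(32, 64)

student_repr = encoder(x)
with torch.no_grad():
    teacher_repr = teacher(x)

# Maximize representational similarity == minimize (1 - cosine similarity).
sim_loss = 1 - F.cosine_similarity(student_repr, teacher_repr, dim=-1).mean()
sim_loss.backward()
optimizer.step()
```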
Author:
Neill, James O', Bollegala, Danushka
Negative sampling is a limiting factor w.r.t. the generalization of metric-learned neural networks. We show that uniform negative sampling provides little information about the class boundaries and thus propose three novel techniques for efficient negative sampling… (a sketch contrasting uniform and hard negatives follows below)
External link:
http://arxiv.org/abs/2102.06603
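A small sketch contrasting uniform negative sampling with in-batch hard-negative mining for a triplet-style metric-learning loss. Hard-negative mining is shown only as one illustrative way of choosing more informative negatives; it is not claimed to be one of the paper's three proposed techniques, and all tensors here are random toy data.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    d_pos = (anchor - positive).pow(2).sum(dim=-1)
    d_neg = (anchor - negative).pow(2).sum(dim=-1)
    return F.relu(d_pos - d_neg + margin).mean()

anchor = F.normalize(torch.randn(32, 64), dim=-1)
positive = F.normalize(anchor + 0.1 * torch.randn(32, 64), dim=-1)
candidates = F.normalize(torch.randn(32, 64), dim=-1)    # pool of negative examples

# Uniform sampling: pick any candidate at random for each anchor.
uniform_idx = torch.randint(0, candidates.size(0), (anchor.size(0),))
uniform_neg = candidates[uniform_idx]

# Hard-negative mining: pick the candidate closest to each anchor,
# i.e. the one that currently violates the margin the most.
dists = torch.cdist(anchor, candidates)                  # pairwise distances
hard_idx = dists.argmin(dim=1)
hard_neg = candidates[hard_idx]

print("loss, uniform negatives:", triplet_loss(anchor, positive, uniform_neg).item())
print("loss, hard negatives:   ", triplet_loss(anchor, positive, hard_neg).item())
```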
Author:
Neill, James O', Bollegala, Danushka
Multi-step ahead prediction in language models is challenging due to the discrepancy between training and test time processes. At test time, a sequence predictor is required to make predictions given past predictions as the input, instead of the past… (a scheduled-sampling sketch of this mismatch follows below)
External link:
http://arxiv.org/abs/2101.09313
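A sketch of scheduled sampling, a standard way of exposing a sequence model to its own predictions during training and thereby narrowing the train/test discrepancy described above. The GRU decoder, vocabulary size, and fixed teacher-forcing probability are assumptions for illustration; this is not presented as the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden = 100, 64
embed = nn.Embedding(vocab, hidden)
rnn = nn.GRUCell(hidden, hidden)
out = nn.Linear(hidden, vocab)

targets = torch.randint(0, vocab, (8, 12))    # (batch, seq_len) toy sequences
teacher_forcing_prob = 0.5                    # typically annealed towards 0

h = torch.zeros(8, hidden)
inp = targets[:, 0]
loss = 0.0
for t in range(1, targets.size(1)):
    h = rnn(embed(inp), h)
    logits = out(h)
    loss = loss + F.cross_entropy(logits, targets[:, t])
    if torch.rand(()) < teacher_forcing_prob:
        inp = targets[:, t]                   # feed the gold token (training-time input)
    else:
        inp = logits.argmax(dim=-1)           # feed the model's own prediction (test-time input)
loss = loss / (targets.size(1) - 1)
loss.backward()
```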
This paper proposes \textit{layer fusion} - a model compression technique that discovers which weights to combine and then fuses weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number… (a toy layer-fusion sketch follows below)
External link:
http://arxiv.org/abs/2007.14917
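A toy sketch of the layer-fusion idea: measure how similar the weights of same-shaped layers are and fuse the most similar pair so that both positions share one weight matrix. Cosine similarity between flattened weights and plain averaging are assumptions made for illustration, not the paper's exact fusion procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

layers = nn.ModuleList([nn.Linear(64, 64) for _ in range(4)])

# Pairwise cosine similarity between flattened weight matrices.
flat = torch.stack([l.weight.detach().flatten() for l in layers])
sim = F.cosine_similarity(flat.unsqueeze(1), flat.unsqueeze(0), dim=-1)
sim.fill_diagonal_(-1.0)                       # ignore self-similarity

i, j = divmod(sim.argmax().item(), sim.size(1))
print(f"fusing layers {i} and {j} (similarity {sim[i, j]:.3f})")

# Fuse: average the two weight matrices and let both positions point at the
# same module, so the parameter count drops by one full layer.
with torch.no_grad():
    fused = nn.Linear(64, 64)
    fused.weight.copy_((layers[i].weight + layers[j].weight) / 2)
    fused.bias.copy_((layers[i].bias + layers[j].bias) / 2)
layers[i] = fused
layers[j] = fused                              # shared module == shared weights
```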
Author:
Neill, James O'
Overparameterized networks trained to convergence have shown impressive performance in domains such as computer vision and natural language processing. Pushing state of the art on salient tasks within these domains corresponds to these models becoming…
External link:
http://arxiv.org/abs/2006.03669
Author:
Neill, James O', Bollegala, Danushka
Task-specific scores are often used to optimize for and evaluate the performance of conditional text generation systems. However, such scores are non-differentiable and cannot be used in the standard supervised learning paradigm. Hence, policy gradient… (a minimal REINFORCE sketch follows below)
External link:
http://arxiv.org/abs/1909.03622
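A minimal REINFORCE-style sketch for optimizing a non-differentiable sequence score: sample an output, evaluate it with an external scorer, and scale the log-probability of the sample by that reward. The toy scorer stands in for a task-specific metric such as BLEU or ROUGE, and the tiny GRU decoder is an assumption; in practice a baseline is usually subtracted from the reward to reduce variance.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, hidden, seq_len = 50, 32, 10
embed = nn.Embedding(vocab, hidden)
rnn = nn.GRUCell(hidden, hidden)
out = nn.Linear(hidden, vocab)
params = list(embed.parameters()) + list(rnn.parameters()) + list(out.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

def toy_score(tokens):
    """Non-differentiable stand-in reward: fraction of even token ids."""
    return (tokens % 2 == 0).float().mean().item()

h = torch.zeros(1, hidden)
inp = torch.zeros(1, dtype=torch.long)
log_probs, sampled = [], []
for _ in range(seq_len):
    h = rnn(embed(inp), h)
    dist = torch.distributions.Categorical(logits=out(h))
    tok = dist.sample()
    log_probs.append(dist.log_prob(tok))
    sampled.append(tok)
    inp = tok

reward = toy_score(torch.cat(sampled))           # computed outside the graph
loss = -(reward * torch.stack(log_probs).sum())  # REINFORCE objective
loss.backward()
optimizer.step()
```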
Author:
Neill, James O', Bollegala, Danushka
We propose a novel neural sequence prediction method based on \textit{error-correcting output codes} that avoids exact softmax normalization and allows for a tradeoff between speed and performance. Instead of minimizing measures between the predicted… (an ECOC classification sketch follows below)
External link:
http://arxiv.org/abs/1901.07002
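A compact sketch of error-correcting output codes for classification: each class receives a binary codeword, the model predicts individual bits with sigmoids rather than normalizing a softmax over all classes, and decoding picks the class whose codeword is nearest in Hamming distance. The random codebook, code length, and sizes are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, code_bits, feat = 1000, 24, 64    # 24 bit predictions instead of 1000 logits
codebook = (torch.rand(num_classes, code_bits) > 0.5).float()  # fixed random codewords

model = nn.Linear(feat, code_bits)             # predicts one logit per code bit
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, feat)
y = torch.randint(0, num_classes, (32,))

bit_logits = model(x)
# Train each bit with binary cross-entropy against the target class's codeword.
loss = F.binary_cross_entropy_with_logits(bit_logits, codebook[y])
loss.backward()
optimizer.step()

# Decoding: choose the class whose codeword is closest to the predicted bits.
with torch.no_grad():
    pred_bits = (torch.sigmoid(bit_logits) > 0.5).float()
    hamming = torch.cdist(pred_bits, codebook, p=1)   # (batch, num_classes)
    pred_class = hamming.argmin(dim=1)
```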