Showing 1 - 1 of 1 for search: '"Teh, Kai Jun"'
Pretraining transformers is generally time-consuming. Fully quantized training (FQT) is a promising approach to speed up pretraining. However, most FQT methods adopt a quantize-compute-dequantize procedure, which often leads to suboptimal speedup an…
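As a minimal illustration of the quantize-compute-dequantize pattern the abstract refers to, the sketch below performs a matrix multiply with symmetric per-tensor INT8 quantization. All function names and the specific quantization scheme are illustrative assumptions, not the method proposed in the paper.

```python
import numpy as np

def quantize(x, bits=8):
    # Symmetric per-tensor quantization: map floats to signed integers
    # using a single scale factor derived from the tensor's max magnitude.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def qcd_matmul(a, b):
    # Quantize-compute-dequantize: quantize both operands, multiply in
    # low precision (with int32 accumulation), then dequantize the result.
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)
    return acc.astype(np.float32) * (sa * sb)
```

The repeated quantize/dequantize round trips around each operation are exactly the memory-access overhead that, per the abstract, limits the speedup of this procedure.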
External link:
http://arxiv.org/abs/2403.12422