Approximating Two-Layer Feedforward Networks for Efficient Transformers

Authors: Csordás, Róbert; Irie, Kazuki; Schmidhuber, Jürgen
Publication year: 2023
Subject:
Document type: Working Paper
Description: How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel perspectives on MoEs, presenting a general framework that unifies various methods to approximate two-layer NNs (e.g., feedforward blocks of Transformers), including product-key memories (PKMs). Leveraging insights from this framework, we propose methods to improve both MoEs and PKMs. Unlike prior work that compares MoEs with dense baselines under the compute-equal condition, our evaluation condition is parameter-equal, which is crucial to properly evaluate LMs. We show that our MoEs are competitive with the dense Transformer-XL on both the WikiText-103 and enwiki8 datasets at two different scales, while being much more resource efficient. This demonstrates that MoEs are relevant not only to extremely large LMs but also to any-scale resource-efficient LMs. Our code is public. (A minimal illustrative sketch of such a sparse MoE layer follows this record.)
Comment: Accepted to EMNLP 2023 Findings
Database: arXiv
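
To make the setting concrete, here is a minimal sketch (in PyTorch) of a sparse top-k MoE layer standing in for a Transformer's two-layer feedforward block. The class name, the sigmoid gating, and all hyperparameters are illustrative assumptions, not the paper's exact implementation; it only shows the general structure the abstract refers to, where each token activates a few small experts whose combined size matches a dense block under the parameter-equal comparison.

```python
import torch
import torch.nn as nn


class MoEFeedforward(nn.Module):
    """Hypothetical sparse MoE stand-in for a two-layer feedforward block.

    The experts together hold roughly the parameters of one dense block with
    d_ff = n_experts * d_expert, but each token only uses k of them.
    """

    def __init__(self, d_model: int, d_expert: int, n_experts: int, k: int):
        super().__init__()
        self.k = k
        # Expert weights: (n_experts, d_model, d_expert) and its transpose shape.
        self.w_in = nn.Parameter(torch.randn(n_experts, d_model, d_expert) * d_model ** -0.5)
        self.w_out = nn.Parameter(torch.randn(n_experts, d_expert, d_model) * d_expert ** -0.5)
        # Router scores each token against each expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten to a stream of tokens.
        tokens = x.reshape(-1, x.shape[-1])
        scores = self.router(tokens)                       # (T, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        gates = torch.sigmoid(topk_scores)                 # gating choice is an assumption
        out = torch.zeros_like(tokens)
        for slot in range(self.k):
            idx = topk_idx[:, slot]                        # expert picked in this slot, per token
            h = torch.relu(torch.einsum('td,tde->te', tokens, self.w_in[idx]))
            out = out + gates[:, slot, None] * torch.einsum('te,ted->td', h, self.w_out[idx])
        return out.reshape(x.shape)


# Usage: same input/output shape as the dense feedforward block it replaces.
ffn = MoEFeedforward(d_model=512, d_expert=128, n_experts=32, k=4)
y = ffn(torch.randn(2, 16, 512))   # y.shape == (2, 16, 512)
```

The per-slot loop keeps the sketch readable; a practical implementation would batch tokens by expert. The paper's public code should be consulted for the actual routing and initialization choices.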