Overview of the progression of state-of-the-art language models.

Authors: Briouya, Asmae; Briouya, Hasnae; Choukri, Ali
Source: Telkomnika; Aug 2024, Vol. 22, Issue 4, p897-909, 13p
Abstract: This review provides a concise overview of key transformer-based language models, including bidirectional encoder representations from transformers (BERT), generative pre-trained transformer 3 (GPT-3), robustly optimized BERT pretraining approach (RoBERTa), a lite BERT (ALBERT), text-to-text transfer transformer (T5), generative pre-trained transformer 4 (GPT-4), and XLNet. These models have significantly advanced natural language processing (NLP) capabilities, each bringing unique contributions to the field. We delve into BERT's bidirectional context understanding, GPT-3's versatility with 175 billion parameters, and RoBERTa's optimization of BERT's pretraining. ALBERT emphasizes model efficiency, T5 introduces a unified text-to-text framework, and GPT-4, whose parameter count has not been publicly disclosed, excels in multimodal tasks. Safety considerations are highlighted, especially for GPT-4. Additionally, XLNet's permutation-based training achieves bidirectional context understanding within an autoregressive framework. The motivations, advancements, and challenges of these models are explored, offering insights into the evolving landscape of large-scale language models. [ABSTRACT FROM AUTHOR]
Database: Complementary Index