Showing 1 - 4 of 4 for search: '"Abdelwahab, Hammam"'
Author:
Brandizzi, Nicolo', Abdelwahab, Hammam, Bhowmick, Anirban, Helmer, Lennard, Stein, Benny Jörg, Denisov, Pavel, Saleem, Qasid, Fromm, Michael, Ali, Mehdi, Rutmann, Richard, Naderi, Farzad, Agy, Mohamad Saif, Schwirjow, Alexander, Küch, Fabian, Hahn, Luzian, Ostendorff, Malte, Suarez, Pedro Ortiz, Rehm, Georg, Wegener, Dennis, Flores-Herr, Nicolas, Köhler, Joachim, Leveling, Johannes
This paper presents a comprehensive overview of the data preparation pipeline developed for the OpenGPT-X project, a large-scale initiative aimed at creating open and high-performance multilingual large language models (LLMs). The project goal is to …
External link:
http://arxiv.org/abs/2410.08800
Author:
Ali, Mehdi, Fromm, Michael, Thellmann, Klaudia, Ebert, Jan, Weber, Alexander Arno, Rutmann, Richard, Jain, Charvi, Lübbering, Max, Steinigen, Daniel, Leveling, Johannes, Klug, Katrin, Buschhoff, Jasper Schulze, Jurkschat, Lena, Abdelwahab, Hammam, Stein, Benny Jörg, Sylla, Karl-Heinz, Denisov, Pavel, Brandizzi, Nicolo, Saleem, Qasid, Anirban, Bhowmick, John, Chelsea, Suarez, Pedro Ortiz, Ostendorff, Malte, Jude, Alex, Manjunath, Lalith, Weinbach, Samuel, Penke, Carolin, Asaadi, Shima, Barth, Fabio, Sifa, Rafet, Küch, Fabian, Jäkel, René, Rehm, Georg, Kesselheim, Stefan, Köhler, Joachim, Flores-Herr, Nicolas
We present preliminary results of the project OpenGPT-X. At present, the project has developed two multilingual LLMs designed to embrace Europe's linguistic diversity by supporting all 24 official languages of the European Union. Trained on a dataset …
External link:
http://arxiv.org/abs/2410.03730
Author:
Ali, Mehdi, Fromm, Michael, Thellmann, Klaudia, Rutmann, Richard, Lübbering, Max, Leveling, Johannes, Klug, Katrin, Ebert, Jan, Doll, Niclas, Buschhoff, Jasper Schulze, Jain, Charvi, Weber, Alexander Arno, Jurkschat, Lena, Abdelwahab, Hammam, John, Chelsea, Suarez, Pedro Ortiz, Ostendorff, Malte, Weinbach, Samuel, Sifa, Rafet, Kesselheim, Stefan, Flores-Herr, Nicolas
The recent success of Large Language Models (LLMs) has been predominantly driven by curating the training dataset composition, scaling of model architectures and dataset sizes, and advancements in pretraining objectives, leaving tokenizer influence as …
External link:
http://arxiv.org/abs/2310.08754
Author:
Abdelwahab, Hammam
Machine learning-based solutions are frequently adopted in applications that require big data in operations. The performance of a model deployed into operations is subject to degradation due to unanticipated changes in the flow of input …
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::f30cffde6f12ef6e10020891f4b2fbf7