Showing 1 - 10 of 1,783 for search: '"Dahou A"'
Published in:
Procedia Computer Science, 244, 397-407 (2024)
In the context of low-resource languages, the Algerian dialect (AD) faces challenges due to the absence of annotated corpora, hindering its effective processing, notably in Machine Learning (ML) applications reliant on corpora for training and assess
External link:
http://arxiv.org/abs/2411.04604
Author:
Maniparambil, Mayug, Akshulakov, Raiymbek, Djilali, Yasser Abdelaziz Dahou, Narayan, Sanath, Singh, Ankit, O'Connor, Noel E.
Recent contrastive multimodal vision-language models like CLIP have demonstrated robust open-world semantic understanding, becoming the standard image backbones for vision-language applications due to their aligned latent space. However, this practic
External link:
http://arxiv.org/abs/2409.19425
Author:
Malartic, Quentin, Chowdhury, Nilabhra Roy, Cojocaru, Ruxandra, Farooq, Mugariya, Campesan, Giulia, Djilali, Yasser Abdelaziz Dahou, Narayan, Sanath, Singh, Ankit, Velikanov, Maksim, Boussaha, Basma El Amel, Al-Yafeai, Mohammed, Alobeidli, Hamza, Qadi, Leen Al, Seddik, Mohamed El Amine, Fedyanin, Kirill, Alami, Reda, Hacid, Hakim
We introduce Falcon2-11B, a foundation model trained on over five trillion tokens, and its multimodal counterpart, Falcon2-11B-vlm, which is a vision-to-text model. We report our findings during the training of the Falcon2-11B which follows a multi-s
External link:
http://arxiv.org/abs/2407.14885
Author:
Narayan, Sanath, Djilali, Yasser Abdelaziz Dahou, Singh, Ankit, Bihan, Eustache Le, Hacid, Hakim
This work presents an extensive and detailed study on Audio-Visual Speech Recognition (AVSR) for five widely spoken languages: Chinese, Spanish, English, Arabic, and French. We have collected large-scale datasets for each language except for English,
External link:
http://arxiv.org/abs/2406.00038
Author:
Sihler, Florian, Pietzschmann, Lukas, Straub, Raphael, Tichy, Matthias, Diera, Andor, Dahou, Abdelhalim
CONTEXT The R programming language has a huge and active community, especially in the area of statistical computing. Its interpreted nature allows for several interesting constructs, like the manipulation of functions at run-time, that hinder the sta
External link:
http://arxiv.org/abs/2401.16228
Author:
Maniparambil, Mayug, Akshulakov, Raiymbek, Djilali, Yasser Abdelaziz Dahou, Narayan, Sanath, Seddik, Mohamed El Amine, Mangalam, Karttikeya, O'Connor, Noel E.
Aligned text-image encoders such as CLIP have become the de facto model for vision-language tasks. Furthermore, modality-specific encoders achieve impressive performances in their respective domains. This raises a central question: does an alignment
External link:
http://arxiv.org/abs/2401.05224
Author:
Mabrouk, Alhassan, Redondo, Rebeca P. Díaz, Dahou, Abdelghani, Elaziz, Mohamed Abd, Kayed, Mohammed
Published in:
Applied Sciences, 2022, vol. 12, no 13, p. 6448
Pneumonia is a life-threatening lung infection resulting from several different viral infections. Identifying and treating pneumonia on chest X-ray images can be difficult due to its similarity to other pulmonary diseases. Thus, the existing methods
External link:
http://arxiv.org/abs/2312.07965
Author:
Mabrouk, Alhassan, Dahou, Abdelghani, Elaziz, Mohamed Abd, Redondo, Rebeca P. Díaz, Kayed, Mohammed
Published in:
Computational Intelligence and Neuroscience, 2022, vol. 2022
The Internet of Medical Things (IoMT) has dramatically benefited medical professionals that patients and physicians can access from all regions. Although the automatic detection and prediction of diseases such as melanoma and leukemia is still being
External link:
http://arxiv.org/abs/2312.07437
We present a novel approach for saliency prediction in images, leveraging parallel decoding in transformers to learn saliency solely from fixation maps. Models typically rely on continuous saliency maps, to overcome the difficulty of optimizing for t
External link:
http://arxiv.org/abs/2311.14073
Author:
Djilali, Yasser Abdelaziz Dahou, Narayan, Sanath, Bihan, Eustache Le, Boussaid, Haithem, Almazrouei, Ebtessam, Debbah, Merouane
The Lip Reading Sentences-3 (LRS3) benchmark has primarily been the focus of intense research in visual speech recognition (VSR) during the last few years. As a result, there is an increased risk of overfitting to its excessively used test set, which
External link:
http://arxiv.org/abs/2311.14063