Search results

Report

The FIGNEWS Shared Task on News Media Narratives

Authors: Zaghouani, Wajdi; Jarrar, Mustafa; Habash, Nizar; Bouamor, Houda; Zitouni, Imed; Diab, Mona; El-Beltagy, Samhaa R.; AbuOdeh, Muhammed

We present an overview of the FIGNEWS shared task, organized as part of the ArabicNLP 2024 conference co-located with ACL 2024. The shared task addresses bias and propaganda annotation in multilingual news posts. We focus on the early days of the Isr

External link: http://arxiv.org/abs/2407.18147

Report

NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task

Authors: Abdul-Mageed, Muhammad; Keleg, Amr; Elmadany, AbdelRahim; Zhang, Chiyu; Hamed, Injy; Magdy, Walid; Bouamor, Houda; Habash, Nizar

We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI's objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation conditions t

External link: http://arxiv.org/abs/2407.04910

Report

Strategies for Arabic Readability Modeling

Authors: Liberato, Juan Piñeros; Alhafni, Bashar; Khalil, Muhamed Al; Habash, Nizar

Automatic readability assessment is relevant to building NLP applications for education, content analysis, and accessibility. However, Arabic readability assessment is a challenging task due to Arabic's morphological richness and limited readability

External link: http://arxiv.org/abs/2407.03032

Report

Exploiting Dialect Identification in Automatic Dialectal Text Normalization

Authors: Alhafni, Bashar; Al-Towaity, Sarah; Fawzy, Ziyad; Nassar, Fatema; Eryani, Fadhl; Bouamor, Houda; Habash, Nizar

Dialectal Arabic is the primary spoken language used by native Arabic speakers in daily communication. The rise of social media platforms has notably expanded its use as a written language. However, Arabic dialects do not have standard orthographies.

External link: http://arxiv.org/abs/2407.03020

Report

Arabic Diacritics in the Wild: Exploiting Opportunities for Improved Diacritization

Authors: Elgamal, Salman; Obeid, Ossama; Kabbani, Tameem; Inoue, Go; Habash, Nizar

The widespread absence of diacritical marks in Arabic text poses a significant challenge for Arabic natural language processing (NLP). This paper explores instances of naturally occurring diacritics, referred to as "diacritics in the wild," to unveil

External link: http://arxiv.org/abs/2406.05760

Report

The SAMER Arabic Text Simplification Corpus

Authors: Alhafni, Bashar; Hazim, Reem; Liberato, Juan Piñeros; Khalil, Muhamed Al; Habash, Nizar

We present the SAMER Corpus, the first manually annotated Arabic parallel corpus for text simplification targeting school-aged learners. Our corpus comprises texts of 159K words selected from 15 publicly available Arabic fiction novels most of which

External link: http://arxiv.org/abs/2404.18615

Report

Can a Multichoice Dataset be Repurposed for Extractive Question Answering?

Authors: Lynn, Teresa; Altakrori, Malik H.; Magdy, Samar Mohamed; Das, Rocktim Jyoti; Lyu, Chenyang; Nasr, Mohamed; Samih, Younes; Aji, Alham Fikri; Nakov, Preslav; Godbole, Shantanu; Roukos, Salim; Florian, Radu; Habash, Nizar

The rapid evolution of Natural Language Processing (NLP) has favored major languages such as English, leaving a significant gap for many others due to limited resources. This is especially evident in the context of data annotation, a task whose impor

External link: http://arxiv.org/abs/2404.17342

Report

SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection

Authors: Wang, Yuxia; Mansurov, Jonibek; Ivanov, Petar; Su, Jinyan; Shelmanov, Artem; Tsvigun, Akim; Afzal, Osama Mohammed; Mahmoud, Tarek; Puccetti, Giovanni; Arnold, Thomas; Whitehouse, Chenxi; Aji, Alham Fikri; Habash, Nizar; Gurevych, Iryna; Nakov, Preslav

Published in: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

We present the results and the main findings of SemEval-2024 Task 8: Multigenerator, Multidomain, and Multilingual Machine-Generated Text Detection. The task featured three subtasks. Subtask A is a binary classification task determining whether a tex

External link: http://arxiv.org/abs/2404.14183

Report

ZAEBUC-Spoken: A Multilingual Multidialectal Arabic-English Speech Corpus

Authors: Hamed, Injy; Eryani, Fadhl; Palfreyman, David; Habash, Nizar

We present ZAEBUC-Spoken, a multilingual multidialectal Arabic-English speech corpus. The corpus comprises twelve hours of Zoom meetings involving multiple speakers role-playing a work situation where Students brainstorm ideas for a certain topic and

External link: http://arxiv.org/abs/2403.18182
