Showing 1 - 10 of 3,784 results for search: '"Jarrar, A."'
Author:
Junior, Sylvio Barbon, Ceravolo, Paolo, Groppe, Sven, Jarrar, Mustafa, Maghool, Samira, Sèdes, Florence, Sahri, Soror, Van Keulen, Maurice
A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency a
External link:
http://arxiv.org/abs/2406.06596
Author:
Jarrar, Mustafa, Hammouda, Tymaa
We present Qabas, a novel open-source Arabic lexicon designed for NLP applications. The novelty of Qabas lies in its synthesis of 110 lexicons. Specifically, Qabas lexical entries (lemmas) are assembled by linking lemmas from 110 lexicons. Furthermor
External link:
http://arxiv.org/abs/2406.06598
Semantic textual relatedness is a broader concept of semantic similarity. It measures the extent to which two chunks of text convey similar meaning or topics, or share related concepts or contexts. This notion of relatedness can be applied in various
External link:
http://arxiv.org/abs/2405.00659
Large Language Models (LLMs), such as GPT-3 and BERT, reshape how textual content is written and communicated. These models have the potential to generate scientific content that is indistinguishable from that written by humans. Hence, LLMs carry sev
External link:
http://arxiv.org/abs/2403.00828
This paper presents the ArBanking77, a large Arabic dataset for intent detection in the banking domain. Our dataset was arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries to ArBanking77 dataset with 3
External link:
http://arxiv.org/abs/2310.19034
SALMA, the first Arabic sense-annotated corpus, consists of ~34K tokens, which are all sense-annotated. The corpus is annotated using two different sense inventories simultaneously (Modern and Ghani). SALMA's novelty lies in how tokens and senses are a
External link:
http://arxiv.org/abs/2310.19029
Author:
Liqreina, Haneen, Jarrar, Mustafa, Khalilia, Mohammed, El-Shangiti, Ahmed Oumar, Abdul-Mageed, Muhammad
Traditional NER systems are typically trained to recognize coarse-grained entities, and less attention is given to classifying entities into a hierarchy of fine-grained lower-level subtypes. This article aims to advance Arabic NER with fine-grained e
External link:
http://arxiv.org/abs/2310.17333
This paper presents Nabra, a corpus of Syrian Arabic dialects with morphological annotations. A team of Syrian natives collected more than 6K sentences containing about 60K words from several sources including social media posts, scripts of movies a
External link:
http://arxiv.org/abs/2310.17315
Author:
Jarrar, Mustafa, Abdul-Mageed, Muhammad, Khalilia, Mohammed, Talafha, Bashar, Elmadany, AbdelRahim, Hamad, Nagham, Omar, Alaa'
We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering novel NER datasets (i.e., Wojood) and the definition of subtasks designed to facilitate meaningful
External link:
http://arxiv.org/abs/2310.16153
Offensive language detection has been well studied in many languages, but it is lagging behind in low-resource languages, such as Hebrew. In this paper, we present a new offensive language corpus in Hebrew. A total of 15,881 tweets were retrieved fro
External link:
http://arxiv.org/abs/2309.02724