Search results

Report

Are Large Language Models the New Interface for Data Pipelines?

Authors: Junior, Sylvio Barbon, Ceravolo, Paolo, Groppe, Sven, Jarrar, Mustafa, Maghool, Samira, Sèdes, Florence, Sahri, Soror, Van Keulen, Maurice

A Language Model is a term that encompasses various types of models designed to understand and generate human communication. Large Language Models (LLMs) have gained significant attention due to their ability to process text with human-like fluency a

External link: http://arxiv.org/abs/2406.06596

Report

Qabas: An Open-Source Arabic Lexicographic Database

Authors: Jarrar, Mustafa, Hammouda, Tymaa

We present Qabas, a novel open-source Arabic lexicon designed for NLP applications. The novelty of Qabas lies in its synthesis of 110 lexicons. Specifically, Qabas lexical entries (lemmas) are assembled by linking lemmas from 110 lexicons. Furthermor

External link: http://arxiv.org/abs/2406.06598

Report

NLU-STR at SemEval-2024 Task 1: Generative-based Augmentation and Encoder-based Scoring for Semantic Textual Relatedness

Authors: Malaysha, Sanad, Jarrar, Mustafa, Khalilia, Mohammed

Semantic textual relatedness is a broader concept of semantic similarity. It measures the extent to which two chunks of text convey similar meaning or topics, or share related concepts or contexts. This notion of relatedness can be applied in various

External link: http://arxiv.org/abs/2405.00659

Report

Deep Learning Detection Method for Large Language Models-Generated Scientific Content

Authors: Alhijawi, Bushra, Jarrar, Rawan, AbuAlRub, Aseel, Bader, Arwa

Large Language Models (LLMs), such as GPT-3 and BERT, reshape how textual content is written and communicated. These models have the potential to generate scientific content that is indistinguishable from that written by humans. Hence, LLMs carry sev

External link: http://arxiv.org/abs/2403.00828

Report

ArBanking77: Intent Detection Neural Model and a New Dataset in Modern and Dialectical Arabic

Authors: Jarrar, Mustafa, Birim, Ahmet, Khalilia, Mohammed, Erden, Mustafa, Ghanem, Sana

This paper presents the ArBanking77, a large Arabic dataset for intent detection in the banking domain. Our dataset was arabized and localized from the original English Banking77 dataset, which consists of 13,083 queries to ArBanking77 dataset with 3

External link: http://arxiv.org/abs/2310.19034

Report

SALMA: Arabic Sense-Annotated Corpus and WSD Benchmarks

Authors: Jarrar, Mustafa, Malaysha, Sanad, Hammouda, Tymaa, Khalilia, Mohammed

SALMA, the first Arabic sense-annotated corpus, consists of ~34K tokens, which are all sense-annotated. The corpus is annotated using two different sense inventories simultaneously (Modern and Ghani). SALMA's novelty lies in how tokens and senses are a

External link: http://arxiv.org/abs/2310.19029

Report

Arabic Fine-Grained Entity Recognition

Authors: Liqreina, Haneen, Jarrar, Mustafa, Khalilia, Mohammed, El-Shangiti, Ahmed Oumar, Abdul-Mageed, Muhammad

Traditional NER systems are typically trained to recognize coarse-grained entities, and less attention is given to classifying entities into a hierarchy of fine-grained lower-level subtypes. This article aims to advance Arabic NER with fine-grained e

External link: http://arxiv.org/abs/2310.17333

Report

Nabra: Syrian Arabic Dialects with Morphological Annotations

Authors: Nayouf, Amal, Hammouda, Tymaa, Jarrar, Mustafa, Zaraket, Fadi, Kurdy, Mohamad-Bassam

This paper presents Nabra, a corpus of Syrian Arabic dialects with morphological annotations. A team of Syrian natives collected more than 6K sentences containing about 60K words from several sources including social media posts, scripts of movies a

External link: http://arxiv.org/abs/2310.17315

Report

WojoodNER 2023: The First Arabic Named Entity Recognition Shared Task

Authors: Jarrar, Mustafa, Abdul-Mageed, Muhammad, Khalilia, Mohammed, Talafha, Bashar, Elmadany, AbdelRahim, Hamad, Nagham, Omar, Alaa'

We present WojoodNER-2023, the first Arabic Named Entity Recognition (NER) Shared Task. The primary focus of WojoodNER-2023 is on Arabic NER, offering novel NER datasets (i.e., Wojood) and the definition of subtasks designed to facilitate meaningful

External link: http://arxiv.org/abs/2310.16153

Report

Offensive Hebrew Corpus and Detection using BERT

Authors: Hamad, Nagham, Jarrar, Mustafa, Khalilia, Mohammad, Nashif, Nadim

Offensive language detection has been well studied in many languages, but it is lagging behind in low-resource languages, such as Hebrew. In this paper, we present a new offensive language corpus in Hebrew. A total of 15,881 tweets were retrieved fro

External link: http://arxiv.org/abs/2309.02724
