Showing 1 - 10 of 23 for search: '"Thakkar, Gaurish"'
Published in:
LREC-COLING 2024 - The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
In recent years, multimodal natural language processing, aimed at learning from diverse data types, has garnered significant attention. However, there needs to be more clarity when it comes to analysing multimodal tasks in multi-lingual contexts. Whi
External link:
http://arxiv.org/abs/2404.01753
Published in:
Slavic NLP 2023
This article presents a sentence-level sentiment dataset for the Croatian news domain. In addition to the 3K annotated texts already present, our dataset contains 14.5K annotated sentence occurrences that have been tagged with 5 classes. We provide b
External link:
http://arxiv.org/abs/2305.08187
Published in:
LTC 2023
This paper introduces Cro-FiReDa, a sentiment-annotated dataset for Croatian in the domain of movie reviews. The dataset, which contains over 10,000 sentences, has been annotated at the sentence level. In addition to presenting the overall annotation
External link:
http://arxiv.org/abs/2305.08173
Author:
Gottschalk, Simon, Kacupaj, Endri, Abdollahi, Sara, Alves, Diego, Amaral, Gabriel, Koutsiana, Elisavet, Kuculo, Tin, Major, Daniela, Mello, Caio, Cheema, Gullal S., Sittar, Abdul, Swati, Tahmasebzadeh, Golsa, Thakkar, Gaurish
Accessing and understanding contemporary and historical events of global impact such as the US elections and the Olympic Games is a major prerequisite for cross-lingual event analytics that investigate event causes, perception and consequences across
External link:
http://arxiv.org/abs/2302.14688
With the ever-growing popularity of the field of NLP, the demand for datasets in low-resourced languages follows suit. Following a previously established framework, in this paper, we present the UNER dataset, a multilingual and hierarchical parallel
External link:
http://arxiv.org/abs/2212.07429
This paper presents a corpus annotated for the task of direct-speech extraction in Croatian. The paper focuses on the annotation of the quotation, co-reference resolution, and sentiment annotation in SETimes news corpus in Croatian and on the analysi
External link:
http://arxiv.org/abs/2212.07172
This article presents the application of the Universal Named Entity framework to generate automatically annotated corpora. By using a workflow that extracts Wikipedia data and meta-data and DBpedia information, we generated an English dataset which i
External link:
http://arxiv.org/abs/2212.07162
Published in:
vol. 2829, 2021, pp. 76-84
This paper presents a cross-lingual sentiment analysis of news articles using zero-shot and few-shot learning. The study aims to classify the Croatian news articles with positive, negative, and neutral sentiments using the Slovene dataset. The system
External link:
http://arxiv.org/abs/2212.07160
This article presents the results of the evaluation campaign of language tools available for fifteen EU-official under-resourced languages. The evaluation was conducted within the MSC ITN CLEOPATRA action that aims at building the cross-lingual event
External link:
http://arxiv.org/abs/2010.12428
This article presents the strategy for developing a platform containing Language Processing Chains for European Union languages, consisting of Tokenization to Parsing, also including Named Entity Recognition and with the addition of Sentiment Analysis. The
External link:
http://arxiv.org/abs/2010.12433