Search results

Report

Sampling Strategies for Creation of a Benchmark for Dialectal Sentiment Classification

Authors: Srirag, Dipankar; Painter, Jordan; Joshi, Aditya; Kanojia, Diptesh

This paper investigates data sampling strategies to create a benchmark for dialectal sentiment classification of Google Places reviews written in English. Based on location-based filtering, we collect a self-supervised dataset of reviews in Australia

External link: http://arxiv.org/abs/2410.11216

Report

Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?

Authors: Qian, Shenbin; Orăsan, Constantin; Kanojia, Diptesh; Carmo, Félix do

This paper investigates whether large language models (LLMs) are state-of-the-art quality estimators for machine translation of user-generated content (UGC) that contains emotional expressions, without the use of reference translations. To achieve th

External link: http://arxiv.org/abs/2410.06338

Report

Edit Distances and Their Applications to Downstream Tasks in Research and Commercial Contexts

Authors: Carmo, Félix do; Kanojia, Diptesh

The tutorial describes the concept of edit distances applied to research and commercial contexts. We use Translation Edit Rate (TER), Levenshtein, Damerau-Levenshtein, Longest Common Subsequence and $n$-gram distances to demonstrate the frailty of st

External link: http://arxiv.org/abs/2410.05881

Report

A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content

Authors: Qian, Shenbin; Orăsan, Constantin; Kanojia, Diptesh; Carmo, Félix do

Machine translation (MT) of user-generated content (UGC) poses unique challenges, including handling slang, emotion, and literary devices like irony and sarcasm. Evaluating the quality of these translations is challenging as current metrics do not fo

External link: http://arxiv.org/abs/2410.03277

Report

What do Large Language Models Need for Machine Translation Evaluation?

Authors: Qian, Shenbin; Sindhujan, Archchana; Kabra, Minnie; Kanojia, Diptesh; Orăsan, Constantin; Ranasinghe, Tharindu; Blain, Frédéric

Leveraging large language models (LLMs) for various natural language processing tasks has led to superlative claims about their performance. For the evaluation of machine translation (MT), existing research shows that LLMs are able to achieve results

External link: http://arxiv.org/abs/2410.03278

Report

Connecting Ideas in 'Lower-Resource' Scenarios: NLP for National Varieties, Creoles and Other Low-resource Scenarios

Authors: Joshi, Aditya; Kanojia, Diptesh; Lent, Heather; Kaing, Hour; Song, Haiyue

Despite excellent results on benchmarks over a small subset of languages, large language models struggle to process text from languages situated in 'lower-resource' scenarios such as dialects/sociolects (national or social varieties of a language), C

External link: http://arxiv.org/abs/2409.12683

Report

AV-GS: Learning Material and Geometry Aware Priors for Novel View Acoustic Synthesis

Authors: Bhosale, Swapnil; Yang, Haosen; Kanojia, Diptesh; Deng, Jiankang; Zhu, Xiatian

Novel view acoustic synthesis (NVAS) aims to render binaural audio at any target viewpoint, given a mono audio emitted by a sound source at a 3D scene. Existing methods have proposed NeRF-based implicit models to exploit visual cues as a condition fo

External link: http://arxiv.org/abs/2406.08920

Report

Unsupervised Audio-Visual Segmentation with Modality Alignment

Authors: Bhosale, Swapnil; Yang, Haosen; Kanojia, Diptesh; Deng, Jiankang; Zhu, Xiatian

Audio-Visual Segmentation (AVS) aims to identify, at the pixel level, the object in a visual scene that produces a given sound. Current AVS methods rely on costly fine-grained annotations of mask-audio pairs, making them impractical for scalability.

External link: http://arxiv.org/abs/2403.14203

Report

Google Translate Error Analysis for Mental Healthcare Information: Evaluating Accuracy, Comprehensibility, and Implications for Multilingual Healthcare Communication

Authors: Delfani, Jaleh; Orăsan, Constantin; Saadany, Hadeel; Temizoz, Ozlem; Taylor-Stilgoe, Eleanor; Kanojia, Diptesh; Braun, Sabine; Schouten, Barbara

This study explores the use of Google Translate (GT) for translating mental healthcare (MHealth) information and evaluates its accuracy, comprehensibility, and implications for multilingual healthcare communication through analysing GT output in the

External link: http://arxiv.org/abs/2402.04023

Report

Airavata: Introducing Hindi Instruction-tuned LLM

Authors: Gala, Jay; Jayakumar, Thanmay; Husain, Jaavid Aktar; M, Aswanth Kumar; Khan, Mohammed Safi Ur Rahman; Kanojia, Diptesh; Puduppully, Ratish; Khapra, Mitesh M.; Dabre, Raj; Murthy, Rudra; Kunchukuttan, Anoop

We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi. Airavata was created by fine-tuning OpenHathi with diverse, instruction-tuning Hindi datasets to make it better suited for assistive tasks. Along with the model, we al

External link: http://arxiv.org/abs/2401.15006
