Search results - "Rudovic, Oggi"

Report

Multimodal Large Language Models with Fusion Low Rank Adaptation for Device Directed Speech Detection

Authors: Palaskar, Shruti; Rudovic, Oggi; Dharur, Sameer; Pesce, Florian; Krishna, Gautam; Sivaraman, Aswin; Berkowitz, Jack; Abdelaziz, Ahmed Hussen; Adya, Saurabh; Tewfik, Ahmed

Although Large Language Models (LLMs) have shown promise for human-like conversations, they are primarily pre-trained on text data. Incorporating audio or video improves performance, but collecting large-scale multimodal data and pre-training multimo

External link: http://arxiv.org/abs/2406.09617

Report

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

Authors: Krishna, Gautam; Dharur, Sameer; Rudovic, Oggi; Dighe, Pranay; Adya, Saurabh; Abdelaziz, Ahmed Hussen; Tewfik, Ahmed H

Device-directed speech detection (DDSD) is the binary classification task of distinguishing between queries directed at a voice assistant versus side conversation or background speech. State-of-the-art DDSD systems use verbal cues, e.g. acoustic, text

External link: http://arxiv.org/abs/2310.15261

Report

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

Authors: Dighe, Pranay; Nayak, Prateeth; Rudovic, Oggi; Marchi, Erik; Niu, Xiaochuan; Tewfik, Ahmed

Accurate prediction of the user intent to interact with a voice assistant (VA) on a device (e.g. on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach to

External link: http://arxiv.org/abs/2210.12134
