Search results

Report

CLAIR-A: Leveraging Large Language Models to Judge Audio Captions

Authors: Wu, Tsung-Han; Gonzalez, Joseph E.; Darrell, Trevor; Chan, David M.

The Automated Audio Captioning (AAC) task asks models to generate natural language descriptions of an audio input. Evaluating these machine-generated audio captions is a complex task that requires considering diverse factors, among them, auditory sce

External link: http://arxiv.org/abs/2409.12962

Report

An Efficient Self-Learning Framework For Interactive Spoken Dialog Systems

Authors: Tulsiani, Hitesh; Chan, David M.; Ghosh, Shalini; Lalwani, Garima; Pandey, Prabhat; Bansal, Ankish; Garimella, Sri; Rastrow, Ariya; Hoffmeister, Björn

Dialog systems, such as voice assistants, are expected to engage with users in complex, evolving conversations. Unfortunately, traditional automatic speech recognition (ASR) systems deployed in such applications are usually trained to recognize each

External link: http://arxiv.org/abs/2409.10515

Report

Rediscovering the Latent Dimensions of Personality with Large Language Models as Trait Descriptors

Authors: Suh, Joseph; Moon, Suhong; Kang, Minwoo; Chan, David M.

Assessing personality traits using large language models (LLMs) has emerged as an interesting and challenging area of research. While previous methods employ explicit questionnaires, often derived from the Big Five model of personality, we hypothesiz

External link: http://arxiv.org/abs/2409.09905

Report

Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark

Authors: Wu, Tsung-Han; Biamby, Giscard; Quenum, Jerome; Gupta, Ritwik; Gonzalez, Joseph E.; Darrell, Trevor; Chan, David M.

Large Multimodal Models (LMMs) have made significant strides in visual question-answering for single images. Recent advancements like long-context LMMs have allowed them to ingest larger, or even multiple, images. However, the ability to process a la

External link: http://arxiv.org/abs/2407.13766

Report

Virtual Personas for Language Models via an Anthology of Backstories

Authors: Moon, Suhong; Abdulhai, Marwa; Kang, Minwoo; Suh, Joseph; Soedarmadji, Widyadewi; Behar, Eran Kohen; Chan, David M.

Large language models (LLMs) are trained from vast repositories of text authored by millions of distinct authors, reflecting an enormous diversity of human traits. While these models bear the potential to be used as approximations of human subjects i

External link: http://arxiv.org/abs/2407.06576

Report

The Galois-equivariant $K$-theory of finite fields

Authors: Chan, David; Vogeli, Chase

We compute the $RO(G)$-graded equivariant algebraic $K$-groups of a finite field with an action by its Galois group $G$. Specifically, we show these $K$-groups split as the sum of an explicitly computable term and the well-studied $RO(G)$-graded coef

External link: http://arxiv.org/abs/2406.19481

Academic article

Advances in silicon photonics for high-capacity optical interconnects - INVITED

Authors: Tsang Hon Ki, Yi Dan, Zhou Xuetong, Chan David Weng U.

Published in: EPJ Web of Conferences, Vol 287, p 01008 (2023)

We review our recent progress on advanced silicon photonic devices and photonic circuits, including advanced grating couplers, modulators, mode and polarization division multiplexing and integrated optical signal processors for use in high capacity d

External link: https://doaj.org/article/aefffa8c793a4fb493f464a1a74666e0

Report

ALOHa: A New Measure for Hallucination in Captioning Models

Authors: Petryk, Suzanne; Chan, David M.; Kachinthaya, Anish; Zou, Haodi; Canny, John; Gonzalez, Joseph E.; Darrell, Trevor

Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene. The existing prominent metric for object hallucination,

External link: http://arxiv.org/abs/2404.02904

Report

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

Authors: Jain, Yash; Chan, David; Dheram, Pranav; Khare, Aparna; Shonibare, Olabanji; Ravichandran, Venkatesh; Ghosh, Shalini

Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks. Existing multi

External link: http://arxiv.org/abs/2403.19822

Report

Design and Visual Servoing Control of a Hybrid Dual-Segment Flexible Neurosurgical Robot for Intraventricular Biopsy

Authors: Chen, Jian; Chen, Mingcong; Zhao, Qingxiang; Wang, Shuai; Wang, Yihe; Xiao, Ying; Hu, Jian; Chan, Danny Tat Ming; Yeung, Kam Tong Leo; Chan, David Yuen Chung; Liu, Hongbin

Published in: IEEE International Conference on Robotics & Automation, 2024

Traditional rigid endoscopes have challenges in flexibly treating tumors located deep in the brain, and low operability and fixed viewing angles limit its development. This study introduces a novel dual-segment flexible robotic endoscope MicroNeuro,

External link: http://arxiv.org/abs/2402.09679
