Showing 1 - 10 of 84 results for query: '"GAO Ruohan"'
Author:
Falcon-Perez, Ricardo, Gao, Ruohan, Mueckl, Gregor, Gari, Sebastia V. Amengual, Ananthabhotla, Ishwarya
The task of Novel View Acoustic Synthesis (NVAS) - generating Room Impulse Responses (RIRs) for unseen source and receiver positions in a scene - has recently gained traction, especially given its relevance to Augmented Reality (AR) and Virtual Reality (VR) …
External link:
http://arxiv.org/abs/2410.23523
Accurately estimating and simulating the physical properties of objects from real-world sound recordings is of great practical importance in the fields of vision, graphics, and robotics. However, the progress in these directions has been limited …
External link:
http://arxiv.org/abs/2409.13486
Author:
Yun, Heeseung, Gao, Ruohan, Ananthabhotla, Ishwarya, Kumar, Anurag, Donley, Jacob, Li, Chao, Kim, Gunhee, Ithapu, Vamsi Krishna, Murdock, Calvin
Egocentric videos provide comprehensive contexts for user and scene understanding, spanning multisensory perception to behavioral interaction. We propose Spherical World-Locking (SWL) as a general framework for egocentric scene representation, which …
External link:
http://arxiv.org/abs/2408.05364
Author:
Chowdhury, Sanjoy, Nag, Sayan, Dasgupta, Subhrajyoti, Chen, Jun, Elhoseiny, Mohamed, Gao, Ruohan, Manocha, Dinesh
Leveraging Large Language Models' remarkable proficiency in text-based tasks, recent works on Multi-modal LLMs (MLLMs) extend them to other modalities like vision and audio. However, the progress in these directions has been mostly focused on tasks …
External link:
http://arxiv.org/abs/2407.01851
Recent years have seen immense progress in 3D computer vision and computer graphics, with emerging tools that can virtualize real-world 3D environments for numerous Mixed Reality (XR) applications. However, alongside immersive visual experiences, …
External link:
http://arxiv.org/abs/2406.07532
Author:
Jia, Wenqi, Liu, Miao, Jiang, Hao, Ananthabhotla, Ishwarya, Rehg, James M., Ithapu, Vamsi Krishna, Gao, Ruohan
In recent years, the thriving development of research related to egocentric videos has provided a unique perspective for the study of conversational interactions, where both visual and audio signals play a crucial role. While most prior work focuses on …
External link:
http://arxiv.org/abs/2312.12870
A room's acoustic properties are a product of the room's geometry, the objects within the room, and their specific positions. A room's acoustic properties can be characterized by its impulse response (RIR) between a source and listener location, or …
External link:
http://arxiv.org/abs/2311.03517
Author:
Zhang, Ruohan, Lee, Sharon, Hwang, Minjune, Hiranaka, Ayano, Wang, Chen, Ai, Wensi, Tan, Jin Jie Ryan, Gupta, Shreya, Hao, Yilun, Levine, Gabrael, Gao, Ruohan, Norcia, Anthony, Fei-Fei, Li, Wu, Jiajun
We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals. Through this interface, humans communicate …
External link:
http://arxiv.org/abs/2311.01454
Author:
Clarke, Samuel, Gao, Ruohan, Wang, Mason, Rau, Mark, Xu, Julia, Wang, Jui-Hsien, James, Doug L., Wu, Jiajun
Objects make unique sounds under different perturbations, environment conditions, and poses relative to the listener. While prior works have modeled impact sounds and sound propagation in simulation, we lack a standard dataset of impact sound fields …
External link:
http://arxiv.org/abs/2306.09944
Author:
Gao, Ruohan, Dou, Yiming, Li, Hao, Agarwal, Tanmay, Bohg, Jeannette, Li, Yunzhu, Fei-Fei, Li, Wu, Jiajun
We introduce the ObjectFolder Benchmark, a benchmark suite of 10 tasks for multisensory object-centric learning, centered around object recognition, reconstruction, and manipulation with sight, sound, and touch. We also introduce the ObjectFolder Real …
External link:
http://arxiv.org/abs/2306.00956