Showing 1 - 10 of 1,352
for the search: '"Richard, Alexander"'
While rendering and animation of photorealistic 3D human body models have matured and reached an impressive quality over the past years, modeling the spatial audio associated with such full body models has been largely ignored so far. In this work, w…
External link:
http://arxiv.org/abs/2407.13083
Author:
Richter, Julius, Wu, Yi-Chiao, Krenn, Steven, Welker, Simon, Lay, Bunlong, Watanabe, Shinji, Richard, Alexander, Gerkmann, Timo
We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a large range of differen…
External link:
http://arxiv.org/abs/2406.06185
Author:
Chen, Ziyang, Gebru, Israel D., Richardt, Christian, Kumar, Anurag, Laney, William, Owens, Andrew, Richard, Alexander
We present a new dataset called Real Acoustic Fields (RAF) that captures real acoustic room data from multiple modalities. The dataset includes high-quality and densely captured room impulse response data paired with multi-view images, and precise 6D…
External link:
http://arxiv.org/abs/2403.18821
Although recent mainstream waveform-domain end-to-end (E2E) neural audio codecs achieve impressive coded audio quality with a very low bitrate, the quality gap between the coded and natural audio is still significant. A generative adversarial network…
External link:
http://arxiv.org/abs/2401.12160
Author:
Ng, Evonne, Romero, Javier, Bagautdinov, Timur, Bai, Shaojie, Darrell, Trevor, Kanazawa, Angjoo, Richard, Alexander
We present a framework for generating full-bodied photorealistic avatars that gesture according to the conversational dynamics of a dyadic interaction. Given speech audio, we output multiple possibilities of gestural motion for an individual, includi…
External link:
http://arxiv.org/abs/2401.01885
Author:
Xu, Xudong, Markovic, Dejan, Sandakly, Jacob, Keebler, Todd, Krenn, Steven, Richard, Alexander
While 3D human body modeling has received much attention in computer vision, modeling the acoustic equivalent, i.e. modeling 3D spatial audio produced by body motion and speech, has fallen short in the community. To close this gap, we present a model…
External link:
http://arxiv.org/abs/2311.06285
A good audio codec for live applications such as telecommunication is characterized by three key properties: (1) compression, i.e., the bitrate that is required to transmit the signal should be as low as possible; (2) latency, i.e., encoding and deco…
External link:
http://arxiv.org/abs/2305.16608
Author:
Chen, Changan, Richard, Alexander, Shapovalov, Roman, Ithapu, Vamsi Krishna, Neverova, Natalia, Grauman, Kristen, Vedaldi, Andrea
We introduce the novel-view acoustic synthesis (NVAS) task: given the sight and sound observed at a source viewpoint, can we synthesize the sound of that scene from an unseen target viewpoint? We propose a neural rendering approach: Visually-Guided A…
External link:
http://arxiv.org/abs/2301.08730