Showing 1 - 10 of 315 for search: '"Kleijn, W Bastiaan"'
Within recent approaches to text-to-video (T2V) generation, achieving controllability in the synthesized video is often a challenge. Typically, this issue is addressed by providing low-level per-frame guidance in the form of edge maps, depth maps, or
External link:
http://arxiv.org/abs/2401.00896
Recently, various methods have been proposed to address the inconsistency issue of DDIM inversion to enable image editing, such as EDICT [36] and Null-text inversion [22]. However, the above methods introduce considerable computational overhead. In t
External link:
http://arxiv.org/abs/2307.10829
A popular approach to sample a diffusion-based generative model is to solve an ordinary differential equation (ODE). In existing samplers, the coefficients of the ODE solvers are pre-determined by the ODE formulation, the reverse discrete timesteps,
External link:
http://arxiv.org/abs/2304.11328
We propose lookahead diffusion probabilistic models (LA-DPMs) to exploit the correlation in the outputs of the deep neural networks (DNNs) over subsequent timesteps in diffusion probabilistic models (DPMs) to refine the mean estimation of the conditi
External link:
http://arxiv.org/abs/2304.11312
Author:
Jenrungrot, Teerapat, Chinen, Michael, Kleijn, W. Bastiaan, Skoglund, Jan, Borsos, Zalán, Zeghidour, Neil, Tagliasacchi, Marco
We introduce LMCodec, a causal neural speech codec that provides high quality audio at very low bitrates. The backbone of the system is a causal convolutional codec that encodes audio into a hierarchy of coarse-to-fine tokens using residual vector qu
External link:
http://arxiv.org/abs/2303.12984
Text-guided diffusion models such as DALLE-2, Imagen, eDiff-I, and Stable Diffusion are able to generate an effectively endless variety of images given only a short text prompt describing the desired image content. In many cases the images are of ver
External link:
http://arxiv.org/abs/2302.13153
Author:
Yu, Wangyang, Kleijn, W. Bastiaan
We propose an algorithm to estimate source and receiver positions, room geometry and reflection coefficients from a single room impulse response simultaneously. It is based on a symmetry analysis of the room impulse response. The proposed method util
External link:
http://arxiv.org/abs/2301.09198
Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new ge
External link:
http://arxiv.org/abs/2207.02262
Published in:
7th International Conference on Spoken Language Processing (ICSLP2002), September 16-20, 2002
In this paper, we consider the effect of a bandwidth extension of narrow-band speech signals (0.3-3.4 kHz) to 0.3-8 kHz on speaker verification. Using covariance matrix based verification systems together with detection error trade-off curves, we com
External link:
http://arxiv.org/abs/2204.02040
We make contributions towards improving adaptive-optimizer performance. Our improvements are based on suppression of the range of adaptive stepsizes in the AdaBelief optimizer. Firstly, we show that the particular placement of the parameter epsilon w
External link:
http://arxiv.org/abs/2203.13273