Showing 1 - 10 of 15 for search: '"Luo, Yin-Jyun"'
Disentangled sequential autoencoders (DSAEs) represent a class of probabilistic graphical models that describe an observed sequence with dynamic latent variables and a static latent variable. The former encode information at a frame rate identical to …
External link:
http://arxiv.org/abs/2205.05871
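The DSAE structure the abstract describes — one latent per frame plus a single sequence-level latent — can be sketched as follows (a minimal illustration of how the two kinds of latents feed a decoder; the dimensions and names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
T, d_dyn, d_static = 5, 3, 2              # frames, dynamic and static latent sizes

z_dyn = rng.standard_normal((T, d_dyn))   # one dynamic latent per frame
z_static = rng.standard_normal(d_static)  # a single static latent for the sequence

# At every frame, the decoder sees that frame's dynamic latent together
# with the shared static latent, broadcast over time.
decoder_input = np.concatenate(
    [z_dyn, np.broadcast_to(z_static, (T, d_static))], axis=1)
```

Because the static latent is identical at every frame, it can only carry sequence-level information, which is what drives the disentanglement.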
Author:
Wu, Yu-Te, Luo, Yin-Jyun, Chen, Tsung-Ping, Wei, I-Chieh, Hsu, Jui-Yang, Chuang, Yi-Chin, Su, Li
We present and release Omnizart, a new Python library that provides a streamlined solution to automatic music transcription (AMT). Omnizart encompasses modules that construct the life-cycle of deep learning-based AMT, and is designed for ease of use …
External link:
http://arxiv.org/abs/2106.00497
Recent advances in automatic music transcription (AMT) have achieved highly accurate polyphonic piano transcription results by incorporating onset and offset detection. The existing literature, however, focuses mainly on the leverage of deep and comp…
External link:
http://arxiv.org/abs/2104.06607
Singing voice correction (SVC) is an appealing application for amateur singers. Commercial products automate SVC by snapping pitch contours to equal-tempered scales, which could lead to deadpan modifications. Together with the neglect of rhythmic errors …
External link:
http://arxiv.org/abs/2010.12196
Most of the state-of-the-art automatic music transcription (AMT) models break down the main transcription task into sub-tasks such as onset prediction and offset prediction and train them with onset and offset labels. These predictions are then concatenated …
External link:
http://arxiv.org/abs/2010.09969
Published in:
ICML Workshop on Machine Learning for Media Discovery (ML4MD), 2020
We present a controllable neural audio synthesizer based on Gaussian Mixture Variational Autoencoders (GM-VAE), which can generate realistic piano performances in the audio domain that closely follow temporal conditions of two essential style features …
External link:
http://arxiv.org/abs/2006.09833
Published in:
IJCNN 2020
In this paper, we adapt triplet neural networks (TNNs) to a regression task, music emotion prediction. Since TNNs were initially introduced for classification, and not for regression, we propose a mechanism that allows them to provide meaningful low…
External link:
http://arxiv.org/abs/2001.09988
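The triplet idea being adapted — pulling an anchor toward a "positive" example and away from a "negative" one in embedding space — is usually trained with a margin-based hinge loss, which can be sketched as follows (a generic triplet loss on toy embeddings; the function name and margin value are illustrative, not the paper's mechanism):

```python
import numpy as np

def triplet_regression_loss(anchor, pos, neg, margin=0.2):
    """Margin-based triplet loss. For a regression target, the 'positive'
    is the sample whose label lies closer to the anchor's label and the
    'negative' the one farther away; the loss pushes embedding distances
    to respect that ordering."""
    d_pos = np.sum((anchor - pos) ** 2, axis=-1)    # squared distance to positive
    d_neg = np.sum((anchor - neg) ** 2, axis=-1)    # squared distance to negative
    return np.maximum(0.0, d_pos - d_neg + margin)  # hinge on the distance gap

# Toy embeddings: the positive already sits closer to the anchor than
# the negative, so the loss is zero.
a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])
n = np.array([1.0, 0.0])
loss = triplet_regression_loss(a, p, n)
```

Swapping the roles of `p` and `n` violates the ordering and yields a positive loss, which is what drives the embedding updates during training.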
We propose a flexible framework that deals with both singer conversion and singers' vocal technique conversion. The proposed model is trained on non-parallel corpora, accommodates many-to-many conversion, and leverages recent advances in variational autoencoders …
External link:
http://arxiv.org/abs/1912.02613
In this paper, we learn disentangled representations of timbre and pitch for musical instrument sounds. We adapt a framework based on variational autoencoders with Gaussian mixture latent distributions. Specifically, we use two separate encoders to learn …
External link:
http://arxiv.org/abs/1906.08152
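The two-encoder idea — one encoder producing a pitch latent, another a timbre latent, with the decoder conditioned on both — can be sketched as follows (a minimal numpy sketch of the reparameterized sampling and latent concatenation; the encoders are stand-in projections, and the paper's Gaussian-mixture prior is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, dim):
    """Stand-in encoder returning (mu, log_var) of an approximate posterior.
    A real model would use a neural network; here it is a fixed projection."""
    mu = x[:dim]             # illustrative: keep the first `dim` features
    log_var = np.zeros(dim)  # unit variance for the sketch
    return mu, log_var

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

x = rng.standard_normal(16)                      # a frame of audio features
mu_p, lv_p = encode(x, dim=4)                    # pitch encoder
mu_t, lv_t = encode(x[::-1], dim=8)              # timbre encoder (separate view)
z = np.concatenate([reparameterize(mu_p, lv_p),  # pitch latent
                    reparameterize(mu_t, lv_t)]) # timbre latent
# The decoder would reconstruct x from z; swapping the timbre latent
# between two sounds transfers timbre while keeping pitch.
```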
Expressive singing voice correction is an appealing but challenging problem. A robust time-warping algorithm which synchronizes two singing recordings can provide a promising solution. We thereby propose to address the problem by canonical time warping …
External link:
http://arxiv.org/abs/1711.08600
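Canonical time warping builds on dynamic time warping (DTW), the classic algorithm for synchronizing two sequences of different lengths. The DTW core can be sketched as follows (a textbook DTW on 1-D pitch contours, not the paper's full CTW, which additionally applies canonical correlation analysis):

```python
import numpy as np

def dtw_cost(a, b):
    """Classic dynamic time warping: minimal cumulative |a_i - b_j| cost
    over all monotone alignments of the two sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            D[i, j] = step + min(D[i - 1, j],      # insertion
                                 D[i, j - 1],      # deletion
                                 D[i - 1, j - 1])  # match
    return D[n, m]

# Two pitch contours (MIDI numbers) that differ only in tempo
# align with zero cost, since every frame finds an equal partner.
ref  = [60, 60, 62, 64]
sung = [60, 60, 60, 62, 62, 64]
```

Backtracking through `D` recovers the alignment path itself, which is what a correction system would use to warp the amateur recording onto the reference timing.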