Showing 1 - 10 of 13 for search: '"Kaitao Song"'
Published in:
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Authors:
Kaitao Song, Teng Wan, Bixia Wang, Huiqiang Jiang, Luna Qiu, Jiahang Xu, Liping Jiang, Qun Lou, Yuqing Yang, Dongsheng Li, Xudong Wang, Lili Qiu
Hypernasality is an abnormal resonance in human speech production, especially in patients with craniofacial anomalies such as cleft palate. In clinical application, hypernasality estimation is crucial in cleft palate diagnosis, as its results determi
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::9be49ca6b0381b6c1f28d856bf29be36
http://arxiv.org/abs/2208.05122
Authors:
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Published in:
2021 IEEE/CVF International Conference on Computer Vision (ICCV).
Although using convolutional neural networks (CNNs) as backbones achieves great successes in computer vision, this work investigates a simple backbone network useful for many dense prediction tasks without convolutions. Unlike the recently-proposed T
Published in:
KDD
While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for re
Published in:
IEEE Transactions on Image Processing. 29:7006-7018
Traditional fine-grained image recognition is required to distinguish different subordinate categories (e.g., birds species) based on the visual cues beneath raw images. Due to both small inter-class variations and large intra-class variations, it
Authors:
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
Published in:
Computational Visual Media, 8 (3)
Transformer recently has presented encouraging progress in computer vision. In this work, we present new baselines by improving the original Pyramid Vision Transformer (PVT v1) by adding three designs, including (1) linear complexity attention layer,
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::8aa958f4e9fdf4434c72574ace232d04
http://arxiv.org/abs/2106.13797
In the development of neural text-to-speech systems, model pre-training with a large amount of non-target speakers' data is a common approach. However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of m
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::77c5ef6e6fd156f27428d387dfb4539e
Published in:
Pattern Recognition and Computer Vision ISBN: 9783030880064
PRCV (2)
Defect detection is one of the most challenging tasks in the industry, as defects (e.g., flaw or crack) in objects usually own arbitrary shapes and different sizes. Especially in practical applications, defect detection usually is an unsupervised iss
External link:
https://explore.openaire.eu/search/publication?articleId=doi_________::3314616cbf786b9a9d19d3b1ce642591
https://doi.org/10.1007/978-3-030-88007-1_29
Automatic song writing aims to compose a song (lyric and/or melody) by machine, which is an interesting topic in both academia and industry. In automatic song writing, lyric-to-melody generation and melody-to-lyric generation are two important tasks,
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::2d3a6db49e451a4b2cf325ecc7852798
http://arxiv.org/abs/2012.05168
Published in:
IJCAI
Neural machine translation (NMT) generates the next target token given as input the previous ground truth target tokens during training while the previous generated target tokens during inference, which causes discrepancy between training and inferen
External link:
https://explore.openaire.eu/search/publication?articleId=doi_dedup___::4afc751b7dbca96cd76d411c65a5b940
http://arxiv.org/abs/2007.10681