Search results

Accelerating Transducers through Adjacent Token Merging

Authors: Li, Yuang, Wu, Yu, Li, Jinyu, Liu, Shujie

Recent end-to-end automatic speech recognition (ASR) systems often utilize a Transformer-based acoustic encoder that generates embedding at a high frame rate. However, this design is inefficient, particularly for long speech signals due to the quadra

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::dc3855bab09f4bc7b8c3107b081d3b7f
http://arxiv.org/abs/2306.16009

Prompting Large Language Models for Zero-Shot Domain Adaptation in Speech Recognition

Authors: Li, Yuang, Wu, Yu, Li, Jinyu, Liu, Shujie

The integration of Language Models (LMs) has proven to be an effective way to address domain shifts in speech recognition. However, these approaches usually require a significant amount of target domain text data for the training of LMs. Different fr

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::38090c0a55d370bec08810bda39f5985
http://arxiv.org/abs/2306.16007

Simulating Realistic Speech Overlaps Improves Multi-Talker ASR

Authors: Yang, Muqiao, Kanda, Naoyuki, Wang, Xiaofei, Wu, Jian, Sivasankaran, Sunit, Chen, Zhuo, Li, Jinyu, Yoshioka, Takuya

Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers. Due to the difficulty in acquiring real conversation data with high-quality human t

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::e45f3594b8f4b97da5abdc8d894a9b9e
https://doi.org/10.1109/icassp49357.2023.10094928

Improving Contextual Spelling Correction by External Acoustics Attention and Semantic Aware Data Augmentation

Authors: Wang, Xiaoqiang, Liu, Yanqing, Li, Jinyu, Zhao, Sheng

Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

We previously proposed contextual spelling correction (CSC) to correct the output of end-to-end (E2E) automatic speech recognition (ASR) models with contextual information such as name, place, etc. Although CSC has achieved reasonable improvement in

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::08e72a8e04c73731f969cf87d4b0e4df
https://doi.org/10.1109/icassp49357.2023.10095434

Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation

Authors: Wei, Kun, Zhou, Long, Zhang, Ziqiang, Chen, Liping, Liu, Shujie, He, Lei, Li, Jinyu, Wei, Furu

Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

Direct speech-to-speech translation (S2ST) is an attractive research topic with many advantages compared to cascaded S2ST. However, direct S2ST suffers from the data scarcity problem because the corpora from speech of the source language to speech of

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::02bec1e803cd87d7ac2d53d0b921a952
https://doi.org/10.1109/icassp49357.2023.10095616

VioLA: Unified Codec Language Models for Speech Recognition, Synthesis, and Translation

Authors: Wang, Tianrui, Zhou, Long, Zhang, Ziqiang, Wu, Yu, Liu, Shujie, Gaur, Yashesh, Chen, Zhuo, Li, Jinyu, Wei, Furu

Recent research shows a big convergence in model architecture, training objectives, and inference methods across various tasks for different modalities. In this paper, we propose VioLA, a single auto-regressive Transformer decoder-only network that u

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::071e31d93750cf76e2c5f8f576a56722
http://arxiv.org/abs/2305.16107

PillarNeXt: Rethinking Network Designs for 3D Object Detection in LiDAR Point Clouds

Authors: Li, Jinyu, Luo, Chenxu, Yang, Xiaodong

In order to deal with the sparse and unstructured raw point clouds, LiDAR based 3D object detection research mostly focuses on designing dedicated local point aggregators for fine-grained geometrical modeling. In this paper, we revisit the local poin

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::b43fd58fad4296e69f9a127c9b00c245
http://arxiv.org/abs/2305.04925

Vibration and Noise Characteristics of Air-Core Reactor Used in HVDC Converter Stations

Authors: Shengchang Ji, Gao Lu, Yang Hang, Lingyu Zhu, Li Jinyu, Fan Zhang, Sisi Hui

Published in: IEEE Transactions on Power Delivery, vol. 37, pp. 1068-1077.

Air-core reactors used in filters are significant equipment in HVDC converter stations. They are among the most serious noise sources under the action of multi-frequency magnetic forces. The vibration and noise characteristics of air-core reactors ar

External link: https://explore.openaire.eu/search/publication?articleId=doi_________::c0bd4720f75c9e6e4b85b0d7e469b4ce
https://doi.org/10.1109/tpwrd.2021.3076871

Streaming, Fast and Accurate on-Device Inverse Text Normalization for Automatic Speech Recognition

Authors: Gaur, Yashesh, Kibre, Nick, Xue, Jian, Shu, Kangyuan, Wang, Yuhui, Alphanso, Issac, Li, Jinyu, Gong, Yifan

Published in: 2022 IEEE Spoken Language Technology Workshop (SLT).

Automatic Speech Recognition (ASR) systems typically yield output in lexical form. However, humans prefer a written form output. To bridge this gap, ASR systems usually employ Inverse Text Normalization (ITN). In previous works, Weighted Finite State

External link: https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5cf2cc7e0b686d94c85e9a40bcff4af2
https://doi.org/10.1109/slt54892.2023.10022543