Showing 1 - 10 of 251,803 for search: '"Recognition system"'
Author:
Zhao, Fuzheng, Bai, Yu
This study aims to design and implement a laughter recognition system based on multimodal fusion and deep learning, leveraging image and audio processing technologies to achieve accurate laughter recognition and emotion analysis. First, the system lo
External link:
http://arxiv.org/abs/2407.21391
Hand gesture recognition (HGR) is a vital component in enhancing the human-computer interaction experience, particularly in multimedia applications such as virtual reality, gaming, and smart home automation systems. Users can control and navigate t
External link:
http://arxiv.org/abs/2407.02585
Author:
Meng, Lingwei, Kang, Jiawen, Wang, Yuejiao, Jin, Zengrui, Wu, Xixin, Liu, Xunying, Meng, Helen
Multi-talker speech recognition and target-talker speech recognition, both of which involve transcription in multi-talker contexts, remain significant challenges. However, existing methods rarely attempt to simultaneously address both tasks. In this study, we
External link:
http://arxiv.org/abs/2407.09817
Recently, multi-channel end-to-end (ME2E) ASR systems have emerged. While streaming single-channel end-to-end ASR has been extensively studied, streaming ME2E ASR remains little explored. Additionally, recent studies call attention to the gap betwee
External link:
http://arxiv.org/abs/2407.09807
The Handwritten Text Recognition problem has been a challenge for researchers for the last few decades, especially in the domain of computer vision, a subdomain of pattern recognition. Variability of texts amongst writers, cursiveness, and different
External link:
http://arxiv.org/abs/2404.14062
Author:
Manawadu, Mayura, Wijenayake, Udaya
Traffic signs are important in communicating information to drivers. Thus, comprehension of traffic signs is essential for road safety and ignorance may result in road accidents. Traffic sign detection has been a research spotlight over the past few
External link:
http://arxiv.org/abs/2404.07807
Author:
Kadhim, Ahlam M. (ahlammjeed@yahoo.com), Jawad, Huda M., Kadhum, Farah Jawad, Al-Zuky, Ali A.
Published in:
Iraqi Journal of Science. 2024, Vol. 65 Issue 5, p2749-2760. 12p.
While civilized users employ social media to stay informed and discuss daily occurrences, haters perceive these platforms as fertile ground for attacking groups and individuals. The prevailing approach to counter this phenomenon involves detecting su
External link:
http://arxiv.org/abs/2405.13011
Author:
Tian, Jingguang, Ye, Shuaishuai, Chen, Shunfei, Xiang, Yang, Yin, Zhaohui, Hu, Xinhui, Xu, Xinkang
This paper presents our system submission for the In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge, which focuses on speaker diarization and speech recognition in complex multi-speaker scenarios. To address these challenges, we
External link:
http://arxiv.org/abs/2405.05498
Word error rate (WER) is a metric used to evaluate the quality of transcriptions produced by Automatic Speech Recognition (ASR) systems. In many applications, it is of interest to estimate WER given a pair of a speech utterance and a transcript. Prev
External link:
http://arxiv.org/abs/2404.16743
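Note on the word error rate (WER) named in the entry above: WER is conventionally computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the ASR hypothesis, divided by the number of reference words. The following is a minimal sketch of that conventional definition only, not the estimation method of the listed paper, whose abstract is truncated here. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    Assumption: words are separated by whitespace; no text normalization
    (casing, punctuation) is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example strings, chosen only to exercise the function.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.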