Showing 1 - 10 of 150
for search: '"Varol, Huseyin Atakan"'
This study focuses on the creation of the KazEmoTTS dataset, designed for emotional Kazakh text-to-speech (TTS) applications. KazEmoTTS is a collection of 54,760 audio-text pairs, with a total duration of 74.85 hours, featuring 34.23 hours delivered…
External link:
http://arxiv.org/abs/2404.01033
We introduce KazParC, a parallel corpus designed for machine translation across Kazakh, English, Russian, and Turkish. The first and largest publicly available corpus of its kind, KazParC contains a collection of 371,902 parallel sentences covering d…
External link:
http://arxiv.org/abs/2403.19399
This paper presents KazSAnDRA, a dataset developed for Kazakh sentiment analysis that is the first and largest publicly available dataset of its kind. KazSAnDRA comprises an extensive collection of 180,064 reviews obtained from various sources and in…
External link:
http://arxiv.org/abs/2403.19335
Nowadays, it is common for people to take photographs of every beverage, snack, or meal they eat and then post these photographs on social media platforms. Leveraging these social trends, real-time food recognition and reliable classification of thes…
External link:
http://arxiv.org/abs/2305.07257
Published in:
In International Journal of Disaster Risk Reduction September 2024 111
We present an expanded version of our previously released Kazakh text-to-speech (KazakhTTS) synthesis corpus. In the new KazakhTTS2 corpus, the overall size has increased from 93 hours to 271 hours, the number of speakers has risen from two to five (…
External link:
http://arxiv.org/abs/2201.05771
We present the development of a dataset for Kazakh named entity recognition. The dataset was built as there is a clear need for publicly available annotated corpora in Kazakh, as well as annotation guidelines containing straightforward--but rigorous--…
External link:
http://arxiv.org/abs/2111.13419
In this paper, we study an approach to multimodal person verification using audio, visual, and thermal modalities. The combination of audio and visual modalities has already been shown to be effective for robust person verification. From this perspec…
External link:
http://arxiv.org/abs/2110.12136
We study training a single end-to-end (E2E) automatic speech recognition (ASR) model for three languages used in Kazakhstan: Kazakh, Russian, and English. We first describe the development of multilingual E2E ASR based on Transformer networks and the…
External link:
http://arxiv.org/abs/2108.01280
Author:
Musaev, Muhammadjon, Mussakhojayeva, Saida, Khujayorov, Ilyos, Khassanov, Yerbolat, Ochilov, Mannon, Varol, Huseyin Atakan
We present a freely available speech corpus for the Uzbek language and report preliminary automatic speech recognition (ASR) results using both the deep neural network hidden Markov model (DNN-HMM) and end-to-end (E2E) architectures. The Uzbek speech…
External link:
http://arxiv.org/abs/2107.14419