Abstrakt: |
The promised dataset was obtained from daily Turkish words and phrases pronounced by various people in videos posted on YouTube. The purpose of compiling the dataset was to provide a method for the detection of the spoken word by recognizing patterns or classifying lip movements with supervised, unsupervised, and semi-supervised learning, and machine learning algorithms. Most of the datasets related to lip reading consist of people recorded on camera with fixed backgrounds and the same conditions, but the dataset presented here consists of images compatible with machine learning models developed for real-life challenges. It contains a total of 2335 instances taken from TV series, movies, vlogs, and song clips on YouTube. The images in the dataset vary due to factors such as the way people say words, accents, speaking rate, gender, and age. Furthermore, the instances in the dataset consist of videos with different angles, shadows, resolution, and brightness that are not created manually. The most important feature of our lip reading dataset is that we contribute to the non-synthetic Turkish dataset pool, which does not have wide dataset varieties. Machine learning studies can be carried out in many areas, such as education, security, and social life with this dataset. Dataset: https://doi.org/10.17632/4t8vs4dr4v.1. Dataset License: CC BY 4.0 [ABSTRACT FROM AUTHOR] |