Author: Anuya S. Padte, Jimit K. Shah, Purnima Ahirao
Year of publication: 2019
Source: 2019 IEEE Pune Section International Conference (PuneCon).
Description:
Optical Music Recognition (OMR) is a branch of AI, analogous to Optical Character Recognition (OCR), in which a machine is trained to interpret sheet music and produce a playable or editable form of the music. To solve this problem in an end-to-end manner, a Convolutional Recurrent Neural Network (CRNN) architecture is used, which accounts for both the spatial and the sequential nature of the problem. The Connectionist Temporal Classification (CTC) loss function has proven to be a favorable choice for this type of sequence problem because it trains the model directly from input images to their corresponding musical transcripts, without requiring a frame-by-frame alignment between the image and the ground truth; this is what makes end-to-end training possible. Although traditional CTC solves a major part of the problem, it suffers from overfitting and underfitting: the uneven frequency distribution of symbols in datasets causes the model to overfit or underfit, and CTC also tends to make overconfident predictions, which leads to poor generalization. No previous attempt has addressed these limitations jointly. In this paper, we therefore propose and analyze SangCTC, an enhanced variant of traditional CTC that attempts to overcome overfitting and underfitting simultaneously using concepts from focal loss and entropy.
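The abstract combines three ideas: the alignment-free CTC loss, a focal weighting that shifts attention toward hard (often rare-symbol) transcripts, and an entropy term that discourages overconfident predictions. The following is a minimal, dependency-free sketch of how such pieces can fit together, assuming the CRNN emits one row of logits per image column (frame). The CTC part is the standard forward (alpha) recursion. The focal factor `(1 - p) ** gamma` follows the usual focal-loss pattern, and the entropy term is a simple frame-level confidence penalty used here as a stand-in; the function `focal_entropy_ctc_loss` and its parameters `gamma` and `beta` are hypothetical and are not the exact SangCTC formulation from the paper.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) that tolerates -inf inputs."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_softmax(row):
    """Convert one frame of raw logits into log-probabilities."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def ctc_nll(log_probs, target, blank=0):
    """Standard CTC negative log-likelihood via the forward recursion.

    log_probs: T x C per-frame log-probabilities; target: label ids (no blanks).
    No frame-by-frame alignment is needed: all valid paths are summed.
    """
    ext = [blank]  # target interleaved with blanks: _ a _ b _
    for s in target:
        ext += [s, blank]
    S = len(ext)
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]
    for t in range(1, len(log_probs)):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])  # advance by one position
            # skip over a blank only between two different labels
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new
    # a valid path ends on the last label or on the trailing blank
    total = logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]
    return -total


def focal_entropy_ctc_loss(logits, target, gamma=2.0, beta=0.1, blank=0):
    """Hypothetical focal + entropy-regularized CTC loss (illustrative only)."""
    log_probs = [log_softmax(row) for row in logits]
    nll = ctc_nll(log_probs, target, blank)
    p = math.exp(-nll)  # probability of the whole transcript
    focal = (1.0 - p) ** gamma * nll  # down-weight already easy samples
    # mean per-frame entropy; subtracting it penalizes overconfident outputs
    entropy = sum(-sum(math.exp(lp) * lp for lp in row) for row in log_probs)
    entropy /= len(log_probs)
    return focal - beta * entropy
```

For example, `focal_entropy_ctc_loss([[0.0, 0.0]], [1])` scores a single frame with uniform logits against the one-symbol transcript `[1]`. The design choice here is that the focal factor shrinks the loss for transcripts the model already predicts well, so gradients concentrate on harder samples, while a larger `beta` pushes the per-frame distributions away from near-one-hot outputs.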
Database: OpenAIRE