A document image classification system fusing deep and machine learning models.

Authors: Omurca, Sevinç İlhan, Ekinci, Ekin, Sevim, Semih, Edinç, Eren Berk, Eken, Süleyman, Sayar, Ahmet
Source: Applied Intelligence; Jun2023, Vol. 53 Issue 12, p15295-15310, 16p
Abstract: Thanks to the digital transformation, Artificial Intelligence (AI) technologies are now widely employed to overcome human-induced faults in a variety of systems used in our daily lives. One example of such systems is online document tracking systems (DTS). The reliability and preferability of a DTS are enhanced by automatic document classification and understanding features. Although automatic document classification systems can assist humans in document understanding tasks, most of them are not designed to work with Portable Document Format (PDF) files, which contain text, tables or figures. In this study, we investigate separate ways to efficiently classify student documents that are uploaded in PDF format and are required for university education. We propose three possible techniques for this problem. The first approach is based on Optical Character Recognition (OCR) and traditional machine learning methods. The second is based purely on deep learning. The third is an entropy-based fusion of deep learning methods. The proposed techniques can classify twelve distinct types of digital documents. The validity of the proposed methods has been verified by the student affairs department of Kocaeli University in Turkey. The system has not only increased the efficiency of the online document uploading steps for students, but also reduced the human cost of tracking the documents. The highest F-score (94.45%) is obtained by the ensemble of EfficientNetB3 and ExtraTree. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
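
Illustrative note: the abstract mentions fusing models "based on entropy" but does not say how the entropy is used. The sketch below shows one common scheme, offered purely as an assumption and not as the authors' method: each model's class-probability vector is weighted by its confidence, defined here as one minus its normalized Shannon entropy, and the weighted vectors are summed. The function names `entropy`, `entropy_weighted_fusion` and `predict` are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of one class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_weighted_fusion(prob_vectors):
    """Fuse per-model probability vectors for a single document.

    Assumption (not taken from the abstract): a model's weight is
    1 - H / H_max, so low-entropy (confident) predictions count more.
    Returns the fused probability vector and the normalized weights.
    """
    n_classes = len(prob_vectors[0])
    max_entropy = math.log(n_classes)
    # Clamp at zero to guard against tiny negative values from rounding.
    weights = [max(0.0, 1.0 - entropy(p) / max_entropy) for p in prob_vectors]
    total = sum(weights)
    if total < 1e-12:
        # Every model is maximally uncertain: fall back to a plain average.
        weights = [1.0 / len(prob_vectors)] * len(prob_vectors)
    else:
        weights = [w / total for w in weights]
    fused = [
        sum(w * p[c] for w, p in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]
    return fused, weights


def predict(prob_vectors):
    """Index of the class with the highest fused probability."""
    fused, _ = entropy_weighted_fusion(prob_vectors)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Made-up outputs of two models over three classes, for illustration only.
    confident_model = [0.90, 0.05, 0.05]
    uncertain_model = [0.30, 0.40, 0.30]
    print(predict([confident_model, uncertain_model]))
```

In this toy input the confident model dominates the fused vector, so the uncertain model's slight preference for the second class is overridden. In practice the probability vectors would come from the trained classifiers, one vector per model per document.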