Popis: |
Large-scale screening of gastrointestinal (GI) diseases is arduous and time-consuming for endoscopists. Computer-aided diagnosis (CAD) systems can reduce their workload and the errors caused by human factors, but existing systems mainly focus on a limited number of lesions or on specific organs, which makes them unsuitable for diagnosing the variety of GI diseases encountered in large-scale screening. This paper proposes a CAD system based on a transformer and a convolutional neural network, called TransMSF, to assist endoscopists in diagnosing multiple GI diseases. The system constructs two feature extraction paths with different encoding methods to capture both the global and the local information of lesions. Downsampling is applied within the transformer to obtain global information at different scales, which enriches the feature representation and reduces computation and memory usage. In addition, a channel and spatial attention module with fewer parameters is designed to focus on the target and to reduce the loss of important information during spatial dimension transformation. Finally, the extracted features are combined by a feature fusion module and passed to a linear classifier for disease diagnosis. The proposed system outperformed other state-of-the-art models on two datasets: it reached 98.41% precision, 98.15% recall, 98.13% accuracy, and a 98.28% F1 score on the in-house GI dataset, and 95.88% precision, 95.88% recall, 98.97% accuracy, and a 95.88% F1 score on the public Kvasir dataset. TransMSF also performed better than seasoned endoscopists. These results indicate that the proposed system can be instrumental in diagnosing GI diseases in large-scale screening. By offering helpful suggestions, it can also serve as a training tool that helps junior endoscopists improve their professional skills. |