Abstrakt: |
Accurately segmenting organs, or diseased regions of varying sizes, is a challenge in medical image segmentation tasks with limited datasets. Although Transformer-based methods have self-attention mechanisms, excellent global modelling abilities and can effectively focus on crucial areas in medical images, they still face the intractable issues of computational complexity and excessive reliance on data. The Swin-Transformer has partly addressed the problem of computational complexity, however the method still requires large amounts of training data due to its lack of inductive bias capabilities. When applied to medical image segmentation tasks with small datasets, this leads to suboptimal performances. In contrast, the CNN-based methods can compensate for this limitation. To address this issue further, this paper proposes Swin-IBNet, which combines the Swin-Transformer with a CNN in a novel manner to imbue it with inductive bias capabilities, reducing its reliance on data. During the encoding process of Swin-IBNet, two novel and crucial modules, the feature fusion block (FFB) and the multiscale feature aggregation block (MSFA), are designed. The FFB is responsible for propagating the inductive bias capability to the Swin-Transformer encoder. Different from the previous use of multiscale features, MSFA efficiently leverages multiscale information from different layers through self-learning. This paper not only attempts to analyse the interpretability of the proposed Swin-IBNet but also performs more verifications on the public Synapse, ISIC 2018 and ACDC datasets. The experimental results show that Swin-IBNet is superior to the baseline method, Swin-Unet, and several state-of-the-art methods. Especially on the Synapse dataset, the DSC of Swin-IBNet surpasses that of Swin-Unet by 3.45%. [ABSTRACT FROM AUTHOR] |