Co-ordinate-based positional embedding that captures resolution to enhance transformer's performance in medical image analysis.

Authors: Das BK (Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; badhankumar.das@siemens-healthineers.com); Zhao G (Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA); Islam S (Digital Technology and Innovation, Siemens Healthineers, Erlangen, Germany; Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany); Re TJ (Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA); Comaniciu D (Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA); Gibson E (Digital Technology and Innovation, Siemens Healthineers, Princeton, NJ, USA); Maier A (Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany).
Language: English
Source: Scientific reports [Sci Rep] 2024 Apr 23; Vol. 14 (1), pp. 9380. Date of Electronic Publication: 2024 Apr 23.
DOI: 10.1038/s41598-024-59813-x
Abstract: Vision transformers (ViTs) have revolutionized computer vision by employing self-attention in place of convolutional neural networks, and they owe their success to their ability to capture global dependencies and remove the spatial locality bias. In medical imaging, where input data may differ in size and resolution, existing architectures require resampling or resizing during pre-processing, which can cause loss of spatial resolution and degradation of information. This study proposes a co-ordinate-based embedding that encodes the geometry of medical images, capturing physical co-ordinate and resolution information without the need for resampling or resizing. The effectiveness of the proposed embedding is demonstrated through experiments with the UNETR and SwinUNETR models for infarct segmentation on an MRI dataset with AxTrace and AxADC contrasts. The dataset consists of 1142 training, 133 validation and 143 test subjects. With the addition of the co-ordinate-based positional embedding, both models achieved substantial improvements in mean Dice score, of 6.5% and 7.6%. The proposed embedding showed a statistically significant advantage over alternative approaches (p < 0.0001). In conclusion, the proposed co-ordinate-based pixel-wise positional embedding method offers a promising solution for Transformer-based models in medical image analysis. It effectively leverages physical co-ordinate information to enhance performance without compromising spatial resolution, and it provides a foundation for future advancements in positional embedding techniques for medical applications.
(© 2024. The Author(s).)
Database: MEDLINE
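The abstract does not give the functional form of the co-ordinate-based embedding, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' method. It assumes voxel indices are mapped to physical (millimetre) co-ordinates with the usual medical-image header convention (world = origin + direction · (index × spacing)) and that each physical axis is then encoded with Transformer-style sinusoidal features. The function names `voxel_to_physical` and `sinusoidal_coordinate_embedding` and the parameters `dim_per_axis` and `max_period` are hypothetical. The point it illustrates is the one made in the abstract: because the encoding depends on physical position rather than on the voxel index, images with different resolutions need not be resampled to share a consistent positional signal.

```python
import numpy as np


def voxel_to_physical(shape, spacing, origin, direction=None):
    """Map every voxel index to physical co-ordinates (e.g. mm).

    Assumed convention: x_world = origin + direction @ (index * spacing).
    Returns an array of shape (*shape, ndim).
    """
    ndim = len(shape)
    if direction is None:
        direction = np.eye(ndim)  # assume axis-aligned image if not given
    grids = np.meshgrid(*[np.arange(n) for n in shape], indexing="ij")
    idx = np.stack(grids, axis=-1).astype(np.float64)      # (*shape, ndim)
    scaled = idx * np.asarray(spacing, dtype=np.float64)    # index * spacing
    # Row-vector form of direction @ v, applied to every voxel at once.
    rotated = scaled @ np.asarray(direction, dtype=np.float64).T
    return rotated + np.asarray(origin, dtype=np.float64)


def sinusoidal_coordinate_embedding(coords, dim_per_axis=16, max_period=1000.0):
    """Hypothetical per-axis sin/cos encoding of physical co-ordinates.

    coords: array of shape (*spatial, ndim).
    Returns an array of shape (*spatial, ndim * dim_per_axis).
    """
    if dim_per_axis % 2 != 0:
        raise ValueError("dim_per_axis must be even")
    half = dim_per_axis // 2
    # Geometric range of frequencies, as in Transformer positional encodings.
    freqs = 1.0 / (max_period ** (np.arange(half) / half))   # (half,)
    angles = coords[..., None] * freqs                        # (*spatial, ndim, half)
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(*coords.shape[:-1], -1)


# Same 8 mm x 8 mm field of view sampled at two resolutions, without resampling.
coarse = voxel_to_physical((4, 4), spacing=(2.0, 2.0), origin=(0.0, 0.0))
fine = voxel_to_physical((8, 8), spacing=(1.0, 1.0), origin=(0.0, 0.0))
e_coarse = sinusoidal_coordinate_embedding(coarse)
e_fine = sinusoidal_coordinate_embedding(fine)

print(e_coarse.shape, e_fine.shape)               # (4, 4, 32) (8, 8, 32)
# Coarse voxel (1, 3) and fine voxel (2, 6) lie at the same point (2 mm, 6 mm).
print(np.allclose(e_coarse[1, 3], e_fine[2, 6]))  # True
```

Under these assumptions, an index-based embedding would give coarse voxel (1, 3) and fine voxel (1, 3) the same code even though they sit at different physical locations; the physical-co-ordinate version instead assigns matching codes to matching anatomy, which is how voxel spacing (resolution) enters the embedding without any resizing step.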