Author:
El Moaqet, Hisham; Janini, Rami; Abdulbaki Alshirbaji, Tamer; Aldeen Jalal, Nour; Möller, Knut
Subject:
Source:
Current Directions in Biomedical Engineering; Dec 2024, Vol. 10, Issue 4, p232-235, 4p
Abstract:
Automated laparoscopic video analysis is essential for assisting surgeons during computer-aided medical procedures. Nevertheless, it faces challenges due to complex surgical scenes and limited annotated data. Most existing methods for classifying surgical tools in laparoscopic surgeries rely on conventional deep learning architectures such as convolutional and recurrent neural networks. This paper explores the use of pure self-attention-based models, Vision Transformers, for classifying both single-label (SL) and multi-label (ML) frames in laparoscopic surgeries. The proposed SL and ML models were comprehensively evaluated on the Cholec80 surgical workflow dataset using 5-fold cross-validation. Experimental results showed excellent classification performance, with a mean average precision (mAP) of 95.8% that outperforms conventional deep learning multi-label models developed in previous studies. Our results open new avenues for further research on the use of deep transformer models for surgical tool detection in modern operating theaters. [ABSTRACT FROM AUTHOR]
Database:
Complementary Index
External link:
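Note: the abstract describes the approach only at a high level. As a rough illustration of the multi-label setup it mentions, the sketch below wires a pretrained Vision Transformer to seven sigmoid outputs (Cholec80 annotates seven surgical tools per frame), trains it with binary cross-entropy, and computes macro mAP with scikit-learn. The backbone choice (torchvision's ViT-B/16), the loss, and the metric wiring are assumptions for illustration, not the authors' actual implementation.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights
from sklearn.metrics import average_precision_score

NUM_TOOLS = 7  # Cholec80 annotates 7 tools; multi-label: several may appear per frame

# Assumption: a pretrained ViT-B/16 backbone with its ImageNet head
# replaced by one logit per tool class (sigmoid, not softmax).
model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)
model.heads = nn.Linear(model.hidden_dim, NUM_TOOLS)

# One independent binary decision per tool -> BCE with logits
criterion = nn.BCEWithLogitsLoss()

def mean_average_precision(logits: torch.Tensor, targets: torch.Tensor) -> float:
    """Macro mAP over tool classes, matching the metric named in the abstract."""
    probs = torch.sigmoid(logits).cpu().numpy()
    return average_precision_score(targets.cpu().numpy(), probs, average="macro")

# Toy forward/backward pass on random tensors standing in for Cholec80 frames;
# labels are fixed so every class has positives and negatives in the batch.
images = torch.randn(4, 3, 224, 224)  # ViT-B/16 expects 224x224 input
labels = torch.tensor([[1, 0, 1, 0, 1, 0, 1],
                       [0, 1, 0, 1, 0, 1, 0],
                       [1, 1, 0, 0, 1, 1, 0],
                       [0, 0, 1, 1, 0, 0, 1]], dtype=torch.float32)
logits = model(images)
loss = criterion(logits, labels)
loss.backward()
print(f"loss={loss.item():.3f}  mAP={mean_average_precision(logits.detach(), labels):.3f}")
```

In a real experiment the random tensors would be replaced by frames sampled from Cholec80 videos, with the 5-fold cross-validation split applied at the video level to avoid leaking near-duplicate frames across folds.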