Bone Age Assessment Using Content-Based Image Retrieval System Using VGG-19 Deep Neural Network

Author: Behnam Dorostkar Yaghouti, Kambiz Rahbar, Fatemeh Taheri
Language: English, Persian
Year of publication: 2023
Subject:
Source: طب انتظامی, Vol 12, Iss 1 (2023)
Document type: article
ISSN: 2228-6241, 2383-3483
DOI: 10.30505/12.1.15
Description: INTRODUCTION With the development of medical imaging devices, medical image production has increased significantly in recent years. Efficient management and retrieval of medical image datasets support disease prevention and public health. Providing an accurate diagnosis while maintaining efficiency is challenging. Past studies indicate that images showing a similar pathological condition help physicians and radiologists diagnose, write radiological reports, and plan treatment [1]. Content-based image retrieval is a process in which similar images are identified and retrieved from a large image database using a representation of the content of the search image. Medical image retrieval systems have therefore been considered in education, diagnosis, care [2], and many other fields. Accurate and fast retrieval of images from image databases is the main challenge of this field. This challenge is more important in retrieving medical images because of the sensitivities associated with diagnosing and identifying abnormalities.
MATERIALS & METHODS This observational study was conducted in 2023. The tested population was the "Digital Hand ATLAS" [27], with 1389 samples of hand images of people aged 1 to 18 years. This collection of images is categorized into four races: Asian, Black, Hispanic, and Caucasian. In addition to race, the gender of each sample is known, and the actual age of each sample is also known in advance. Five samples were taken for each age group under ten years and ten for each age group over ten years; in total, 440 samples belonged to people under ten years old and 949 samples to people aged 10 to 18 years. Figure 2 shows the flowchart of the proposed image retrieval method in bone age assessment. First, each image $I_i$ in the image dataset $S = \{I_i\}_{i=1}^{n}$ is fed to the pre-trained VGG-19 network [28]. The feature vector $F_{I_i} = \{f_1, f_2, \ldots, f_n\}$ of each image is extracted from the FC7 fully connected layer.
The feature vectors of the dataset images together form the initial feature space. Not all features extracted from the images have the same role and importance in separating the data. Reducing the dimensions of the feature vectors preserves the data structure and the important features while also improving the speed of comparison and search. For this purpose, the principal component analysis algorithm is used. This algorithm is a feature mapping method, in which the nature of the features is changed [29]. It can preserve the overall structure of the data and represent the data in a feature space with fewer dimensions. Dimension reduction also removes unnecessary features that often cause poor performance in pattern recognition and retrieval. After this step, a new feature vector $F'_{I_i} = \{f_1, f_2, \ldots, f_n\}$ is formed for each image $I_i$. In this way, a database of new feature vectors is formed for the database images. Similarly, the feature vector of the search image, $F_{I_q} = \{f_1, f_2, \ldots, f_n\}$, is extracted. In the similarity measurement stage, the feature vector of the search image is compared with the feature vectors of the database by calculating the Euclidean distance. The images with the highest degree of similarity to the search image are then retrieved. After similar images are retrieved, the descriptions attached to each image are decoded, and the bone age of each retrieved image is determined. Finally, the bone age of the search image is estimated by averaging the bone ages of the retrieved samples. The VGG-19 deep neural network, whose architecture is shown in Figure 3, is built from convolution layers and pooling layers arranged in a stack-like structure. This network consists of 16 convolution layers and three fully connected layers.
The network starts with two convolutional layers with 64 filters of size 3×3, followed by a 2×2 max pooling layer with a stride of 2. The pooling layer reduces the number of learnable network parameters by reducing the size of the feature map. Next come two more convolutional layers with 128 filters of size 3×3 and a 2×2 max pooling layer with a stride of 2. Similarly, four convolutional layers with 256 filters of size 3×3 and one 2×2 max pooling layer with a stride of 2 are included. Two further sets, each with four convolutional layers with 512 filters of size 3×3 and a max pooling layer, form the continuation of this network, which gives 16 convolutional layers in total. Finally, the features enter the fully connected layers as a feature vector with 4096 dimensions. A layer with dimensions corresponding to the number of classes forms the last layer of this network. In the proposed method, only the output of the fully connected layer has been used as the feature vector. The activation function in all convolutional and fully connected layers is the ReLU function (Rectified Linear Unit). This function returns zero for negative inputs and the input value itself for positive inputs. The ReLU activation function is of interest in deep networks because of its simple mathematical calculations and high modeling speed. The following formula gives the rule of the ReLU function: $A(x) = \max(0, x)$. Different image details are identified by applying different filters in the convolution layers of the VGG-19 neural network. A visualization of the filter responses for an input image is shown in Figure 4. The displayed images differ in brightness, image edges, and texture pattern recognition. This representation shows that the feature vector obtained from the VGG-19 neural network includes image content features that are extracted hierarchically from the image at different levels.
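The layer layout and the ReLU rule can be checked with a small sketch. It assumes the standard published VGG-19 configuration (2, 2, 4, 4, 4 convolution layers per block), which matches the 16 convolution layers stated for this network. The names `VGG19_BLOCKS`, `count_conv_layers`, and `feature_map_size`, as well as the 224-pixel input size, are illustrative assumptions and are not taken from the study.

```python
# Assumed standard VGG-19 layout: (number of 3x3 conv layers, number of filters).
# Every block is followed by one 2x2 max pooling layer with a stride of 2.
VGG19_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]

# The two 4096-dimensional fully connected layers (FC6, FC7); the third fully
# connected layer is sized by the number of classes and is omitted here.
FC_SIZES = [4096, 4096]


def relu(x: float) -> float:
    """ReLU rule A(x) = max(0, x): zero for negative input, identity otherwise."""
    return max(0.0, x)


def count_conv_layers(blocks) -> int:
    """Total number of convolution layers across all blocks."""
    return sum(n_layers for n_layers, _ in blocks)


def feature_map_size(input_size: int, blocks) -> int:
    """Spatial size after all pooling layers; each 2x2/stride-2 pool halves it."""
    size = input_size
    for _ in blocks:
        size //= 2
    return size


if __name__ == "__main__":
    print("convolution layers:", count_conv_layers(VGG19_BLOCKS))
    print("final feature map side (224 input assumed):",
          feature_map_size(224, VGG19_BLOCKS))
    print("relu(-3.0) =", relu(-3.0), " relu(2.5) =", relu(2.5))
```

The sketch only tabulates the architecture; it does not load pre-trained weights or extract features.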
Weighted mean absolute error is an accepted criterion in the quantitative evaluation of bone age assessment results. Let $(x_1, x_2, \ldots, x_N)$ be data samples with $k$ features each and $(y_1, y_2, \ldots, y_N)$ the actual bone age values for $N$ samples. The predicted bone age $f(x_i)$ is compared with the actual value $y_i$. The criterion is calculated with the equation below, in which $w_i$ is the similarity weight of each retrieved sample. In this method, the weighted mean absolute error is computed from the age differences between the evaluation (reference) sample and the five best-retrieved samples. The lower this value and the closer it is to zero, the closer and more similar the retrieved samples are to the searched sample.
$$\mathrm{wMAE} = \frac{1}{\sum_{i=1}^{N} w_i} \sum_{i=1}^{N} w_i \, \lvert y_i - f(x_i) \rvert$$
Ethical Permissions: This research was reviewed and approved under the code of ethics IR.IAU.SRB.REC.1402.139 at the Islamic Azad University, Science and Research Branch. The ethical principles of the present study were fully respected; confidentiality was maintained, and the identities of the people in this dataset are not known.
Statistical Analysis: The proposed image retrieval method for bone age assessment was implemented, and its results were analyzed, in MATLAB 2022a.
FINDINGS The population evaluated in this study included 1389 hand X-ray image samples. Sample images of this dataset are shown in Figure 5. The average number of samples per age category was 77. These samples were selected from males and females and from all four races: Asian, Black, Hispanic, and Caucasian. For each test run, the top five retrieved samples and their ages were compared with the reference sample. For each image, a feature vector of 4096 features was extracted. In the next step, principal component analysis was applied for dimension reduction, and 260 features were retained for each image.
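The retrieval and evaluation steps described here can be illustrated with a minimal sketch: rank database images by Euclidean distance to the search image, average the bone ages of the top five, and score predictions with the weighted mean absolute error. The study itself was implemented in MATLAB; this Python version is only an illustration. The function names, the two-dimensional toy feature vectors, and the toy ages are all invented for the example, and the way the similarity weights $w_i$ are obtained is not specified in the text, so they are passed in as plain inputs.

```python
import math


def retrieve_top_k(query, database, k=5):
    """Return the k (feature_vector, bone_age) pairs closest to the query.

    Closeness is the Euclidean distance between feature vectors.
    """
    ranked = sorted(database, key=lambda item: math.dist(query, item[0]))
    return ranked[:k]


def estimate_bone_age(retrieved):
    """Estimate bone age as the plain average of the retrieved samples' ages."""
    ages = [age for _, age in retrieved]
    return sum(ages) / len(ages)


def weighted_mae(actual, predicted, weights):
    """wMAE = sum(w_i * |y_i - f(x_i)|) / sum(w_i)."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_errors = (
        w * abs(y - f) for w, y, f in zip(weights, actual, predicted)
    )
    return sum(weighted_errors) / total_weight


# Toy database (invented values): 2-D vectors stand in for the reduced
# feature vectors, paired with a bone age in years.
database = [
    ([0.0, 0.0], 12.0),
    ([0.1, 0.0], 12.0),
    ([0.0, 0.2], 13.0),
    ([0.3, 0.0], 13.0),
    ([0.0, 0.4], 13.0),
    ([5.0, 5.0], 3.0),   # far from the query, so it should not be retrieved
]
query = [0.0, 0.1]

top5 = retrieve_top_k(query, database, k=5)
estimate = estimate_bone_age(top5)

if __name__ == "__main__":
    print("retrieved ages:", [age for _, age in top5])
    print("estimated bone age:", estimate)
    print("wMAE:", weighted_mae([10.0, 10.0], [11.0, 12.0], [3.0, 1.0]))
```

Sorting the whole database is the simplest choice for a dataset of this size; a larger collection would likely call for a nearest-neighbor index instead.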
The similarity of two images was measured by the Euclidean distance between the search image's feature vector and the database images' feature vectors. The bone age evaluation criterion was the mean absolute error. The results of the proposed approach were evaluated in two parts. In the first part, quantitative results were examined and compared with previous methods. The second part was a qualitative evaluation of the retrieved images and their relationship with the search image. In this part, the image retrieval quality of the proposed method was demonstrated by examining the retrieved samples for several search images. This evaluation checked the quality and correlation of the retrieved samples in response to the search image. The average bone age of the top five retrieved samples was compared with the bone age of the search image. The comparison of retrieval results for three search image samples from four different races is shown in Table 2. The first example concerns image 5020 from the Digital Hand ATLAS dataset. This image belonged to an 18-year-old person in the Asian group. Since the retrieved samples also belonged to the same age group, the bone age of the search image was confirmed to be 18 years. The second example in Table 2 concerns image 3245 from the Digital Hand ATLAS dataset, belonging to a 12-year-old Black person. Among the retrieved samples, two belonged to the group of 12-year-olds and the other three to the group of 13-year-olds. The bone age of the search image was estimated to be 12.57 years, which corresponds to a bone age difference of about 0.6 years for this sample. The third sample in Table 2 was image 5103, belonging to a 16-year-old Hispanic person.
Among the retrieved samples, three belonged to 16-year-old people, one to a 17-year-old person, and one to an 18-year-old person. The bone age of the search image in this example was therefore estimated to be 16.76 years.
DISCUSSION The present study investigated the reliability of an automatic bone age assessment method based on an image retrieval system. This study showed an error of less than four months in bone age assessment. The proposed method is comparable with the findings of related studies, including references [20-25]. In one study, the new Bonet network was introduced by combining three neural networks and using transfer learning to transfer the training domain [20]. The mean absolute error in bone age estimation reported in that paper is 0.79. In another study, a pre-trained neural network was retrained with additional information sources such as gender [21]. The mean error in this method is reported to be 62%. In reference [22], a bone age assessment model based on a region-based convolutional neural network (R-CNN) is proposed. This method performs bone age regression by identifying the ossification centers of the epiphysis and carpal bones. Large-scale X-ray images are used as the neural network input. The mean absolute error in this method is reported to be 0.51. In the method presented by Cardoso et al., the MobileNet network is used to extract image features to evaluate bone age [24]. The estimation error in this method is reported to be 1.4 years. This method pays attention to the hand's position in the image. The extracted features are limited to certain areas of the image, so they are local rather than global. This method's bone age assessment error is reported as a mean absolute error of 0.62.
Many efforts have been made to increase the accuracy of bone age detection and estimation with the help of deep neural networks. Complex architectures, long training times, and the number of training samples required for retraining are among the problems of the methods mentioned. In the proposed method, in addition to feature extraction, attention was paid to reducing the dimensions of the feature vector in order to reduce the time needed for comparison with the samples of the dataset. Reducing the dimensions of the feature vector by discarding ineffective features, and the resulting reduction in comparison time, were investigated. The best results showed a weighted mean absolute error of 0.29 years (reported as 3.4 months). Despite the appropriate performance of the proposed method in assessing and estimating bone age, increasing the accuracy of retrieving similar samples by combining local and global features without creating redundancy is a direction for future studies of this research. Despite the performance of methods based on smart algorithms, it is important to note that bone age assessment should be combined with other assessment techniques. Qualified medical professionals should also be the ones to use this tool, to increase accuracy and reliability. This study had limitations, such as limited access to domestic samples for more detailed evaluations. Although the bone density and growth pattern of the Iranian population fall within the category of Asian samples, which is covered by the evaluated "Digital Hand ATLAS" dataset, investigating and localizing methods based on smart algorithms for implementing smart systems in medicine requires the provision of local data.
CONCLUSION Based on this research, evaluating bone age with the help of image retrieval is an effective method for estimating bone age.
Therefore, experts in this field can use this method to verify and determine the age of people without identity documents and in other related matters.
Clinical & Practical Tips in POLICE MEDICINE: One of the applications of bone age assessment in police investigations is to determine the age of unknown persons. In cases where a person's age cannot be determined by other means, such as identification documents or witness testimony, bone age assessment can estimate a person's age based on skeletal maturity. Another use of bone age assessment is in cases of suspected child abuse or neglect. In some cases, it can be challenging to determine the age of a child who has been abused or neglected, especially if the child has been denied proper nutrition or medical care. Using bone age assessment, investigators can estimate a child's age and determine whether the care provided has resulted in normal growth. Bone age assessment is also used in cases of human trafficking or illegal immigration where the person's age is unclear or disputed. Using bone age assessment, the maturity or immaturity of a person can be recognized. This information can also be used to determine the appropriate legal and social services for the person, or the limits of the crime according to the person's age.
Conflict of interest: The authors stated that there is no conflict of interest in the present study.
Authors' Contribution: First author, presenting the idea; second author, presentation, data analysis; third author, data analysis. All the authors participated in the final writing and revision of the article, and by finalizing it, all of them accept responsibility for the accuracy and correctness of its contents.
Financial Sources: The current research received no financial support from government or private authorities.
Database: Directory of Open Access Journals