A pipeline for medical literature search and its evaluation

Authors: Imamah Zafar, Aamir Wali, Muhammad Ahmed Kunwar, Noor Afzal, Muhammad Raza
Year of publication: 2023
Source: Journal of Information Science, 016555152311615
ISSN: 1741-6485, 0165-5515
DOI: 10.1177/01655515231161557
Description: One database commonly used by clinicians for searching the medical literature and practicing evidence-based medicine is PubMed. As the literature grows, it has become challenging for users to find relevant material quickly, because the most relevant results are often not near the top. In this article, we propose a search and ranking pipeline that improves search results based on relevance. We first propose an ensemble model consisting of three classifiers: a bidirectional long short-term memory conditional random field (bi-LSTM-CRF), a support vector machine and naive Bayes, to recognise PICO (patient, intervention, comparison, outcome) elements in abstracts. The ensemble was trained on an annotated corpus of 5000 abstracts, split into 4000 training and 1000 testing abstracts, and recorded an accuracy of 93%. We then retrieved around 927,000 articles from PubMed for the years 2017–2021 (access date 16 April 2021). For every abstract, we extracted and grouped all P, I and O terms, and stored these groups along with the article ID in a separate database. During search, each P, I and O term of the query is searched only within the corresponding group of every abstract. The scoring method simply counts the number of matches between the query's P, I and O elements and the words in the P, I and O groups, respectively. The abstracts are sorted by the number of matches, and the top five abstracts are listed using their pre-stored abstract IDs. A comprehensive user study was conducted in which 60 different queries were formulated and used to generate ranked search results with both PubMed and our proposed model. Five medical professionals assessed the ranked search results and marked every item as relevant or non-relevant. The two models were compared using the precision@K and mean-average-precision@K metrics with K = 5. For most of the queries, our model produced higher precision@K values than PubMed.
The mean-average-precision@K value of our model is also higher than PubMed (0.83 versus 0.67).
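The query-time scoring described in the abstract, matching each query P, I and O term only against the same-letter group of every stored abstract and returning the top five counts, can be sketched as follows. The record layout and the names `pio_match_score` and `rank_articles` are illustrative assumptions, not the authors' implementation:

```python
from typing import Dict, List, Set

def pio_match_score(query: Dict[str, Set[str]],
                    groups: Dict[str, Set[str]]) -> int:
    """Count query terms that appear in the abstract's same-letter group.

    A P term of the query is compared only against the abstract's P group,
    I only against I, and O only against O.
    """
    return sum(len(query.get(k, set()) & groups.get(k, set()))
               for k in ("P", "I", "O"))

def rank_articles(query: Dict[str, Set[str]],
                  articles: List[dict],
                  top_k: int = 5) -> List[int]:
    """Sort stored abstracts by match count and return the top-k article IDs."""
    ranked = sorted(articles,
                    key=lambda a: pio_match_score(query, a["groups"]),
                    reverse=True)
    return [a["id"] for a in ranked[:top_k]]
```

For example, a query with P = {diabetes}, I = {metformin}, O = {hba1c} would rank an abstract matching all three groups above one matching only the population term.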
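The evaluation metrics named in the abstract, precision@K and mean-average-precision@K with K = 5 and binary relevance judgments, can be computed as in this sketch; the function names are illustrative, since the paper does not publish its evaluation code:

```python
from typing import List

def precision_at_k(rels: List[int], k: int = 5) -> float:
    """Fraction of the top-k results judged relevant (1) vs non-relevant (0)."""
    return sum(rels[:k]) / k

def average_precision_at_k(rels: List[int], k: int = 5) -> float:
    """Mean of precision@i taken at each relevant position i within the top k."""
    hits, total = 0, 0.0
    for i, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0

def map_at_k(per_query_rels: List[List[int]], k: int = 5) -> float:
    """Mean of average precision over all queries (60 in the study)."""
    return sum(average_precision_at_k(r, k)
               for r in per_query_rels) / len(per_query_rels)
```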
Database: OpenAIRE