Manual versus machine: An evaluation of the performance of the Medical Text Indexer (MTI) at classifying different document types by disease area

Authors: Duncan A.Q. Moore, Ohid Yaqub, Bhaven N. Sampat
Year of publication: 2023
DOI: 10.31235/osf.io/b75fr
Description: The Medical Subject Headings (MeSH) thesaurus, a controlled vocabulary, is increasingly being used by those who study research and innovation. While classification was once entirely manual, human indexers are now assisted by algorithmic suggestions in an effort to automate some of the indexing process. A version of this algorithm, the Medical Text Indexer (MTI), has been made available, allowing for classification of arbitrary text into MeSH categories. Potentially, this opens up other document classes to MeSH assignment for research and innovation studies. However, it remains unclear whether the MTI, a tool designed to categorise publications for indexing purposes, can be reliably extended to other document classes. To allow for assessment of the MTI’s performance for different classes of documents, we collected text from grant descriptions, patent claims, and drug indications, and compared the MTI’s categorisation to that of a qualified human classifier. We also tested whether MTI performance varied with text length or score thresholding. Our results suggest that researchers can proceed with confidence that the MTI reliably captures the diseases contained in a text (recall), and that its scoring can be used to guard against false diseases in its outputs (precision).
Database: OpenAIRE