Evaluating sentence representations for biomedical text: Methods and experimental results.

Authors: Tawfik NS (Computer Engineering Department, College of Engineering, Arab Academy for Science, Technology, and Maritime Transport (AAST), 1029 Alexandria, Egypt; Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands; electronic address: noha.abdelsalam@aast.edu), Spruit MR (Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands; electronic address: m.r.spruit@uu.nl)
Language: English
Source: Journal of biomedical informatics [J Biomed Inform] 2020 Apr; Vol. 104, pp. 103396. Date of Electronic Publication: 2020 Mar 06.
DOI: 10.1016/j.jbi.2020.103396
Abstract: Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the rapid pace at which new sentence embedding methods are developed, we argue that there is a need for a unified methodology to assess these different techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks. The tasks cover a variety of BioNLP problems, such as semantic similarity, question answering, and citation sentiment analysis, with both binary and multi-class datasets. Our goal is to assess the transferability of different sentence representation schemes to the medical and clinical domain. Our analysis shows that embeddings based on Language Models, which account for the context-dependent nature of words, usually outperform the others. Nonetheless, no single embedding model represents biomedical and clinical texts perfectly, with consistent performance across all tasks. This illustrates the need for a more suitable bio-encoder. Our MedSentEval source code, pre-trained embeddings and examples have been made available on GitHub.
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2020 Elsevier Inc. All rights reserved.)
Database: MEDLINE