Transformer-based approach for symptom recognition and multilingual linking.
Authors: | Vassileva S (Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Blvd 'James Bourchier' 5, Sofia 1164, Bulgaria); Grazhdanski G (Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Blvd 'James Bourchier' 5, Sofia 1164, Bulgaria); Koychev I (Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Blvd 'James Bourchier' 5, Sofia 1164, Bulgaria); Boytcheva S (Faculty of Mathematics and Informatics, Sofia University St. Kliment Ohridski, Blvd 'James Bourchier' 5, Sofia 1164, Bulgaria; Ontotext, ul. 'Nikola Gabrovski' 79, Sofia 1700, Bulgaria) |
---|---|
Language: | English |
Source: | Database: the journal of biological databases and curation [Database (Oxford)] 2024 Sep 10; Vol. 2024. |
DOI: | 10.1093/database/baae090 |
Abstract: | This paper presents a transformer-based approach for symptom Named Entity Recognition (NER) in Spanish clinical texts and multilingual entity linking on the SympTEMIST dataset. For Spanish NER, we fine-tune a RoBERTa-based token-level classifier with bidirectional long short-term memory (BiLSTM) and conditional random field (CRF) layers on an augmented training set, achieving an F1 score of 0.73. Entity linking is performed via a hybrid approach with dictionaries, generating candidates from a knowledge base containing Unified Medical Language System (UMLS) aliases using the cross-lingual SapBERT and reranking the top candidates using GPT-3.5. The entity linking approach performs consistently across languages, reaching 0.73 accuracy on the SympTEMIST multilingual dataset, and achieves an accuracy of 0.6123 on the Spanish entity linking task, surpassing the current top score for this subtask. Database URL: https://github.com/svassileva/symptemist-multilingual-linking (© The Author(s) 2024. Published by Oxford University Press.) |
Database: | MEDLINE |
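
The hybrid linking pipeline described in the abstract (dictionary lookup, embedding-based candidate generation, then reranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the character n-gram `embed` function is a toy stand-in for the SapBERT encoder, `rerank` is a placeholder for the GPT-3.5 step, and the knowledge base entries, the codes `C001` to `C003`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base of (alias, concept code) pairs.
# In the paper the aliases come from UMLS; these codes are made up.
KB = [
    ("fiebre", "C001"), ("fever", "C001"),
    ("dolor de cabeza", "C002"), ("cefalea", "C002"), ("headache", "C002"),
    ("tos", "C003"), ("cough", "C003"),
]

# Hypothetical exact-match dictionary (the "dictionary" part of the hybrid).
DICTIONARY = {"fiebre": "C001"}


def embed(text, n=3):
    """Toy stand-in for a SapBERT encoder: a bag of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def generate_candidates(mention, knowledge_base, dictionary, top_k=3):
    """Return up to top_k concept codes for a mention."""
    # Step 1: an exact dictionary hit short-circuits the search.
    key = mention.lower()
    if key in dictionary:
        return [dictionary[key]]
    # Step 2: rank every alias by similarity to the mention.
    query = embed(mention)
    scored = sorted(
        ((cosine(query, embed(alias)), code) for alias, code in knowledge_base),
        reverse=True,
    )
    # Keep each concept once, in order of its best-scoring alias.
    ranked = []
    for _, code in scored:
        if code not in ranked:
            ranked.append(code)
    return ranked[:top_k]


def rerank(mention, candidates):
    """Placeholder for the LLM reranking step; returns candidates unchanged."""
    return list(candidates)


def link(mention, knowledge_base=KB, dictionary=DICTIONARY, top_k=3):
    """Link a mention to its top-ranked concept code, or None if no candidates."""
    candidates = rerank(mention, generate_candidates(mention, knowledge_base, dictionary, top_k))
    return candidates[0] if candidates else None
```

Calling `link("dolor de cabeza intenso")` walks through the embedding path, while `link("Fiebre")` is resolved by the dictionary alone. In a real system the encoder would be a pretrained cross-lingual model and the aliases would be indexed for nearest-neighbour search rather than scanned linearly.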