Building Resources For Vietnamese Clinical Text Processing
Author: | Hiep Nguyen Minh, Huyen Nguyen Thi Minh, Quyen Ngo The |
---|---|
Year of publication: | 2018 |
Subject: | Vocabulary; General Computer Science; Computer science; Vietnamese; Vietnamese grammar; Narrative; Text processing; Text segmentation; Phrase chunking; Chunking (computing); Natural language processing; Artificial intelligence; Artificial intelligence & image processing; Engineering and technology; Electrical engineering, electronic engineering, information engineering; Languages & linguistics; Languages and literature; Humanities and the arts |
Source: | Computación y Sistemas. 22 |
ISSN: | 2007-9737 1405-5546 |
Description: | Clinical texts contain textual data recorded by doctors during medical examinations. Sentences in clinical texts are generally short and narrative, do not strictly adhere to Vietnamese grammar, and contain many medical terms that are not present in general dictionaries. In this paper, we investigate the tasks of lexical analysis and phrase chunking for Vietnamese clinical texts. Although several tools exist for general Vietnamese text analysis, they show limited quality in the clinical domain because of the specific grammatical style of clinical texts and the lack of medical vocabulary. Our main contributions are the construction of an annotated corpus (vnEMR) and of lexical resources in the medical domain and, in consequence, the improvement of the quality of tools for clinical text analysis, including word segmentation, part-of-speech tagging and chunking. |
Database: | OpenAIRE |
External link: |