Classifying Long Clinical Documents with Pre-trained Transformers

Author:	Su, Xin; Miller, Timothy; Ding, Xiyu; Afshar, Majid; Dligach, Dmitriy
Year of publication:	2021
Subject:	FOS: Computer and information sciences; Computer Science - Computation and Language; education; behavioral disciplines and activities; Computation and Language (cs.CL); psychological phenomena and processes
DOI:	10.48550/arxiv.2105.06752
Description:	Automatic phenotyping is the task of identifying cohorts of patients that match a predefined set of criteria. Phenotyping typically involves classifying long clinical documents that contain thousands of tokens. At the same time, recent state-of-the-art transformer-based pre-trained language models limit the input to a few hundred tokens (e.g., 512 tokens for BERT). We evaluate several strategies for incorporating pre-trained sentence encoders into document-level representations of clinical text, and find that hierarchical transformers without pre-training are competitive with task pre-trained models.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::18afc1fa21cebd18d2722fe9f8174c16
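
The abstract contrasts documents of thousands of tokens with an encoder limit of 512 tokens. One common way to bridge that gap is to split the document into chunks, encode each chunk, and aggregate the chunk vectors into a single document vector. The sketch below illustrates that general idea only; it is not the authors' implementation. The names `chunk_tokens`, `mean_pool`, `encode_document`, and `toy_encoder` are hypothetical, mean pooling is an assumed aggregation choice, and `toy_encoder` is a stand-in for a real pre-trained sentence encoder.

```python
from typing import Callable, List, Sequence


def chunk_tokens(tokens: Sequence[str], max_len: int = 512) -> List[List[str]]:
    """Split a token sequence into consecutive chunks of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [list(tokens[i:i + max_len]) for i in range(0, len(tokens), max_len)]


def mean_pool(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average a non-empty list of equal-length vectors, dimension by dimension."""
    if not vectors:
        raise ValueError("cannot pool an empty list of vectors")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def encode_document(
    tokens: Sequence[str],
    encode_chunk: Callable[[List[str]], List[float]],
    max_len: int = 512,
) -> List[float]:
    """Encode each chunk separately, then pool the chunk vectors into one document vector."""
    chunks = chunk_tokens(tokens, max_len)
    return mean_pool([encode_chunk(chunk) for chunk in chunks])


def toy_encoder(chunk: List[str]) -> List[float]:
    """Hypothetical stand-in for a pre-trained encoder (assumption, not a real model).

    Returns a 2-dimensional vector: [chunk length, number of distinct tokens].
    """
    return [float(len(chunk)), float(len(set(chunk)))]


if __name__ == "__main__":
    # A synthetic 1200-token "document" with all tokens distinct.
    document = [f"tok{i}" for i in range(1200)]
    print([len(c) for c in chunk_tokens(document)])
    print(encode_document(document, toy_encoder))
```

In a hierarchical setup, the pooling step would be replaced by a second, document-level model that reads the sequence of chunk vectors; mean pooling is used here only to keep the sketch dependency-free.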