Sequential Span Classification with Neural Semi-Markov CRFs for Biomedical Abstracts
Author: | Koichi Takeda, Masaaki Nagata, Ryohei Sasano, Kosuke Yamada, Tsutomu Hirao |
---|---|
Year of publication: | 2020 |
Subject: |
0301 basic medicine; Conditional random field; Markov chain; Computer science; business.industry; Context (language use); computer.software_genre; Sequence labeling; Task (project management); 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Rhetorical question; 030212 general & internal medicine; Artificial intelligence; CRFS; business; computer; Sentence; Natural language processing |
Source: | EMNLP (Findings) |
Description: | Dividing biomedical abstracts into several segments with rhetorical roles is essential for supporting researchers’ information access in the biomedical domain. Conventional methods have treated the task as sequence labeling based on sequential sentence classification, i.e., they assign a rhetorical label to each sentence by considering its context in the abstract. However, these methods have a critical problem: they are prone to mislabeling long runs of consecutive sentences that share the same rhetorical label. To tackle this problem, we propose sequential span classification, which assigns a rhetorical label not to a single sentence but to a span of consecutive sentences. Accordingly, we introduce Neural Semi-Markov Conditional Random Fields to assign labels to such spans by considering all possible spans of various lengths. Experimental results on the PubMed 20k RCT and NICTA-PIBOSO datasets demonstrate that our proposed method achieved the best micro sentence-F1 score as well as the best micro span-F1 score. |
Database: | OpenAIRE |
External link: |
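
The description above mentions labeling spans of consecutive sentences "by considering all possible spans of various lengths". As a rough illustration of that idea only, the following is a minimal sketch of generic semi-Markov Viterbi decoding in plain Python. It is not the authors' implementation: the function `semi_markov_decode`, the callbacks `span_score` and `trans_score`, the `max_len` cap, and the toy scores are all assumptions made for this example. In a neural model the span scores would presumably come from a learned encoder rather than a hand-written function.

```python
from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def semi_markov_decode(
    n: int,
    labels: List[str],
    span_score: Callable[[int, int, str], float],
    trans_score: Callable[[str, str], float],
    max_len: int,
) -> List[Span]:
    """Return the highest-scoring segmentation of n sentences into labeled spans.

    Hypothetical helper: span_score(start, end, label) scores one span,
    trans_score(prev_label, label) scores a transition between adjacent spans.
    """
    neg_inf = float("-inf")
    # best[i][y]: best score of a segmentation of sentences [0, i)
    # whose last span carries label y.
    best = [{y: neg_inf for y in labels} for _ in range(n + 1)]
    # back[i][y]: (start of last span, label of the previous span or None).
    back: List[dict] = [{y: None for y in labels} for _ in range(n + 1)]

    for end in range(1, n + 1):
        # Enumerate every span length up to max_len that ends at `end`.
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for y in labels:
                s = span_score(start, end, y)
                if start == 0:
                    # First span of the abstract: no transition term.
                    if s > best[end][y]:
                        best[end][y] = s
                        back[end][y] = (0, None)
                    continue
                for prev in labels:
                    if best[start][prev] == neg_inf:
                        continue
                    cand = best[start][prev] + trans_score(prev, y) + s
                    if cand > best[end][y]:
                        best[end][y] = cand
                        back[end][y] = (start, prev)

    # Follow back-pointers from the best final label.
    spans: List[Span] = []
    end = n
    label: Optional[str] = max(labels, key=lambda y: best[n][y]) if n > 0 else None
    while end > 0:
        start, prev = back[end][label]
        spans.append((start, end, label))
        end, label = start, prev
    spans.reverse()
    return spans


# Toy scores (assumed): reward spans that agree with a per-sentence guess,
# and mildly penalize two adjacent spans with the same label.
guess = ["B", "B", "M", "M", "M"]


def toy_span_score(start: int, end: int, label: str) -> float:
    return float(sum(1 if g == label else -1 for g in guess[start:end]))


def toy_trans_score(prev: str, cur: str) -> float:
    return -0.5 if prev == cur else 0.0


print(semi_markov_decode(len(guess), ["B", "M"], toy_span_score, toy_trans_score, max_len=5))
# prints [(0, 2, 'B'), (2, 5, 'M')]
```

Because every candidate span is scored as a unit, a long run of sentences with one label is decided jointly rather than sentence by sentence, which is the intuition the description gives for moving from sentence classification to span classification.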