Description: |
PubMed contains nearly 800,000 clinical trial citations, which report detailed trial planning, execution and results, including descriptions of study arms, demographic data, inclusion/exclusion criteria, the protocols followed and specific outcomes. So far, medical text mining has mostly focused on extracting information from the body text, with some success. Processing of information from tables is often limited to their textual captions, whereas the data presented in the tables themselves are typically ignored in large-scale automated processing. Here we report on a methodology developed to support semi-automated data curation and integration from clinical trial reports that relies on processing both the main text and tables. In a case study on extracting the body mass index and/or weight of patients involved in clinical trials, we achieved an F-measure of 85% for body mass index extraction. |