Abstrakt: |
We introduce the notion that research administrative databases (RADs), such as those increasingly used to manage information flow in the Institutional Review Board (IRB), offer a novel, useful, and mine-able data source overlooked by informaticists. As a proof of concept, using an IRB database we extracted all titles and abstracts (n=1,876); formatted these in a pseudo-MEDLINE format; and processed them through the SemRep semantic knowledge extraction system. Even though SemRep is tuned to find semantic relations in MEDLINE citations, we found that it performed comparably well on the IRB texts. When adjusted to eliminate non-healthcare IRB texts. When adjusted to eliminate non-healthcare IRB submissions (e.g., economic and education studies), SemRep extracted an average of 7.3 semantic relations per IRB abstract (compared to an average of 11.1 for MEDLINE citations) with a precision of 70% (compared to 78% for MEDLINE). We conclude that RADs, as represented by IRB data, are mine-able with existing tools, but that performance will improve as these tools are tuned for RAD structures. |