Semantic preservation of standardized healthcare documents in big data
Author: | Maqbool Hussain, Jamil Hussain, Jae Hun Bang, Hyonwoo Seung, Shujaat Hussain, Muhammad Afzal, Sungyoung Lee |
---|---|
Year of publication: | 2018 |
Subject: | Big Data; Information retrieval; 020205 medical informatics; Computer science; Health Informatics; Scenario; 02 engineering and technology; Resolution (logic); Clinical Document Architecture; Semantics; Constraint (information theory); 03 medical and health sciences; 0302 clinical medicine; 0202 electrical engineering, electronic engineering, information engineering; Use case; 030212 general & internal medicine; Distributed File System; Delivery of Health Care |
Source: | International Journal of Medical Informatics, vol. 129 |
ISSN: | 1872-8243 |
Description: | Background: Standardized healthcare documents are widely adopted in today's hospitals. This brings several challenges, because processing these documents at large scale takes a toll on the infrastructure. The complexity of the documents compounds the difficulty of handling them, which is why big data techniques are necessary. However, the nature of big data techniques can cause accuracy (semantic) loss in health documents when they are partitioned for processing. Such semantic loss is critical for clinical use as well as for insurance and medical education. Methods: In this paper we propose a novel technique that avoids the semantic loss that occurs during conventional partitioning of healthcare documents in big data, using a constraint model based on conformance to the clinical document standard and on user-based use cases. We used Clinical Document Architecture (CDA®) datasets on the Hadoop Distributed File System (HDFS) with a uniquely configured setup. We identified the documents affected by semantic loss after partitioning and separated them into two sets: conflict-free documents and conflicted documents. Conflicted documents were resolved using different resolution strategies mapped according to the CDA® specification. The first part of the technique focuses on identifying the type of conflict that arises in the blocks after partitioning. The second part focuses on mapping each conflict to a resolution, based on constraints applied according to the validation and user scenario. Results: Using a publicly available dataset of CDA® documents, we identified all conflicted documents and resolved all of them successfully, avoiding any semantic loss. In our experiments we tested up to 87,000 CDA® documents, successfully identifying the conflicts and resolving the semantic issues.
Conclusion: We have presented a novel study focused on the semantics of big data; the approach resolved the semantic issues that arose during the processing of clinical documents without compromising performance. |
Database: | OpenAIRE |
External link: |
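The abstract separates documents into conflict-free and conflicted sets after partitioning, where a conflict arises because HDFS stores a file as fixed-size blocks that can cut a document in two. The sketch below illustrates only that identification idea under stated assumptions: each document is known by a byte offset and length inside one stored file, and a document is "conflicted" when it crosses a block boundary. The names `DocSpan`, `classify`, and `assign_to_start_block` are hypothetical, and the resolution shown (process each document whole in the block where it starts, similar in spirit to how Hadoop's line record readers treat records that cross split boundaries) is a simplified stand-in, not the authors' CDA®-specification-based resolution strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocSpan:
    """Hypothetical record: where one CDA document sits inside a stored file."""
    doc_id: str
    start: int   # byte offset of the document's first byte
    length: int  # document size in bytes (assumed >= 1)

    @property
    def end(self) -> int:
        # Exclusive end offset.
        return self.start + self.length


def block_of(offset: int, block_size: int) -> int:
    """Index of the fixed-size block that contains a byte offset."""
    return offset // block_size


def classify(docs, block_size):
    """Split documents into conflict-free (inside one block) and
    conflicted (crossing at least one block boundary)."""
    conflict_free, conflicted = [], []
    for d in docs:
        first_block = block_of(d.start, block_size)
        last_block = block_of(d.end - 1, block_size)  # block of the last byte
        if first_block == last_block:
            conflict_free.append(d)
        else:
            conflicted.append(d)
    return conflict_free, conflicted


def assign_to_start_block(docs, block_size):
    """Simplified, hypothetical resolution: each document is read whole
    by the block in which it starts, so no document is split."""
    assignment = {}
    for d in docs:
        assignment.setdefault(block_of(d.start, block_size), []).append(d.doc_id)
    return assignment


if __name__ == "__main__":
    # Toy layout with a 64-byte block size; document "b" spans bytes 40..89.
    docs = [DocSpan("a", 0, 40), DocSpan("b", 40, 50),
            DocSpan("c", 90, 30), DocSpan("d", 120, 8)]
    free, conflicted = classify(docs, block_size=64)
    print([d.doc_id for d in free], [d.doc_id for d in conflicted])
    print(assign_to_start_block(docs, block_size=64))
```

With a real HDFS deployment the block size would be the configured one (much larger than 64 bytes); the toy value only makes the boundary crossing easy to see.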