Semantic preservation of standardized healthcare documents in big data

Authors: Maqbool Hussain, Jamil Hussain, Jae Hun Bang, Hyonwoo Seung, Shujaat Hussain, Muhammad Afzal, Sungyoung Lee
Year of publication: 2018
Subject:
Source: International Journal of Medical Informatics. 129
ISSN: 1872-8243
Description: Background: Standardized healthcare documents are widely adopted in today's hospitals. This brings several challenges, because processing the documents at large scale strains the infrastructure. The complexity of these documents compounds the difficulty of handling them, which is why big data techniques are necessary. However, big data techniques can cause accuracy and semantic loss in health documents when the documents are partitioned for processing. This semantic loss is critical for clinical use as well as for insurance and medical education.
Methods: In this paper, we propose a novel technique to avoid the semantic loss that occurs during conventional partitioning of healthcare documents in big data. The technique uses a constraint model based on conformance to the clinical document standard and on user-based use cases. We used Clinical Document Architecture (CDA®) datasets on the Hadoop Distributed File System (HDFS) through a uniquely configured setup. We identified the documents affected by semantic loss after partitioning and separated the documents into two sets: conflict-free documents and conflicted documents. Conflicted documents were resolved using different resolution strategies that were mapped according to the CDA® specification. The first part of the technique focuses on identifying the type of conflict that arises in the blocks after partitioning. The second part focuses on mapping conflicts to resolutions based on the constraints applied, which depend on the validation and user scenario.
Results: We used a publicly available dataset of CDA® documents, identified all conflicted documents, and resolved all of them successfully to avoid any semantic loss. In our experiment, we tested up to 87,000 CDA® documents and successfully identified the conflicts and resolved the semantic issues.
Conclusion: We have presented a novel study that focuses on the semantics of big data; the approach did not compromise performance and resolved the semantic issues that arose during the processing of clinical documents.
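
Illustrative sketch (not the authors' implementation): the abstract describes splitting documents into conflict-free and conflicted sets after partitioning, then resolving the conflicted ones. The Python below shows one way this could look, assuming documents are stored back to back in a single file that is cut into fixed-size blocks, as HDFS does by default. All names (`DocSpan`, `layout`, `classify`, `resolve_to_start_block`) and the toy sizes are hypothetical, and the "assign to the starting block" rule is a stand-in for the paper's constraint-based resolution strategies, which the abstract does not detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocSpan:
    doc_id: str
    start: int  # offset of the first byte in the concatenated file
    end: int    # offset one past the last byte


def layout(doc_sizes):
    """Assign contiguous byte spans to documents stored back to back."""
    spans, offset = [], 0
    for doc_id, size in doc_sizes:
        if size <= 0:
            raise ValueError(f"document {doc_id!r} must have a positive size")
        spans.append(DocSpan(doc_id, offset, offset + size))
        offset += size
    return spans


def classify(spans, block_size):
    """Split spans into (conflict_free, conflicted).

    A document is treated as conflicted when its bytes fall into more
    than one fixed-size block, i.e. a block boundary cuts through it.
    """
    conflict_free, conflicted = [], []
    for span in spans:
        first_block = span.start // block_size
        last_block = (span.end - 1) // block_size
        if first_block == last_block:
            conflict_free.append(span)
        else:
            conflicted.append(span)
    return conflict_free, conflicted


def resolve_to_start_block(span, block_size):
    """Hypothetical resolution: process the whole document with the block
    in which it starts, so it is never read as two fragments."""
    return span.start // block_size


# Toy example: four documents (sizes in bytes) and a 64-byte block size.
spans = layout([("a", 40), ("b", 50), ("c", 30), ("d", 20)])
conflict_free, conflicted = classify(spans, block_size=64)
assignments = {s.doc_id: resolve_to_start_block(s, 64) for s in conflicted}

print([s.doc_id for s in conflict_free])  # ['a', 'c']
print(assignments)                        # {'b': 0, 'd': 1}
```

In this toy layout, documents b and d straddle the boundaries at bytes 64 and 128, so they are the ones that would need resolution. Real HDFS blocks are far larger (128 MB by default), and a real resolver would also have to respect the document standard's validation rules.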
Database: OpenAIRE