Efficient Algorithms for Cleaning and Indexing of Graph data

Authors: Demain Antony DMello, D. K. Santhosh Kumar
Publication year: 2020
Subject:
Source: International Journal of Open Source Software and Processes. 11:1-19
ISSN: 1942-3934
1942-3926
DOI: 10.4018/ijossp.2020070101
Description: Information extraction and analysis from enormous graph data is expanding rapidly. According to a survey, 80% of researchers spend more than 40% of their project time on data cleaning, which signifies a huge need for data cleaning techniques. Because of the characteristics of big data, storage and retrieval are another major concern, which is addressed by data indexing. Existing data cleaning techniques try to clean graph data based on information such as structural attributes and event log sequences. Cleaning graph data based on a single kind of information alone does not improve computational performance. Labels, as well as nodes, can be inconsistent, so it is highly desirable to clean both to improve performance. This paper addresses this issue by proposing a graph data cleaning algorithm that detects unstructured information along with inconsistent labeling, cleans the data by applying rules, and verifies the result based on data inconsistency. The authors also propose an indexing algorithm based on the CSS-tree to build an efficient and scalable graph index on top of Hadoop.
Database: OpenAIRE