Efficient Algorithms for Cleaning and Indexing of Graph data
Author: | Demain Antony DMello, D. K. Santhosh Kumar |
---|---|
Year of publication: | 2020 |
Subject: | |
Source: | International Journal of Open Source Software and Processes. 11:1-19 |
ISSN: | 1942-3934 1942-3926 |
DOI: | 10.4018/ijossp.2020070101 |
Description: | Information extraction and analysis from enormous volumes of graph data are expanding rapidly. According to a survey, 80% of researchers spend more than 40% of their project time on data cleaning, which signals a strong need for efficient data cleaning. Because of the characteristics of big data, storage and retrieval are a further major concern, which data indexing addresses. Existing data cleaning techniques clean graph data based on information such as structural attributes and event log sequences. Cleaning graph data based on a single piece of information alone will not improve computational performance. Labels, and not only nodes, can be inconsistent, so it is highly desirable to clean both to improve performance. This paper addresses this issue by proposing a graph data cleaning algorithm that detects unstructured information along with inconsistent labeling, cleans the data by applying rules, and verifies the result based on data inconsistency. The authors also propose an indexing algorithm based on the CSS-tree (cache-sensitive search tree) to build efficient and scalable graph indexing on top of Hadoop. |
Database: | OpenAIRE |
External link: |
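The abstract names a CSS-tree index but gives no construction details, and this record does not describe how the authors integrate it with Hadoop. As a rough illustration only, the Python sketch below builds a simplified, single-machine CSS-tree over integer node IDs. The class `CSSTree`, its fan-out parameter `m`, the padding of the leaf level with +inf to a full tree, and the mapping from node IDs to adjacency lists are all hypothetical choices made for this example, not the authors' algorithm.

```python
import math
from bisect import bisect_left


class CSSTree:
    """Simplified, read-only CSS-tree (cache-sensitive search tree) sketch.

    Internal nodes hold only separator keys in one flat list; a child's
    position is computed arithmetically instead of being stored as a pointer.
    Assumption: the leaf level is padded with +inf to a full (m+1)-ary tree.
    """

    def __init__(self, items, m=4):
        pairs = sorted(items, key=lambda kv: kv[0])  # (key, value) pairs by key
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]
        self.m = m
        n = len(self.keys)
        leaves_needed = max(1, math.ceil(n / m))     # each leaf node holds m keys
        depth = 0
        while (m + 1) ** depth < leaves_needed:
            depth += 1
        num_leaves = (m + 1) ** depth
        self.num_internal = (num_leaves - 1) // m    # nodes above the leaf level
        self.leaf_keys = self.keys + [math.inf] * (num_leaves * m - n)
        # Separator i of internal node b is the largest key under child i of b.
        self.sep = [
            self._subtree_max(b * (m + 1) + 1 + i)
            for b in range(self.num_internal)
            for i in range(m)
        ]

    def _subtree_max(self, node):
        # Follow rightmost children down to a leaf; its last key is the maximum.
        m = self.m
        while node < self.num_internal:
            node = (node + 1) * (m + 1)              # rightmost child of node
        leaf = node - self.num_internal
        return self.leaf_keys[leaf * m + m - 1]

    def lower_bound(self, key):
        """Index of the first stored key >= key (len(self.keys) if none)."""
        m, node = self.m, 0
        while node < self.num_internal:
            lo = node * m
            # The first separator >= key selects the child that can hold key.
            i = bisect_left(self.sep, key, lo, lo + m) - lo
            node = node * (m + 1) + 1 + i
        lo = (node - self.num_internal) * m
        pos = bisect_left(self.leaf_keys, key, lo, lo + m)
        return min(pos, len(self.keys))              # padding positions mean "none"

    def get(self, key, default=None):
        pos = self.lower_bound(key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.values[pos]
        return default


# Hypothetical graph: node ID -> adjacency list.
edges = {7: [3, 12], 3: [7], 12: [7, 40], 40: [12], 25: []}
index = CSSTree(edges.items(), m=2)
print(index.get(12))  # [7, 40]
print(index.get(99))  # None
```

The property that makes this layout cache-friendly is that the separators of one node sit contiguously in `sep`, and child `i` of node `b` is located at `b * (m + 1) + 1 + i` without any stored pointers; a practical implementation would typically pick `m` so that one node fills a cache line.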