Mining and visualising contradictory data

Authors: George Okereke, Chukwuemeka Nwobodo, Honour Chika Nwagwu
Year of publication: 2017
Subject:
lcsh:Computer engineering. Computer hardware
Information Systems and Management
Computer Networks and Communications
Flat file database
Bar chart
Computer science
Comma separated values
Contradictions
lcsh:TK7885-7895
ComputerApplications_COMPUTERSINOTHERSYSTEMS
02 engineering and technology
computer.software_genre
lcsh:QA75.5-76.95
law.invention
ConTra
Mutual exclusion values
law
020204 information systems
Server
Contradictory data
0202 electrical engineering, electronic engineering, information engineering
Soundness
lcsh:T58.5-58.64
lcsh:Information technology
Pie chart
020207 software engineering
computer.file_format
Identification (information)
Hardware and Architecture
lcsh:Electronic computers. Computer science
Data mining
Mutual exclusion
computer
Comma-Separated Values
Dataset
Information Systems
Source: Journal of Big Data, Vol 4, Iss 1, Pp 1-11 (2017)
ISSN: 2196-1115
DOI: 10.1186/s40537-017-0100-9
Description: Big datasets are often stored in flat files and can contain contradictory data. Contradictory data undermine the soundness of the information drawn from a noisy dataset. Traditional tools such as pie charts and bar charts are overwhelmed when used to visually identify contradictory data in the multidimensional attribute values of a big dataset. This work explains the importance of identifying contradictions in a noisy dataset. It also examines how contradictory data in a large and noisy dataset can be mined and visually analysed. The authors developed ‘ConTra’, an open source application that applies a mutual exclusion rule to identify contradictory data in comma-separated values (CSV) datasets. ConTra’s ability to identify contradictory data in datasets of different sizes is examined. The results show that ConTra can process large datasets when hosted on servers with fast processors. The work also shows that ConTra is 100% accurate in identifying the contradictory data of objects whose attribute values do not conform to the mutual exclusion rule of a dataset in CSV format. Different approaches through which ConTra can mine and identify contradictory data are also presented.
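To illustrate the idea of a mutual exclusion rule applied to CSV data, the following is a minimal Python sketch. It is not ConTra's implementation. The column names `object` and `value`, the function `find_contradictions`, and the example rule sets are all assumptions made for illustration. An object is flagged when it holds more than one value from a set of values declared mutually exclusive.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rule: each set lists attribute values that must not
# co-occur for the same object.
MUTUALLY_EXCLUSIVE = [
    {"alive", "deceased"},
    {"male", "female"},
]

# Hypothetical sample data in CSV form (one object/value pair per row).
SAMPLE = """object,value
p1,alive
p1,male
p2,alive
p2,deceased
p3,female
"""


def find_contradictions(csv_text, exclusive_groups):
    """Map each contradictory object to the clashing value sets it holds."""
    # Collect every value recorded for each object.
    values_by_object = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        values_by_object[row["object"]].add(row["value"].strip().lower())

    # An object violates a rule if it holds two or more values from one group.
    contradictions = {}
    for obj, values in values_by_object.items():
        for group in exclusive_groups:
            clash = values & group
            if len(clash) > 1:
                contradictions.setdefault(obj, []).append(sorted(clash))
    return contradictions


result = find_contradictions(SAMPLE, MUTUALLY_EXCLUSIVE)
print(result)
```

In this sketch only `p2` is reported, because it is recorded as both `alive` and `deceased`. Grouping values per object first keeps the rule check a simple set intersection.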
Database: OpenAIRE