Summarizer: Fuzzy Rule-Based Classification Systems for Vertical and Horizontal Big Data
Author: | Tatiane Nogueira Rios, Pétala Gardênia da Silva Estrela Tuy |
---|---|
Publication year: | 2020 |
Subject: |
Clustering high-dimensional data; Fuzzy rule; Horizontal and vertical; business.industry; Computer science; Dimensionality reduction; Big data; 02 engineering and technology; computer.software_genre; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Data mining; business; computer; Classifier (UML) |
Source: | FUZZ-IEEE |
DOI: | 10.1109/fuzz48607.2020.9177683 |
Description: | The performance of Fuzzy Rule-Based Classification Systems (FRBCSs) is strongly affected by the growing number of instances and attributes in Big Data. Previously proposed approaches adapt FRBCSs to Big Data by distributing data processing with the MapReduce paradigm, in which the data is processed in two stages: Map and Reduce. In the Map stage, the data is divided into multiple blocks and distributed among processing nodes, each of which processes its block independently. In the Reduce stage, the results from every node in the Map stage are aggregated and a final result is returned. This methodology tackles vertical high dimensionality (a high number of instances), but it does not address datasets with simultaneous vertical and horizontal high dimensionality (a high number of attributes), as is the case with text datasets. In this work, we address this drawback by proposing Summarizer, an approach for building reduced feature spaces for horizontally high-dimensional data. To this end, we carry out an empirical study that compares a well-known classifier proposed for vertically high-dimensional datasets with and without the horizontal dimensionality reduction process proposed by Summarizer. Our findings show that existing classifiers that tackle vertical Big Data problems can be improved by adding the Summarizer approach to the learning process, which suggests that a unified learning algorithm for datasets with a high number of instances as well as a high number of attributes might be possible. |
Database: | OpenAIRE |
External link: |
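
The two MapReduce stages outlined in the description can be illustrated with a minimal Python sketch. This is not the authors' implementation and not Summarizer itself: the function names (`split_into_blocks`, `map_block`, `reduce_results`) and the toy data are hypothetical, the per-block work is a simple class-label count standing in for whatever a real FRBCS would learn from each block, and the blocks are processed sequentially rather than on separate nodes.

```python
from collections import Counter
from functools import reduce


def split_into_blocks(instances, n_blocks):
    """Divide the instances into n_blocks blocks of roughly equal size."""
    return [instances[i::n_blocks] for i in range(n_blocks)]


def map_block(block):
    """Map stage: process one block independently.

    Assumption: counting class labels is only a stand-in for the real
    per-block learning step, which the description does not specify.
    """
    return Counter(label for _features, label in block)


def reduce_results(partials):
    """Reduce stage: aggregate the per-block results into one final result."""
    return reduce(lambda a, b: a + b, partials, Counter())


# Toy dataset of (features, class label) pairs; values are illustrative only.
data = [
    ((0.1, 0.9), "a"),
    ((0.8, 0.2), "b"),
    ((0.2, 0.7), "a"),
    ((0.9, 0.1), "b"),
    ((0.3, 0.6), "a"),
]

blocks = split_into_blocks(data, n_blocks=2)
partials = [map_block(block) for block in blocks]  # one call per "node"
totals = reduce_results(partials)
```

Because each block is handled without reference to the others, the Map calls could in principle run in parallel; only the Reduce step needs all partial results. Note that splitting by instances leaves the number of attributes per instance unchanged, which is the horizontal dimension the description says this scheme does not address.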