Data mining: where do we start?

Author: R.D. De Veaux
Year of publication: 2004
Subject:
Source: IEEE Transactions on Image Processing.
DOI: 10.1109/iti.2003.1225315
Description: Summary form only given. Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner (D. Hand, 2001). Much of exploratory data analysis (EDA) and inferential statistics concerns the same problems. Part of the challenge of data mining is the sheer size of the data sets and/or the number of possible predictor variables. With 500 potential predictor variables, just summarizing and graphing them to start the process is impossible. Instead, in data mining, we may start the process by creating a preliminary model just to narrow down the set of potential predictors. This exploratory data modeling (EDM) seems to be at odds with standard statistical practice, but, in fact, it is simply using models as a new exploratory tool. We take a brief tour of the current state of data mining algorithms and, using several case studies, explain how EDM can be easily used to narrow the search for a useful predictive model and to increase the chances of producing useful, meaningful results.
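The idea of starting with a preliminary model to narrow down a large set of predictors can be illustrated with a minimal sketch. The abstract does not say which preliminary model the author uses, so the example below swaps in a deliberately simple stand-in: ranking each predictor by the absolute value of its Pearson correlation with the response. The data are synthetic, and the names `pearson`, `screen_predictors`, `columns`, `response`, and `shortlist` are hypothetical, chosen only for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no linear signal
    return sxy / (sxx * syy) ** 0.5


def screen_predictors(columns, response, keep):
    """Return the names of the `keep` predictors with the largest |correlation|.

    This is an assumed stand-in for a "preliminary model", not the
    method described in the talk.
    """
    scores = {name: abs(pearson(values, response))
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:keep]


# Synthetic data (assumption): 500 candidate predictors, of which only
# x7 and x42 actually drive the response.
rng = random.Random(0)
n_rows, n_predictors = 200, 500
columns = {f"x{j}": [rng.gauss(0, 1) for _ in range(n_rows)]
           for j in range(n_predictors)}
response = [3 * columns["x7"][i] - 2 * columns["x42"][i] + rng.gauss(0, 0.5)
            for i in range(n_rows)]

# Narrow 500 candidates down to a shortlist worth plotting and modeling.
shortlist = screen_predictors(columns, response, keep=10)
print(shortlist)
```

A marginal screen like this can miss predictors that matter only in combination with others, so in practice a tree-based or regularized model might serve as the preliminary model instead; the point of the sketch is only the workflow of modeling first and exploring the shortlist afterwards.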
Database: OpenAIRE