Exceptional Model Mining
Author: | Ad Feelders, Wouter Duivesteijn, Arno Knobbe |
---|---|
Publication year: | 2015 |
Subject: | Structure (mathematical logic); Computational complexity theory; Computer Networks and Communications; Computer science; Bayesian network; Regression analysis; 02 engineering and technology; Machine learning; Regression; Computer Science Applications; Task (project management); Correlation; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; Artificial intelligence; Data mining; Classifier (UML); Information Systems |
Source: | Data Mining and Knowledge Discovery. 30:47-98 |
ISSN: | 1573-756X 1384-5810 |
DOI: | 10.1007/s10618-015-0403-4 |
Description: | Finding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is a classical Data Mining task. In traditional local pattern mining methods, such deviations are measured in terms of a relatively high occurrence (frequent itemset mining), or an unusual distribution for one designated target attribute (common use of subgroup discovery). These, however, do not encompass all forms of "interesting". To capture a more general notion of interestingness in subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised local pattern mining framework, where several target attributes are selected, and a model over these targets is chosen to be the target concept. Then, we strive to find subgroups: subsets of the dataset that can be described by a few conditions on single attributes. Such subgroups are deemed interesting when the model over the targets on the subgroup is substantially different from the model on the whole dataset. For instance, we can find subgroups where two target attributes have an unusual correlation, a classifier has a deviating predictive performance, or a Bayesian network fitted on several target attributes has an exceptional structure. We give an algorithmic solution for the EMM framework, and analyze its computational complexity. We also discuss some illustrative applications of EMM instances, including using the Bayesian network model to identify meteorological conditions under which food chains are displaced, and using a regression model to find the subset of households in the Chinese province of Hunan that do not follow the general economic law of demand. |
Database: | OpenAIRE |
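
The description above says a subgroup is interesting when the model fitted on it differs substantially from the model on the whole dataset, with an unusual correlation between two targets as one instance. The following is only a minimal sketch of that idea on toy data, not the authors' algorithm or quality measure: the names `pearson`, `correlation_gap`, the attribute names `group`, `x`, `y`, and the choice of an absolute difference as the score are all assumptions made for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists.

    Assumes neither list is constant (no zero-variance guard).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_gap(rows, condition, x="x", y="y"):
    """Hypothetical score: |correlation on subgroup - correlation on all rows|.

    `condition` is a predicate on one row; it plays the role of the
    subgroup description (a condition on a single attribute).
    """
    subgroup = [r for r in rows if condition(r)]
    whole_corr = pearson([r[x] for r in rows], [r[y] for r in rows])
    sub_corr = pearson([r[x] for r in subgroup], [r[y] for r in subgroup])
    return abs(sub_corr - whole_corr)


# Toy data (invented): in group "a" the targets rise together,
# in group "b" they move in opposite directions.
rows = (
    [{"group": "a", "x": i, "y": 2 * i} for i in range(1, 5)]
    + [{"group": "b", "x": i, "y": 10 - 2 * i} for i in range(1, 5)]
)

gap_a = correlation_gap(rows, lambda r: r["group"] == "a")
print(f"correlation gap for group == 'a': {gap_a:.3f}")
```

On this toy data the two groups cancel each other out over the whole dataset, so the subgroup described by `group == "a"` stands out. A full EMM search would additionally enumerate candidate subgroup descriptions and rank them by such a score; that search is not shown here.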