Exceptional in so Many Ways—Discovering Descriptors That Display Exceptional Behavior on Contrasting Scenarios
Authors: | Wouter Duivesteijn, Sebastián Ventura, Mykola Pechenizkiy, José María Luna |
---|---|
Contributors: | Data Mining, EAISI Health, EAISI Foundational |
Year of publication: | 2020 |
Subject: | exceptional patterns; Theoretical computer science; General Computer Science; Computer science; Flag (linear algebra); General Engineering; Exceptional model mining; Class (philosophy); 02 engineering and technology; Extension (predicate logic); supervised descriptive pattern mining; Set (abstract data type); Variable (computer science); 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing; General Materials Science; Pairwise comparison; Rank correlation |
Source: | IEEE Access, 8:9245545, 200982-200994. Institute of Electrical and Electronics Engineers |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2020.3034885 |
Description: | The current state of the art in supervised descriptive pattern mining is very good at automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds subgroups where a pair of target variables display an unusual interaction. What these methods have in common is that one specific exceptionality is enough to flag up a subgroup as exceptional. This, however, naturally leads to the question: can we also find multiple instances of exceptional behaviour simultaneously in the same subgroup? This paper provides a first, affirmative answer to that question in the form of the SPEC (Subsets of Pairwise Exceptional Correlations) model class for EMM. Given a set of predefined numeric target variables, SPEC will flag up subgroups as interesting if multiple target pairs display an unusual rank correlation. This is a fundamental extension of the EMM toolbox, which comes with additional algorithmic challenges. To address these challenges, we provide a series of algorithmic solutions whose strengths and flaws are empirically analysed. |
Database: | OpenAIRE |
External link: |
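
The idea in the description, that a subgroup is interesting when several target pairs show an unusual rank correlation, can be illustrated with a minimal sketch. This is not the SPEC algorithm from the paper: the record does not state the quality measure, so the sketch assumes Spearman's rank correlation, compares each subgroup against the whole dataset, and uses a hypothetical `threshold` and a hypothetical "at least two pairs" rule. All function names and the toy data are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a constant variable as uncorrelated
    return cov / (vx * vy) ** 0.5


def exceptional_pairs(rows, in_subgroup, targets, threshold=0.5):
    """List target pairs whose rho inside the subgroup deviates from the
    rho on the whole dataset by more than `threshold` (assumed criterion)."""
    sub = [r for r in rows if in_subgroup(r)]
    flagged = []
    for a, b in combinations(targets, 2):
        rho_all = spearman([r[a] for r in rows], [r[b] for r in rows])
        rho_sub = spearman([r[a] for r in sub], [r[b] for r in sub])
        if abs(rho_sub - rho_all) > threshold:
            flagged.append((a, b))
    return flagged


# Toy data: in group "A", x rises with y and falls with z; group "B" reverses that.
rows = [{"g": "A", "x": i, "y": i, "z": 6 - i} for i in range(1, 6)]
rows += [{"g": "B", "x": i, "y": 6 - i, "z": i} for i in range(1, 6)]

flagged = exceptional_pairs(rows, lambda r: r["g"] == "A", ["x", "y", "z"])
interesting = len(flagged) >= 2  # hypothetical "multiple pairs" rule
print(flagged, interesting)  # [('x', 'y'), ('x', 'z')] True
```

In the toy data, the pairs (x, y) and (x, z) are uncorrelated over the whole dataset but perfectly rank-correlated within group "A", while (y, z) behaves the same everywhere, so only the first two pairs are flagged.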