Exceptional in so Many Ways—Discovering Descriptors That Display Exceptional Behavior on Contrasting Scenarios

Authors: Wouter Duivesteijn, Sebastián Ventura, Mykola Pechenizkiy, José María Luna
Contributors: Data Mining, EAISI Health, EAISI Foundational
Publication year: 2020
Subject:
Source: IEEE Access, 8:9245545, 200982-200994. Institute of Electrical and Electronics Engineers
ISSN: 2169-3536
DOI: 10.1109/access.2020.3034885
Description: The current state of the art in supervised descriptive pattern mining is very good at automatically finding subsets of the dataset at hand that are exceptional in some sense. The most common form, subgroup discovery, generally finds subgroups where a single target variable has an unusual distribution. Exceptional model mining (EMM) typically finds subgroups where a pair of target variables displays an unusual interaction. What these methods have in common is that one specific exceptionality is enough to flag up a subgroup as exceptional. This, however, naturally leads to the question: can we also find multiple instances of exceptional behaviour simultaneously in the same subgroup? This paper provides a first, affirmative answer to that question in the form of the SPEC (Subsets of Pairwise Exceptional Correlations) model class for EMM. Given a set of predefined numeric target variables, SPEC will flag up subgroups as interesting if multiple target pairs display an unusual rank correlation. This is a fundamental extension of the EMM toolbox, which comes with additional algorithmic challenges. To address these challenges, we provide a series of algorithmic solutions whose strengths/flaws are empirically analysed.
Database: OpenAIRE
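
Illustration: the description says that SPEC flags a subgroup when multiple target pairs display an unusual rank correlation. The sketch below is only a rough illustration of that idea, not the quality measure or search algorithm from the paper. It assumes Spearman's rank correlation as the rank correlation, and it treats a pair as "unusual" when its correlation inside the subgroup differs from its correlation on the whole dataset by at least a fixed threshold. The function names (`ranks`, `spearman`, `exceptional_pairs`), the `threshold` parameter, and the toy data are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    # Assumption: a constant variable is treated as uncorrelated (0.0).
    return cov / (sd_x * sd_y) if sd_x > 0 and sd_y > 0 else 0.0


def exceptional_pairs(targets, in_subgroup, threshold=0.5):
    """List target pairs whose subgroup correlation deviates from the global one.

    targets:     dict mapping a target name to a list of numbers
    in_subgroup: list of booleans, True for rows covered by the subgroup
    threshold:   hypothetical cut-off on the absolute correlation difference
    """
    found = []
    for a, b in combinations(sorted(targets), 2):
        whole = spearman(targets[a], targets[b])
        sub_a = [v for v, keep in zip(targets[a], in_subgroup) if keep]
        sub_b = [v for v, keep in zip(targets[b], in_subgroup) if keep]
        if abs(spearman(sub_a, sub_b) - whole) >= threshold:
            found.append((a, b))
    return found


# Toy data (made up for illustration): the subgroup covers the last four rows.
targets = {
    "x": [1, 2, 3, 4, 5, 6, 7, 8],
    "y": [1, 2, 3, 4, 8, 7, 6, 5],
    "z": [8, 7, 6, 5, 1, 2, 3, 4],
}
in_subgroup = [False] * 4 + [True] * 4
flagged = exceptional_pairs(targets, in_subgroup)
is_interesting = len(flagged) >= 2  # "multiple" pairs, under this sketch's reading
```

In this toy example the pairs (x, y) and (x, z) reverse their rank correlation inside the subgroup, while (y, z) behaves the same as on the whole dataset, so the subgroup counts as interesting under the assumed "at least two pairs" reading.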