Atypicity Detection in Data Streams: a Self-Adjusting Approach
Author: | Alice Marascu, Florent Masseglia |
---|---|
Contributors: | Usage-centered design, analysis and improvement of information systems (AxIS), Inria Sophia Antipolis - Méditerranée (CRISAM), Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Inria Paris-Rocquencourt, Institut National de Recherche en Informatique et en Automatique (Inria) |
Language: | English |
Year of publication: | 2011 |
Subject: |
[INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]
Data stream mining; Computer science; 02 engineering and technology; Self adjusting; computer.software_genre; Field (computer science); Theoretical Computer Science; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Set (abstract data type); ComputingMethodologies_PATTERNRECOGNITION; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; Artificial Intelligence; 020204 information systems; Outlier; 0202 electrical engineering, electronic engineering, information engineering; Cluster (physics); 020201 artificial intelligence & image processing; Anomaly detection; Computer Vision and Pattern Recognition; Data mining; Cluster analysis; computer |
Source: | Intelligent Data Analysis, IOS Press, 2011, 15 (1), pp. 89-105. ⟨10.3233/IDA-2010-0457⟩ |
ISSN: | 1088-467X |
DOI: | 10.3233/IDA-2010-0457 |
Description: | International audience; Outlyingness is a subjective concept relying on the isolation level of a (set of) record(s). Clustering-based outlier detection is a field that aims to cluster data and to detect outliers depending on their characteristics (i.e. small, tight and/or dense clusters might be considered as outliers). Existing methods require a parameter standing for the "level of outlyingness", such as the maximum size or a percentage of small clusters, in order to build the set of outliers. Unfortunately, manually setting this parameter in a streaming environment should not be possible, given the fast time response usually needed. In this paper we propose WOD, a method that separates outliers from clusters thanks to a natural and effective principle. The main advantages of WOD are its ability to automatically adjust to any clustering result and to be parameterless. |
Database: | OpenAIRE |
External link: |