Outlier detection in non-stationary time series applied to sewer network monitoring
Authors: | Ali Shakil, Mohammad Ali Khalighi, Pierre Pudlo, Cyril Leclerc, Dominique Laplace, François Hamon, Alexandre Boudonne |
---|---|
Contributors: | Institut FRESNEL (FRESNEL), Aix Marseille Université (AMU)-École Centrale de Marseille (ECM)-Centre National de la Recherche Scientifique (CNRS), Institut de Mathématiques de Marseille (I2M), Centre recherche et développement (LyRE), Lyonnaise des Eaux, SUEZ and the French National Association for Research and Technology (ANRT), contract #2020/0274 |
Language: | English |
Year of publication: | 2023 |
Subject: | Smart city; Computer Science Applications; [SPI.AUTO] Engineering Sciences [physics]/Automatic; Sewer network monitoring; Data cleaning; [SPI.GCIV] Engineering Sciences [physics]/Civil Engineering; Internet-of-Things; Artificial Intelligence; Hardware and Architecture; Management of Technology and Innovation; Computer Science (miscellaneous); Outlier detection; Engineering (miscellaneous); Software; Information Systems |
Source: | Internet of Things, 2023, vol. 21, art. 100654. ⟨10.1016/j.iot.2022.100654⟩ |
ISSN: | 2542-6605 |
DOI: | 10.1016/j.iot.2022.100654 |
Description: | We consider data processing for a sewer infrastructure in which drains are equipped with waste-level sensors that frequently send their measurements to a data processing unit. To understand the dynamics of waste accumulation across the whole drain network, the collected data must first be pre-processed to remove unreliable (i.e., noisy) measurements. As we show, the evolution of the waste inside a drain can be modeled as a non-stationary, discontinuous time series. Because of the chaotic nature of the waste and the hostile conditions under which the sensors operate, the observed time series can contain outliers in the form of peaks, which must be removed from the raw data before any further processing. This paper proposes an efficient data cleaning algorithm that offers a good compromise between computational complexity and performance. Performance is evaluated in terms of the probability of peak detection (i.e., detecting actual outliers) and the probability of false detection (i.e., incorrectly flagging valid measurements as outliers). The trade-off between these two criteria is controlled by an appropriate choice of the detection threshold, which, in the proposed method, does not depend on the mean or variance of the data. For instance, with a threshold of 2.5, the algorithm achieves a correct outlier detection probability of 0.85 and a false detection probability of 2.5 × 10⁻². The efficiency of the proposed algorithm is demonstrated on real measurement data. |
Database: | OpenAIRE |
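The description does not spell out the proposed algorithm, so what follows is only a minimal sketch of a generic, swapped-in technique (a Hampel filter: a rolling median with a median-absolute-deviation scale), not the authors' method. Under stated assumptions, it illustrates two points from the abstract. First, the detection threshold is expressed in units of a local robust scale, so its value does not have to be tuned to the mean or variance of the data. Second, it shows how the probabilities of peak detection and of false detection can be computed against labelled data. The function names `hampel_outliers` and `detection_rates`, the parameters `half_window` and `min_scale`, and the synthetic series are all hypothetical.

```python
from statistics import median

# Consistency constant: for Gaussian noise, 1.4826 * MAD estimates the standard deviation.
MAD_TO_SIGMA = 1.4826


def hampel_outliers(x, half_window=5, threshold=2.5, min_scale=1e-9):
    """Hypothetical Hampel-style peak detector (rolling median / MAD), NOT the paper's method.

    x[i] is flagged when |x[i] - local median| > threshold * local robust scale,
    where both statistics are computed over x[i-half_window : i+half_window+1]
    (truncated at the series boundaries). Because the threshold is measured in
    units of this local scale, its value does not depend on the mean or variance of x.
    """
    n = len(x)
    flags = []
    for i in range(n):
        window = x[max(0, i - half_window): min(n, i + half_window + 1)]
        med = median(window)
        mad = median([abs(v - med) for v in window])
        # Floor the scale so that, in a perfectly flat window (MAD = 0),
        # only points that actually differ from the local median are flagged.
        scale = max(MAD_TO_SIGMA * mad, min_scale)
        flags.append(abs(x[i] - med) > threshold * scale)
    return flags


def detection_rates(flags, truth):
    """Return (P_detection, P_false_detection) against ground-truth outlier labels.

    P_detection: fraction of true outliers that were flagged.
    P_false_detection: fraction of valid samples that were wrongly flagged.
    """
    n_out = sum(truth)
    n_valid = len(truth) - n_out
    hits = sum(1 for f, t in zip(flags, truth) if f and t)
    false_alarms = sum(1 for f, t in zip(flags, truth) if f and not t)
    p_d = hits / n_out if n_out else float("nan")
    p_fd = false_alarms / n_valid if n_valid else float("nan")
    return p_d, p_fd


# Hypothetical synthetic series: the level rises linearly, drops to zero when the
# drain is cleaned (a discontinuity), then rises again; two peaks are injected.
series = [0.1 * k for k in range(20)] * 2
truth = [False] * len(series)
for i in (8, 30):
    series[i] += 5.0
    truth[i] = True

flags = hampel_outliers(series, half_window=5, threshold=2.5)
rates = detection_rates(flags, truth)
print([i for i, f in enumerate(flags) if f], rates)  # prints [8, 30] (1.0, 0.0)
```

In this toy case, the local median follows both the linear rise and the drop to zero, so neither is flagged, while the two injected peaks are. Raising `threshold` lowers the false detection probability at the cost of missed peaks. Widening `half_window` makes the median more robust to clusters of adjacent peaks but slower to track level changes.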