RPCA-based techniques for pattern extraction, hotspot identification and signal correction using data from a dense network of low-cost NO₂ sensors in London.

Authors: Bogaert M (Department of Civil and Environmental Engineering, Imperial College London, United Kingdom; electronic address: martin.bogaert19@imperial.ac.uk); Mouritzen C (Department of Chemistry, University of Copenhagen, Denmark); Johnson MS (Department of Chemistry, University of Copenhagen, Denmark; AirScape, 88 Baker St, London W1U 6TQ, United Kingdom); van Reeuwijk M (Department of Civil and Environmental Engineering, Imperial College London, United Kingdom).
Language: English
Source: The Science of the total environment [Sci Total Environ] 2024 May 15; Vol. 925, pp. 171522. Date of Electronic Publication: 2024 Mar 15.
DOI: 10.1016/j.scitotenv.2024.171522
Abstract: High-density low-cost air quality sensor networks are a promising technology to monitor air quality at high temporal and spatial resolution. However, the collected data are high-dimensional, and it is not always clear how best to leverage this information, particularly given the lower data quality of the sensors. Here we report on the use of robust Principal Component Analysis (RPCA) applied to nitrogen dioxide (NO₂) data obtained from a recently deployed dense network of 225 air pollution monitoring nodes based on low-cost sensors in the Borough of Camden in London. RPCA addresses the brittleness of singular value decomposition to outliers by decomposing the data into low-rank and sparse contributions, with the latter containing the outliers. The modal decomposition enabled by RPCA identifies major periodic patterns including spatial and temporal bias, dominant spatial variance, and north-south bias. The five most descriptive components capture 98% of the data's variance, achieving compression by a factor of 1500. We present a new technique that uses the sparse part of the data to identify hotspots. The data indicate that at the locations of the top 15% most susceptible nodes in the network, the model identifies 23% more hotspots than in all other locations combined. Moreover, the median hotspot event at these at-risk locations exceeds the mean NO₂ concentration by 33 μg/m³. We show the potential of RPCA for signal correction; it corrects random errors, yielding a reference signal with R² > 0.8. Moreover, RPCA successfully reconstructs missing data from a sensor with R² = 0.72 using the rest of the sensor network, an improvement upon PCA of around 50%, allowing air quality estimation even if a sensor is temporarily out of use.
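To illustrate the low-rank plus sparse decomposition the abstract describes, the following is a minimal sketch of RPCA by principal component pursuit, solved with a fixed-penalty augmented Lagrangian iteration. It is not the authors' implementation: the solver choice, the default values of `lam` and `mu`, the synthetic nodes-by-time matrix, and the hotspot threshold of 1.0 are all assumptions made for illustration, and the record does not state the paper's actual hotspot criterion.

```python
import numpy as np


def shrink(X, tau):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Singular value thresholding: proximal operator of the nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * shrink(s, tau)) @ Vt


def rpca(M, lam=None, mu=None, tol=1e-7, max_iter=1000):
    """Split M into low-rank L and sparse S such that M is approximately L + S.

    lam and mu default to commonly used heuristics (assumed here, not taken
    from the paper): lam = 1/sqrt(max(m, n)), mu = m*n / (4 * ||M||_1).
    """
    m, n = M.shape
    lam = 1.0 / np.sqrt(max(m, n)) if lam is None else lam
    mu = m * n / (4.0 * np.abs(M).sum()) if mu is None else mu
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)  # Lagrange multiplier for the constraint M = L + S
    norm_M = np.linalg.norm(M)
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)
        S = shrink(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual
        if np.linalg.norm(residual) <= tol * norm_M:
            break
    return L, S


# Synthetic stand-in for a sensor matrix: 60 hypothetical nodes x 200 time steps.
rng = np.random.default_rng(0)
n_nodes, n_times, rank = 60, 200, 2
L_true = rng.normal(size=(n_nodes, rank)) @ rng.normal(size=(rank, n_times))

# Sparse positive spikes on about 2% of entries stand in for hotspot events.
spike_mask = rng.random((n_nodes, n_times)) < 0.02
S_true = np.where(spike_mask, rng.uniform(5.0, 10.0, size=spike_mask.shape), 0.0)
M = L_true + S_true

L, S = rpca(M)

# Assumed hotspot rule: a sparse-component entry above an arbitrary threshold.
hotspot_flags = S > 1.0
hotspot_counts = hotspot_flags.sum(axis=1)  # number of flagged events per node
```

In this sketch, `L` plays the role of the corrected background signal and `hotspot_counts` ranks nodes by how often the sparse part exceeds the threshold. Real sensor data also contain dense noise, which this noise-free formulation does not model separately.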
Competing Interests: The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Matthew S. Johnson reports a relationship with AirScape that includes board membership, consulting or advisory, employment, and equity or stocks. The other authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.)
Database: MEDLINE