Investigating the influence of measurement uncertainty on chlorophyll-a predictions as an indicator of harmful algal blooms in machine learning models

Author:	I. Busari, D. Sahoo, K.P. Sudheer, R.D. Harmel, C. Privette, M. Schlautman, C. Sawyer
Language:	English
Publication year:	2024
Subject:	HABs; Measurement uncertainty; Machine learning; Sensors; High frequency data; Information technology; T58.5-58.64; Ecology; QH540-549.5
Source:	Ecological Informatics, Vol 82, Iss , Pp 102735- (2024)
Document type:	article
ISSN:	1574-9541
DOI:	10.1016/j.ecoinf.2024.102735
Description:	Advancements in data availability, including high-frequency, near real-time multiparameter sensors, laboratory analysis, and in-situ and remote observations, have driven the development of machine learning (ML) models for applications such as monitoring toxic Harmful Algal Blooms (HABs). However, the performance of ML predictions is influenced both by model uncertainty, which stems from inherent model structures, and by errors associated with input dataset measurements. Measurement uncertainty arises, for example, from sample collection, sensor drift, laboratory analysis, and sample handling errors. While the impacts of model uncertainty are commonly addressed using probabilistic approaches, the effect of measurement uncertainty is less studied because detailed measurement information is rarely available. This study assesses the impact of measurement uncertainty on the ML prediction of chlorophyll-a concentration as an index of HABs in a mesotrophic lake. Using randomized subsets of measured input datasets that mimic possible chlorophyll-a concentration distributions, the study built 1000 Random Forest (RF) and Support Vector Regression (SVR) models. An independent measured dataset was used to validate the ensemble models, allowing for model performance evaluation and the creation of prediction intervals to measure the propagated uncertainty. The model predictions had MAE ranging between 0.16 μg/l and 5.19 μg/l, and RMSE ranging between 0.20 μg/l and 7.39 μg/l. The highest uncertainty coverage, 0.71, was observed in the RF model without chlorophyll-a sensor values as a predictor. The study found that training dataset size, which differs between the high-frequency and the manually sampled data, influences how much measurement uncertainty is covered. The results demonstrate how well ML models can capture various HABs patterns when given diverse measurement variables. The findings give researchers information on how to lessen the impact of measurement uncertainty when using ML models as decision-support tools for HABs management.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/e920438120db4aa389e4c576c447c6b9
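
The workflow summarized in the description (train many models on randomized subsets, then score them on an independent dataset and form prediction intervals) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the data are synthetic, a least-squares line stands in for the RF and SVR learners so the example needs only the standard library, and the names `fit_line`, `ensemble_predict`, `percentile`, and `evaluate` are hypothetical. "Coverage" is assumed to mean the fraction of validation observations that fall inside the ensemble's percentile interval, and MAE and RMSE are computed on the ensemble mean.

```python
import math
import random
import statistics


def fit_line(xs, ys):
    """Least-squares line; a simple stand-in for the RF/SVR learners."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def ensemble_predict(train, x_valid, n_models=1000, subset_frac=0.7, seed=0):
    """Fit one model per random training subset; return one prediction row per model."""
    rng = random.Random(seed)
    k = max(2, int(subset_frac * len(train)))
    preds = []
    for _ in range(n_models):
        subset = rng.sample(train, k)  # random subset drawn without replacement
        slope, intercept = fit_line([p[0] for p in subset], [p[1] for p in subset])
        preds.append([slope * x + intercept for x in x_valid])
    return preds


def percentile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def evaluate(preds, y_valid, lower_q=0.025, upper_q=0.975):
    """Score the ensemble mean (MAE, RMSE) and the interval coverage."""
    columns = [sorted(row[i] for row in preds) for i in range(len(y_valid))]
    mean_pred = [statistics.fmean(c) for c in columns]
    errors = [p - y for p, y in zip(mean_pred, y_valid)]
    inside = sum(
        percentile(c, lower_q) <= y <= percentile(c, upper_q)
        for c, y in zip(columns, y_valid)
    )
    return {
        "mae": statistics.fmean(abs(e) for e in errors),
        "rmse": math.sqrt(statistics.fmean(e * e for e in errors)),
        "coverage": inside / len(y_valid),
    }


# Synthetic predictor/target pairs (made up for this example only).
data_rng = random.Random(42)
data = []
for _ in range(200):
    x = data_rng.uniform(0.0, 20.0)
    data.append((x, 2.0 + 0.5 * x + data_rng.gauss(0.0, 1.0)))

train, valid = data[:150], data[150:]  # the held-out part plays the independent dataset
preds = ensemble_predict(train, [x for x, _ in valid], n_models=200)
metrics = evaluate(preds, [y for _, y in valid])
print(metrics)
```

In this sketch the interval reflects only the spread between models trained on different subsets, so its coverage of the validation observations can fall well below the nominal interval width. Swapping `fit_line` for an RF or SVR regressor would leave the subset sampling and evaluation steps unchanged.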