MEMPSEP‐III. A Machine Learning‐Oriented Multivariate Data Set for Forecasting the Occurrence and Properties of Solar Energetic Particle Events Using a Multivariate Ensemble Approach.

Author: Moreland, Kimberly; Dayeh, Maher A.; Bain, Hazel M.; Chatterjee, Subhamoy; Muñoz‐Jaramillo, Andrés; Hart, Samuel T.
Subject:
Source: Space Weather: The International Journal of Research & Applications; Sep2024, Vol. 22 Issue 9, p1-18, 18p
Abstract: We introduce a new multivariate data set that utilizes multiple spacecraft collecting in‐situ and remote sensing heliospheric measurements shown to be linked to physical processes responsible for generating solar energetic particles (SEPs). Using the Geostationary Operational Environmental Satellites (GOES) flare event list from Solar Cycle (SC) 23 and part of SC 24 (1998–2013), we identify 252 solar events (>C‐class flares) that produce SEPs and 17,542 events that do not. For each identified event, we acquire the local plasma properties at 1 au, such as energetic proton and electron data, upstream solar wind conditions, and the interplanetary magnetic field vector quantities using various instruments onboard GOES and the Advanced Composition Explorer spacecraft. We also collect remote sensing data from instruments onboard the Solar Dynamics Observatory, Solar and Heliospheric Observatory, and the Wind solar radio instrument WAVES. The data set is designed to allow for variations of the inputs and feature sets for machine learning (ML) in heliophysics and is specifically intended for forecasting the occurrence of SEP events and their subsequent properties. This paper describes a data set created from multiple publicly available observation sources that is validated, cleaned, and carefully curated for our ML pipeline. The data set has been used to drive the newly developed Multivariate Ensemble of Models for Probabilistic Forecast of SEPs (MEMPSEP; see MEMPSEP‐I (Chatterjee et al., 2024, https://doi.org/10.1029/2023SW003568) and MEMPSEP‐II (Dayeh et al., 2024, https://doi.org/10.1029/2023SW003697) for accompanying papers).
Plain Language Summary: We present a new data set that uses observations from multiple spacecraft observing the Sun and the interplanetary space around it. This data is connected to the processes that create solar energetic particles (SEPs). SEP events pose threats to both astronauts and assets in space. The data set contains 252 solar flare events that caused SEPs and 17,542 that did not. For each event, we gather information about the local space environment around the Sun, such as energetic protons and electrons, the conditions of the solar wind, the magnetic field, and remote solar imaging data. We use instruments from NOAA's Geostationary Operational Environmental Satellites (GOES) and the Advanced Composition Explorer spacecraft, as well as data from the Solar Dynamics Observatory, the Solar and Heliospheric Observatory, and the Wind solar radio instrument WAVES. This data set is designed to be used in machine learning (ML), with a focus on predicting the occurrence and properties of SEP events. We detail each observation obtained from publicly available sources, and the data treatment processes used to validate the reliability and usefulness for ML applications.
Key Points: Machine learning oriented data set for predicting the occurrence and properties of solar energetic particle events; Multivariate remote sensing and in‐situ observations; Continuous data set spanning several decades. [ABSTRACT FROM AUTHOR]
Database: Complementary Index