Application of the deletion/substitution/addition algorithm to selecting land use regression models for interpolating air pollution measurements in California
Authors: | Aaron van Donkelaar, Zev Ross, Richard T. Burnett, Bernardo Beckerman, Michael Jerrett, Randall V. Martin |
---|---|
Year of publication: | 2013 |
Subject: | Generalized linear model, Atmospheric Science, Variables, Computer science, Model selection, Substitution (logic), Air pollution, Feature selection, Statistics, Medicine, Algorithm, General Environmental Science, Exposure assessment, Interpolation |
Source: | Atmospheric Environment, vol. 77, pp. 172-177 |
ISSN: | 1352-2310 |
DOI: | 10.1016/j.atmosenv.2013.04.024 |
Description: | Land use regression (LUR) models are widely employed in health studies to characterize chronic exposure to air pollution. LUR is essentially an interpolation technique that uses the pollutant of interest as the dependent variable and proximate land use, traffic, and physical environmental variables as independent predictors. Two major limitations of this method have not been addressed: (1) variable selection in the model-building process, and (2) handling of unbalanced repeated measures. In this paper, we address these issues with a modeling framework that implements the deletion/substitution/addition (DSA) machine learning algorithm and uses a generalized linear model to average over unbalanced temporal observations. Models were derived from monthly observations for fine particulate matter with an aerodynamic diameter of 2.5 microns or less (PM2.5) and for nitrogen dioxide (NO2). We used 4119 observations at 108 sites for PM2.5 and 15,301 observations at 138 sites for NO2. The resulting models had good predictive capacity, with cross-validated R² values of 0.65 for PM2.5 and 0.71 for NO2. By addressing these two shortcomings of current approaches to LUR modeling, we have developed a framework that minimizes arbitrary decisions during model selection. We have also demonstrated how to integrate temporally unbalanced data in a theoretically sound manner. These developments could have widespread applicability for future LUR modeling efforts. |
Database: | OpenAIRE |
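The description outlines DSA-based variable selection scored by cross-validated R² only in general terms. As an illustration, the following is a minimal Python sketch, assuming NumPy is available, of a simplified greedy variant: starting from an intercept-only model, it tries every deletion, substitution, and addition of one candidate predictor and keeps the move that most increases k-fold cross-validated R², stopping when no move helps. This is not the authors' implementation. It fits ordinary least squares rather than a generalized linear model, it ignores the unbalanced repeated-measures structure, and, unlike the published DSA routine (which, as generally described, uses cross-validation to choose the model size), it scores every candidate move by cross-validation directly. The function names `cv_r2` and `dsa_select`, the `max_size` cap, and the synthetic data are hypothetical.

```python
import numpy as np

# Hypothetical, simplified sketch: OLS instead of the paper's GLM, and no
# handling of unbalanced repeated measures.


def cv_r2(X, y, cols, k=5, seed=0):
    """k-fold cross-validated R^2 of an OLS fit (intercept + the given columns)."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    pred = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        # Design matrices: a column of ones plus the selected predictors.
        a_train = np.column_stack([np.ones(len(train))] + [X[train, j] for j in cols])
        a_test = np.column_stack([np.ones(len(test))] + [X[test, j] for j in cols])
        beta, *_ = np.linalg.lstsq(a_train, y[train], rcond=None)
        pred[test] = a_test @ beta
    # Out-of-fold predictions compared against the overall mean of y.
    return 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)


def dsa_select(X, y, max_size=5, k=5, seed=0):
    """Greedy search over predictor subsets with deletion/substitution/addition moves.

    Each round scores every neighbouring subset by cross-validated R^2 and moves
    to the best one; the search stops when no move improves the score.
    """
    p = X.shape[1]
    current = ()
    best = cv_r2(X, y, current, k, seed)
    while True:
        outside = [m for m in range(p) if m not in current]
        # Deletion: drop one selected predictor.
        moves = [tuple(c for c in current if c != j) for j in current]
        # Substitution: swap one selected predictor for an unselected one.
        moves += [tuple(sorted([c for c in current if c != j] + [m]))
                  for j in current for m in outside]
        # Addition: add one unselected predictor, up to max_size.
        if len(current) < max_size:
            moves += [tuple(sorted(current + (m,))) for m in outside]
        scored = [(cv_r2(X, y, s, k, seed), s) for s in moves]
        if not scored:
            break
        score, subset = max(scored)
        if score <= best + 1e-9:
            break
        best, current = score, subset
    return list(current), best


if __name__ == "__main__":
    # Synthetic data: 6 candidate predictors, only columns 0 and 3 affect y.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 6))
    y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.3 * rng.normal(size=200)
    cols, score = dsa_select(X, y, max_size=4)
    print("selected columns:", cols, "cross-validated R^2:", round(score, 3))
```

Scoring each move by cross-validation keeps the sketch short; because the score must strictly increase at every step and there are finitely many subsets, the search always terminates, though it can stop at a local optimum.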