The Impact of Bike-Sharing Ridership on Air Quality: A Scalable Data Science Framework

Authors: Philip Trinh, Diane Woodbridge, Rebecca Reilly, Victoria Suarez, Paul Intrevado, Nina Hua
Year of publication: 2019
Subject:
Source: SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI
DOI: 10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00341
Description: This research explores the relationship between daily air quality indicator (AQI) values and the daily intensity of bike-share ridership in New York City. The authors designed and deployed a distributed data science framework on which to process the data and run Elastic Net, Random Forest Regression, and Gradient Boosted Regression Trees. Nine gigabytes of CitiBike ridership data and one gigabyte of AQI data were used. All three machine learning algorithms identified bike-share ridership intensity as either the most important or the second most important feature in predicting future daily AQI values. The authors also demonstrated empirically that, although a distributed platform was necessary to ingest and pre-process the 10 gigabytes of raw data, all three machine learning algorithms ran far faster on the cleaned, joined, and aggregated data on a local commodity computer than on the distributed platform.
Database: OpenAIRE