The Impact of Bike-Sharing Ridership on Air Quality: A Scalable Data Science Framework
Author: | Philip Trinh, Diane Woodbridge, Rebecca Reilly, Victoria Suarez, Paul Intrevado, Nina Hua |
---|---|
Year of publication: | 2019 |
Subject: |
Elastic net regularization; 050101 languages & linguistics; Distributed database; Gigabyte; Computer science; 05 social sciences; 02 engineering and technology; Data science; Random forest; Scalability; 0202 electrical engineering, electronic engineering, information engineering; Feature (machine learning); 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Intelligent transportation system; Air quality index |
Source: | SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI |
DOI: | 10.1109/smartworld-uic-atc-scalcom-iop-sci.2019.00341 |
Description: | This research explores the relationship between daily air quality indicator (AQI) values and the daily intensity of bike-share ridership in New York City. The authors designed and deployed a distributed data science framework on which to process the data and run Elastic Net, Random Forest Regression, and Gradient Boosted Regression Trees. Nine gigabytes of CitiBike ridership data and one gigabyte of AQI data were used. All three machine learning algorithms identified bike-share ridership intensity as either the most important or the second most important feature in predicting future daily AQI values. The authors also demonstrated empirically that, although a distributed platform was necessary to ingest and pre-process the raw 10 gigabytes of data, all three machine learning algorithms ran far faster on the cleaned, joined, and aggregated data on a local commodity computer than on the distributed platform. |
Database: | OpenAIRE |
External link: |