YData Blog

A Machine Learning Approach to Predict Air Quality in California

Written by Fabiana Clemente | August 5, 2020

Predicting air quality is a complex task that has become increasingly relevant in urban areas due to air pollution's critical impact on human health and the environment. In this context, machine learning techniques have proven to be valuable tools for modeling, predicting, and monitoring air quality.


A recent paper used Support Vector Regression (SVR), a popular machine learning method, to forecast pollutant and particulate levels and to predict the Air Quality Index (AQI) in California. The authors found that SVR produced its most accurate predictions with the radial basis function (RBF) kernel.
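
To make the role of the kernel concrete, here is a minimal sketch of how an already-fitted SVR turns RBF similarities into a prediction. It is not the paper's code: the support vectors, coefficients, intercept, and `gamma` value are made-up numbers chosen for illustration, and the function names are hypothetical.

```python
import math

def rbf_kernel(x, z, gamma=0.5):
    """RBF similarity exp(-gamma * ||x - z||^2) between two equal-length vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svr_predict(x, support_vectors, dual_coefs, intercept, gamma=0.5):
    """Evaluate a fitted SVR: a weighted sum of kernel similarities plus a bias."""
    return intercept + sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )

# Illustrative values only (assumptions, not taken from the paper).
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [2.0, -1.0]
intercept = 10.0

prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, intercept)
print(prediction)
```

The kernel equals 1 when the two points coincide and decays toward 0 as they move apart, so each support vector influences only nearby inputs. In practice the coefficients come from training, for example with a library such as scikit-learn, rather than being set by hand.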


One challenge in predicting air quality is that pollutant and particulate levels are dynamic, volatile, and highly variable in time and space. To address this, the authors trained on the whole set of available variables rather than reducing them with principal component analysis (PCA). Using the full set proved to be the more successful strategy.


The study results demonstrate that SVR with an RBF kernel accurately predicts hourly pollutant concentrations, such as carbon monoxide, sulfur dioxide, nitrogen dioxide, ground-level ozone, and fine particulate matter (PM2.5), as well as the hourly AQI. Classification into the six AQI categories defined by the US Environmental Protection Agency reached an accuracy of 94.1% on unseen validation data.
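
As a rough illustration of the classification step, the sketch below maps an AQI value to one of the six EPA categories and scores predictions by category accuracy. The category boundaries are the standard EPA ones. The sample AQI values are invented, the helper names are hypothetical, and thresholding a predicted AQI is only one possible way to obtain a category, which may differ from the paper's procedure.

```python
from bisect import bisect_left

# Upper bound of each EPA AQI category except the last, which is open-ended.
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQI_CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]

def aqi_category(aqi):
    """Return the EPA category name for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI must be non-negative")
    return AQI_CATEGORIES[bisect_left(AQI_UPPER_BOUNDS, aqi)]

def category_accuracy(predicted_aqi, observed_aqi):
    """Fraction of samples whose predicted and observed categories match."""
    matches = sum(
        aqi_category(p) == aqi_category(o)
        for p, o in zip(predicted_aqi, observed_aqi)
    )
    return matches / len(observed_aqi)

# Invented example values, not data from the study.
predicted = [42, 95, 160, 310]
observed = [48, 105, 155, 320]
accuracy = category_accuracy(predicted, observed)
print(accuracy)
```

Note that a prediction can be numerically close to the observed AQI and still land in the wrong category when the two values straddle a boundary, as the second sample does here.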


Overall, the paper highlights the potential of machine learning techniques for predicting air quality, an important area of research given the significant impact of air pollution on human health and the environment. SVR with an RBF kernel is a promising approach that can contribute to more accurate and efficient air quality monitoring and management in urban areas.