Spline-Based Generalised Additive Models for Interpretable Nonlinear Regression: An Application to Air Quality Data
Maurice Wanyonyi *
Department of Mathematics and Statistics, University of Embu, Kenya; and Research and Development, African Institute for Capacity Development (AICAD), 46179–00100, Nairobi, Kenya.
Jacqueline Akelo Gogo
Department of Mathematics and Statistics, University of Embu, Kenya; and Research and Development, African Institute for Capacity Development (AICAD), 46179–00100, Nairobi, Kenya.
Jonathan Ndolo Mbithi
Department of Mathematics and Statistics, University of Embu, Kenya.
Edwin Charani Sindiga
Department of Mathematics and Statistics, University of Embu, Kenya.
*Author to whom correspondence should be addressed.
Abstract
Nonlinear relationships frequently arise in environmental data, challenging conventional linear regression models. This study investigates spline-based Generalised Additive Models (GAMs) as an interpretable semiparametric framework for capturing such nonlinearities. Using the UCI Air Quality dataset (9,358 hourly observations), we compare GAMs with linear regression and Random Forest models using blocked cross-validation and multiple performance metrics. GAMs consistently outperformed linear regression, reducing root mean squared error (RMSE) by 11% for CO and by over 95% for benzene (from 0.796 to 0.034). GAMs achieved predictive accuracy comparable to Random Forest while retaining explicit, interpretable representations of predictor effects. Estimated smooth functions revealed meaningful nonlinear structures, including sensor saturation, nonlinear temperature dependencies, and a significant temperature–humidity interaction. Residual diagnostics confirmed improved model adequacy relative to linear specifications, and robustness analyses supported the stability of the proposed framework. These findings demonstrate that spline-based GAMs offer a statistically coherent and interpretable alternative to both classical linear models and black-box machine learning methods in environmental applications.
Keywords: Generalised Additive Models (GAMs), semiparametric regression, interpretable machine learning, spline smoothing, air quality modelling, nonlinear regression, environmental data analysis