Highlights
- Benchmark of ML models for coastal pHSWS forecasting.
- Models trained on rare high-frequency data from Eastern Canada.
- XGBoost balances sensitivity and precision at pHSWS < 7.75.
- SHAP identifies Julian day as the dominant predictor, acting as a composite environmental driver.
- Promising low-cost framework for aquaculture acidification early warning.
Abstract
Ocean acidification poses a growing threat to marine ecosystems and aquaculture productivity, particularly in under-monitored coastal regions such as Eastern Canada. Existing pH prediction frameworks typically rely on multi-year records combining extensive carbonate chemistry, physical, and biological parameters. While these models can achieve high accuracy, their data requirements make them costly, complex, and challenging to implement for local, site-specific acidification forecasting in aquaculture contexts. To address this limitation, this study benchmarks several machine learning models for coastal pHSWS prediction using only three routinely measured environmental variables (temperature, salinity, sea level), from which we derived moving-average descriptors, local gradients, and two temporal indicators, resulting in a compact set of 11 input features. Six different models and a multivariate linear regression baseline were trained on one of the most complete and extended high-frequency datasets available (BSSS2018) and evaluated across four independent datasets: one from the same site but six months earlier (BSSS2017), and three from nearby bays in northeastern New Brunswick collected between 2017 and 2019. Among all tested models, XGBoost emerged as the most reliable and interpretable, achieving the best trade-off between sensitivity and precision at the operational acidification threshold (pHSWS < 7.75). Its performance remained acceptable within-site but declined across bays due to environmental and seasonal discrepancies, underscoring the importance of training data representativeness. SHAP-based explainability confirmed that Julian day was the dominant predictor, integrating the composite effects of seasonal environmental variability. Overall, this study demonstrates that using only low-cost, routinely measured features provides a promising foundation for short-term coastal pH forecasting, particularly for aquaculture monitoring needs. 
Despite limited inter-bay generalization, the proposed framework shows that interpretable machine learning models can deliver actionable early-warning insights under realistic data constraints. It constitutes one of the first data-driven benchmarks explicitly tested at aquaculture-relevant thresholds, highlighting a scalable and transparent approach toward operational acidification forecasting.
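To make the threshold-based evaluation concrete, the sketch below shows one way sensitivity and precision could be computed when an acidification event is defined as pHSWS < 7.75. It is a minimal illustration and not the authors' code: the function name `threshold_scores` and the sample values are hypothetical, and only the 7.75 threshold comes from the abstract.

```python
# Operational acidification threshold quoted in the abstract (pHSWS < 7.75).
ACID_THRESHOLD = 7.75


def threshold_scores(observed, predicted, threshold=ACID_THRESHOLD):
    """Return (sensitivity, precision) for events defined as pH < threshold.

    Hypothetical helper for illustration; the study's own evaluation
    procedure may differ.
    """
    tp = fp = fn = 0
    for obs, pred in zip(observed, predicted):
        obs_event = obs < threshold    # an event actually occurred
        pred_event = pred < threshold  # the model forecast an event
        if obs_event and pred_event:
            tp += 1
        elif pred_event:
            fp += 1
        elif obs_event:
            fn += 1
    # Sensitivity: share of real events that the model caught.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Precision: share of forecast events that were real.
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, precision


# Made-up values for illustration only, not data from the study.
observed = [7.70, 7.80, 7.72, 7.90, 7.74, 7.85]
predicted = [7.73, 7.74, 7.78, 7.88, 7.70, 7.86]
sens, prec = threshold_scores(observed, predicted)
print(f"sensitivity={sens:.2f}, precision={prec:.2f}")
```

In an early-warning setting the two scores pull in opposite directions: a model that flags events more readily misses fewer acidification episodes but raises more false alarms, which is why the abstract reports the trade-off between them and not a single accuracy figure.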
Continue reading ‘Explainable machine learning models for coastal pH forecasting at aquaculture-relevant thresholds in Eastern Canada’





