Abstract: Cyanobacterial blooms have emerged as a global environmental challenge that threatens lake ecosystem security and drinking water safety. Timely prediction of bloom outbreaks is critical for implementing preventive measures and reducing disaster risk. Conventional mechanism-driven models are limited by their large number of parameters and high computational complexity. To overcome these limitations, this study established a machine learning framework for Lake Chaohu that integrates multi-site meteorological and water quality measurements with satellite-derived time-series data. Using this framework, we investigated the temporal cumulative effects of meteorological and water quality variables on cyanobacterial blooms. Two Random Forest (RF) models were developed to forecast bloom coverage area 1–7 days (d) ahead. The first accounts for the temporal cumulative effects of the variables (cumulative-variable model). The second uses only single-day observations (single-day-variable model). SHapley Additive exPlanations (SHAP) analysis was then applied to interpret the models' decision-making, revealing feature contributions and nonlinear threshold behaviors.
The results showed the following. (1) Meteorological variables (air temperature, humidity, precipitation, and air pressure) had longer cumulative effect durations (15–30 days) than water quality variables (nitrogen, phosphorus, and dissolved oxygen; 1–10 days). (2) Cumulative-variable models achieved higher predictive accuracy (R² = 0.7–0.8) than single-day-variable models (R² = 0.4–0.6). Performance was best for 1-day-ahead forecasts (R² = 0.79, RMSE = 35.36 km²). (3) Critical thresholds were identified at an average temperature above approximately 23 °C, a maximum wind speed below approximately 4 m/s, precipitation above approximately 200 mm, a nitrogen-to-phosphorus ratio below approximately 15, pH above 8.5, and dissolved oxygen below approximately 8.9 mg/L. The proposed method enables high-precision short-term forecasting from multi-station monitoring data. It also holds promise as a transferable decision-support framework for eutrophic lake management.
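The cumulative-variable idea summarized above can be made concrete with a small sketch. The abstract does not state how cumulative effect durations were determined. The code below therefore assumes one common approach, not necessarily the authors': each driver is averaged over a trailing window of w days. The window length w is then chosen to maximize the absolute Pearson correlation between that average on day t and bloom area on day t + h, where h is the forecast horizon. The function names `trailing_mean`, `pearson`, `best_window`, and `build_dataset` are hypothetical. The resulting feature rows would then be passed to a Random Forest regressor (for example scikit-learn's `RandomForestRegressor`), which is not shown here.

```python
from statistics import mean


def trailing_mean(series, window):
    """Mean of the `window` most recent values up to and including day t.

    Days with fewer than `window` observations get None.
    """
    out = []
    for t in range(len(series)):
        if t + 1 < window:
            out.append(None)
        else:
            out.append(mean(series[t + 1 - window : t + 1]))
    return out


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = (sxx * syy) ** 0.5
    return sxy / denom if denom > 0 else 0.0


def best_window(driver, bloom_area, max_window=30, horizon=1):
    """Hypothetical selection of a cumulative-effect window.

    Returns (w, r): the window length whose trailing mean of `driver` on
    day t has the largest |r| with bloom area on day t + horizon.
    """
    best = None
    for w in range(1, max_window + 1):
        feat = trailing_mean(driver, w)
        pairs = [
            (feat[t], bloom_area[t + horizon])
            for t in range(len(driver) - horizon)
            if feat[t] is not None
        ]
        if len(pairs) < 3:
            continue  # too few days to estimate a correlation
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if best is None or abs(r) > abs(best[1]):
            best = (w, r)
    return best


def build_dataset(drivers, windows, bloom_area, horizon):
    """Pair cumulative features on day t with bloom area on day t + horizon.

    `drivers` maps variable name -> daily series; `windows` maps the same
    names -> trailing-window length (a window of 1 gives a single-day feature).
    """
    feats = {name: trailing_mean(s, windows[name]) for name, s in drivers.items()}
    X, y = [], []
    for t in range(len(bloom_area) - horizon):
        row = [feats[name][t] for name in drivers]
        if any(v is None for v in row):
            continue  # skip days without a full window of history
        X.append(row)
        y.append(bloom_area[t + horizon])
    return X, y
```

Setting every window to 1 in `build_dataset` corresponds to the single-day-variable setup. Using the windows returned by `best_window` corresponds to the cumulative-variable setup. Under this sketch's assumptions, the two models differ only in their input features, not in the regressor.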