Cite this article: | Li Xiaoying, Wang Hua, Wu Xiaomao, Wu Yi, Xu Haosen. Characterization and prediction of dissolved oxygen fluctuation in Lake Poyang based on machine learning. J. Lake Sci. (湖泊科学), 2025, 37(3): 915-927. DOI:10.18307/2025.0328 |
|
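Abstract: |
Dissolved oxygen (DO) is a key indicator of the self-purification capacity of a water body and of water environment quality, and an important parameter for assessing the health of Lake Poyang. Two efficient machine learning algorithms, random forest (RF) and improved support vector regression (PSO-SVR), were applied to DO prediction in Lake Poyang. Water quality data from 1988 to 2023 were used, covering eight key monitoring stations located in the lake and on the five rivers entering it: Tangyin, east branch of Xinjiang River, Poyang, main branch of Ganjiang River, Fuhekou, Xiuhekou, Kangshan and Hukou. A Mann-Kendall trend test on DO at the eight stations showed an overall increase in DO concentration at Fuhekou, Xiuhekou, Kangshan and Hukou, with Kangshan and Hukou showing a significant increasing trend in the later period. The response relationship between DO and other water quality factors was explored with the random forest importance index (IMI). At all eight stations, water temperature (T) had a high importance index for DO, followed by the permanganate index (CODMn). The average IMI of the factors ranked T>CODMn>TN>NH3-N>TP>pH, with values of 2.54, 0.81, 0.65, 0.63, 0.43 and 0.37, respectively. The RF and PSO-SVR models were then used to predict monthly average water quality data from 1988 to 2023, and their results were compared. Overall mean errors across the eight stations were 0.32 for RF and 0.54 for PSO-SVR. In the confusion-matrix-based performance evaluation, the mean accuracies η were 0.67 for RF and 0.52 for PSO-SVR. Overall performance on the training set was RF (R=0.953; RMSE=0.397 mg/L)>PSO-SVR (R=0.822; RMSE=0.764 mg/L). Overall performance on the prediction set was RF (R=0.836; RMSE=0.660 mg/L)>PSO-SVR (R=0.815; RMSE=0.686 mg/L). Both models showed excellent predictive performance, with RF performing better. Efficient machine learning algorithms were introduced to predict DO in Lake Poyang accurately, with a view to revealing water quality patterns in the lake and the intrinsic connections among water quality factors, and to providing scientific decision support for environmental monitoring and management. |
Keywords: Lake Poyang; dissolved oxygen; prediction; random forest; support vector regression; confusion matrix |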
DOI:10.18307/2025.0328 |
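Classification number: |
Funding: Jointly supported by the National Key Research and Development Program of China (2023YFC320900001) and the Jiangxi Province "Science and Technology + Water Conservancy" Joint Program (2023KSG003) |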
|
Characterization and prediction of dissolved oxygen fluctuation in Lake Poyang based on machine learning |
Li Xiaoying, Wang Hua, Wu Xiaomao, Wu Yi, Xu Haosen
|
1. College of Environment, Hohai University, Nanjing 210098, P.R. China; 2. Key Laboratory of Integrated Regulation and Resource Development on Shallow Lake of Ministry of Education, Hohai University, Nanjing 210098, P.R. China; 3. Jiangxi Province Poyang Lake Water Conservancy Center Construction Office, Nanchang 330009, P.R. China
|
Abstract: |
Dissolved oxygen (DO) is a key indicator of the self-purification capacity of water bodies and of water environment quality, and an important parameter for assessing the health of Lake Poyang. In this study, two efficient machine learning algorithms, random forest (RF) and improved support vector regression (PSO-SVR), were applied to the prediction of DO in Lake Poyang. Water quality data from 1988 to 2023 were used, covering eight key monitoring stations in Lake Poyang and on the five rivers entering the lake: Tangyin, east branch of Xinjiang River, Poyang, main branch of Ganjiang River, Fuhekou, Xiuhekou, Kangshan and Hukou. First, a Mann-Kendall trend test was performed on DO at the eight monitoring stations. The stations with an overall increase in DO were Fuhekou, Xiuhekou, Kangshan and Hukou, of which Kangshan and Hukou showed a significant increasing trend in the later stage. Second, the response relationship between DO and other water quality factors was explored based on the random forest importance index (IMI). The importance index of water temperature (T) for DO was high at all eight monitoring stations, followed by the permanganate index (CODMn). The average IMI of the factors ranked T>CODMn>TN>NH3-N>TP>pH, with importance index values of 2.54, 0.81, 0.65, 0.63, 0.43 and 0.37, respectively. Monthly average water quality data from 1988 to 2023 were then predicted with the RF and PSO-SVR models, and the results were compared. Across the eight monitoring stations, the overall mean errors were 0.32 for the RF model and 0.54 for the PSO-SVR model. The mean accuracies η in the confusion-matrix-based performance evaluation were 0.67 for RF and 0.52 for PSO-SVR. The overall prediction performance on the training set was RF (R=0.953; RMSE=0.397 mg/L)>PSO-SVR (R=0.822; RMSE=0.764 mg/L).
The overall prediction performance of the models on the prediction set was RF (R=0.836; RMSE=0.660 mg/L)>PSO-SVR (R=0.815; RMSE=0.686 mg/L). Both models showed excellent predictive performance, with RF having better predictive ability. The R values of the RF model were more concentrated in the training and prediction sets, indicating that the model had better stability and generalization ability. Its RMSE values were also more concentrated in both sets, but slightly higher in the prediction set. The R and RMSE values of the PSO-SVR model were more dispersed in the training and prediction sets, indicating that the model's performance varied greatly between monitoring sections and that it may need to be adjusted for different data characteristics. Overall, the RF model showed the better prediction ability on all monitoring sections, with higher R values and lower RMSE values, and showed excellent performance and generalization ability on both the training and prediction sets. The PSO-SVR model also performed well on most monitoring sections, but its prediction performance was slightly inferior to that of the RF model, and its structure or parameters may need to be optimized to improve prediction accuracy and stability. Efficient machine learning algorithms were introduced to predict dissolved oxygen in Lake Poyang accurately, with a view to revealing the water quality pattern of Lake Poyang and the intrinsic connections between water quality factors, and to providing scientific decision support for environmental monitoring and management. |
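The abstract compares the two models by the correlation coefficient R, the RMSE, and a confusion-matrix accuracy η. The following is a minimal sketch of how such scores could be computed, not the authors' implementation. The function names, the sample values and, in particular, the DO class boundaries in `do_class` are assumptions made for illustration; the abstract does not state how DO classes were defined.

```python
import math

def pearson_r(observed, predicted):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)

def rmse(observed, predicted):
    """Root mean square error, in the same unit as the inputs (here mg/L)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def do_class(value, thresholds=(7.5, 6.0, 5.0, 3.0, 2.0)):
    """Map a DO concentration to a class index (1 = highest DO).

    The thresholds are hypothetical lower bounds chosen for illustration.
    """
    for index, lower_bound in enumerate(thresholds, start=1):
        if value >= lower_bound:
            return index
    return len(thresholds) + 1

def class_accuracy(observed, predicted):
    """Share of samples whose predicted class matches the observed class,
    i.e. the confusion-matrix diagonal divided by the sample count."""
    hits = sum(do_class(o) == do_class(p) for o, p in zip(observed, predicted))
    return hits / len(observed)

# Made-up illustrative values, not data from the study.
observed = [8.2, 7.1, 6.4, 9.0, 5.5]
predicted = [8.0, 7.6, 6.1, 8.7, 5.2]

print(f"R = {pearson_r(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} mg/L")
print(f"accuracy = {class_accuracy(observed, predicted):.2f}")
```

A class-based accuracy can be low even when RMSE is small, because a prediction that lands just across a class boundary counts as a miss; this is one reason to report both kinds of score.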
Key words: Lake Poyang; dissolved oxygen; prediction; random forest; support vector regression; confusion matrix |
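The Mann-Kendall trend test mentioned in the abstract can be sketched as follows. This is the classical (non-sequential) form of the test with a tie correction and a normal approximation; the study may have used a different variant, so the function name `mann_kendall`, the significance level and the sample series are all assumptions for illustration.

```python
import math

def mann_kendall(series, alpha=0.05):
    """Classical Mann-Kendall test; returns (S, Z, two-sided p, trend label)."""
    n = len(series)

    # S counts increasing pairs minus decreasing pairs over all i < j.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, corrected for groups of tied values.
    tie_counts = {}
    for value in series:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in tie_counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))

    if p < alpha:
        trend = "increasing" if z > 0 else "decreasing"
    else:
        trend = "no significant trend"
    return s, z, p, trend

# Made-up annual mean DO values (mg/L), not data from the study.
annual_do = [7.1, 7.3, 7.2, 7.6, 7.8, 7.7, 8.0, 8.2, 8.1, 8.4]
s, z, p, trend = mann_kendall(annual_do)
print(f"S = {s}, Z = {z:.2f}, p = {p:.4f}, trend: {trend}")
```

The normal approximation is usually considered adequate for series of roughly ten or more points; for monthly data with strong seasonality, a seasonal variant of the test may be more appropriate.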