Abstract: Lake water level is the basis for maintaining the structure, function, and integrity of a lake's ecosystem. Water level change in Lake Poyang is complicated because it is affected by five rivers within the basin and by the Yangtze River. To predict the water level change of Lake Poyang accurately, a long short-term memory (LSTM) network is used to construct a water level prediction model for the lake. The model takes the flows of the Ganjiang, Fuhe, Xinjiang, Raohe, and Xiushui Rivers and of the mainstream of the Yangtze River as inputs to predict the water level process at five representative stations in the Lake Poyang area (Hukou, Xingzi, Duchang, Wucheng, and Kangshan). Hydrological time series data from 1956 to 1980 are used as the training set, and data from 1981 to 2000 are used as the verification set. The influence of model parameters such as the input time window, the number of hidden neuron nodes, and the initial learning rate on prediction accuracy is discussed, and the optimal parameters of the Lake Poyang water level prediction model are determined. The results show that the LSTM can accurately predict the water level in different parts of Lake Poyang from the flows of the five rivers and the Yangtze River. Across the five stations, the RMSE is 0.41-0.50 m, and the NSE and R² are 0.96-0.98. To investigate the impact of the training set on the water level predictions, the study further trains the model on daily average flow data from 5 random years (1956-1960) and from 5 typical hydrological years (1954, 1973, 1974, 1977, and 1978).
The results show that the model trained on the 5 random years of data predicts less accurately than the model trained on the typical hydrological years, especially for flood and dry-period water levels; because the volume of typical hydrological data is still much smaller than 20 years of data, its overall prediction accuracy is slightly lower than that of the model trained on 20 years of data. Therefore, when applying such a data-driven LSTM neural network model, representative data should be selected for training wherever possible.
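For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of how an input time window and the three accuracy measures (RMSE, NSE, R²) could be computed. It is an illustration under stated assumptions, not the study's implementation: the function names `make_windows`, `rmse`, `nse`, and `r2` are hypothetical, the six input columns are assumed to be the daily flows of the five rivers and the Yangtze mainstream, and R² is assumed to be the squared Pearson correlation coefficient, which the abstract does not specify.

```python
import numpy as np


def make_windows(flows, levels, window):
    """Build LSTM-style samples from daily series (hypothetical helper).

    flows  : array of shape (T, n_inputs), e.g. 6 river flows per day (assumed layout)
    levels : array of shape (T,), water level at one station
    window : number of consecutive days fed to the model per sample

    Returns X of shape (T - window + 1, window, n_inputs) and y of shape
    (T - window + 1,), where y[i] is the level on the last day of window i.
    """
    flows = np.asarray(flows, dtype=float)
    levels = np.asarray(levels, dtype=float)
    n_samples = flows.shape[0] - window + 1
    X = np.stack([flows[i:i + window] for i in range(n_samples)])
    y = levels[window - 1:]
    return X, y


def rmse(obs, sim):
    """Root mean square error, in the units of the observations (m here)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def r2(obs, sim):
    """Squared Pearson correlation (an assumed definition of R^2)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    return float(r ** 2)
```

Note that a constant offset between simulated and observed levels lowers the NSE and raises the RMSE but leaves this R² unchanged, which is one reason the three measures are usually reported together.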