Accurate flood forecasting is crucial for effective reservoir operations and flood protection. This study examines the effectiveness of long short-term memory (LSTM) surrogate models in forecasting flood inflows at the Namgang multipurpose dam in South Korea. Unlike prior studies that relied on either purely physical or purely machine learning/deep learning (ML/DL) methods, this research introduces a novel surrogate modeling approach that synthetically augments the LSTM training data with physically simulated extreme scenarios. These scenarios were generated by applying scale factors directly to observed rainfall data within the Hydrologic Engineering Center-Hydrologic Modeling System (HEC-HMS) model. The LSTM models were trained on a composite data set comprising observed rainfall, observed inflow at the dam, HEC-HMS-simulated inflow, and the scaled rainfall data used in the synthetic flood scenarios. Combining observed and augmented data in this way is intended to let the model generalize across a wide range of hydrological conditions, including extremes not represented in the historical record. To address the complexity of spatial and temporal patterns in flood inflow prediction, this study employs advanced LSTM-family architectures such as the Convolutional LSTM (ConvLSTM). ConvLSTM is particularly suited to capturing spatially and temporally dependent phenomena, which are critical in hydrological modeling. These models were selected to overcome the main limitation of the standard LSTM, which lacks spatial processing capabilities. Hyperparameters were optimized through partial cross-validation to ensure robust model performance. Results showed that the ConvLSTM effectively handles spatial and temporal dependencies, making it particularly suitable for flood forecasting tasks.
The model achieved Nash–Sutcliffe efficiency (NSE) values between 0.98 and 0.99, a root-mean-square error (RMSE) below 300 m³/s, and a peak flow percentage error rate (QER) of less than 1% across all lead times (1–3 h), indicating excellent predictive accuracy. Compared with other models such as the encoder–decoder LSTM and the convolutional neural network (CNN)-LSTM, ConvLSTM consistently produced the most accurate peak flow and timing estimates, even under extreme hydrological conditions. Its superior performance is attributed to its ability to process spatial and temporal data concurrently, capturing complex patterns in the data set. This study underscores the potential of LSTM surrogate modeling for enhancing flood forecasting and advocates further exploration and refinement of these methods across diverse hydrological contexts.
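
For reference, the three evaluation metrics reported above can be sketched as follows. NSE and RMSE follow their standard definitions. The abstract does not define QER, so the function below assumes it is the absolute relative difference between simulated and observed peak flow, expressed in percent. The function names and the sample hydrograph are illustrative and are not taken from the study.

```python
import numpy as np


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))


def rmse(obs, sim):
    """Root-mean-square error, in the same units as the inputs (e.g. m^3/s)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))


def peak_flow_error_pct(obs, sim):
    """Peak flow error in percent.

    ASSUMPTION: QER is taken here as |Q_peak,sim - Q_peak,obs| / Q_peak,obs * 100;
    the source text does not give its exact formula.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return float(abs(sim.max() - obs.max()) / obs.max() * 100.0)


# Hypothetical hourly inflow hydrograph (m^3/s), for illustration only.
obs = np.array([100.0, 400.0, 1200.0, 800.0, 300.0])
sim = np.array([110.0, 390.0, 1190.0, 810.0, 290.0])

print("NSE :", nse(obs, sim))
print("RMSE:", rmse(obs, sim))
print("QER :", peak_flow_error_pct(obs, sim))
```

An NSE of 1 indicates a perfect match between simulated and observed inflow, while an NSE of 0 means the model is no better than predicting the observed mean. In a multi-step setting such as the 1–3 h lead times considered here, these functions would be applied separately to the forecasts at each lead time.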