Groundbreaking advances in materials and chemical research have been driven by the development of atomistic simulations. However, their broader applicability remains limited because they inherently depend on energy models that are either approximate or computationally prohibitive at large scales. Machine learning interatomic potentials (MLIPs) have recently emerged as a promising class of energy models, but their deployment remains challenging due to the scarcity of systematic protocols for generating training data that span diverse structural regimes. Here we introduce GAIA, an end-to-end automated framework that streamlines dataset construction for the development of general-purpose reactive MLIPs. GAIA combines a metadynamics-based exploration scheme with closed-loop data expansion to efficiently sample a broad spectrum of atomic arrangements, thereby reducing the reliance on heuristics in conventional dataset generation. Using GAIA, we constructed Titan25, a benchmark-scale dataset, and trained an MLIP that closely matches both static and dynamic density functional theory results. The resulting model reproduces key experimental observations across distinct modes of reactivity, including detonation, coalescence, and catalytic processes. GAIA thus helps bridge the gap between simulation and experiment, paving the way toward scalable and general MLIPs capable of describing a wide range of materials and chemical processes.