Abstract Despite growing interest in applying reinforcement learning (RL) to design optimization, its high computational cost limits its applicability to problems involving expensive function evaluations. In this study, we propose an efficient RL action strategy specifically designed for acoustic topology optimization. The key idea is to assign an action value (Q-value) to each element individually and to select material-filled elements in descending order of their Q-values until the target volume fraction is met, instead of evaluating Q-values for complete combinations of elements that satisfy the volume constraint. This formulation decouples the learning complexity from the combinatorial explosion of candidate layouts, making the training of the Q-value-estimating neural network more efficient and the RL-based approach better suited to topology optimization problems requiring fine meshes. As a representative application, we consider the design of a muffler's internal layout to maximize sound transmission loss, a problem where conventional gradient-based methods often fail to reach near-globally optimal solutions. By integrating the proposed method with finite element simulations and a reward function shaped by the transmission loss at one or more target frequencies, the RL agent learns policies that directly determine the material distribution for single- or multi-frequency objectives. The resulting muffler designs, based on a two-dimensional finite element model, exhibit near-globally optimal performance and outperform those generated by conventional gradient-based methods. The advantages of the proposed approach over standard RL-based topology optimization methods are also clearly demonstrated.
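The per-element action strategy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the NumPy representation of Q-values, and the 0/1 layout encoding are all assumptions made here for clarity.

```python
import numpy as np

def select_layout(q_values, volume_fraction):
    """Illustrative sketch of the proposed action strategy: rank elements by
    their individual Q-values and fill material into the highest-ranked ones
    until the target volume fraction is reached.

    q_values: 1-D array of per-element Q-values (one per finite element).
    volume_fraction: fraction of elements to fill with material, in (0, 1].
    Returns a 0/1 array (1 = material-filled, 0 = void).
    """
    n = q_values.size
    k = int(round(volume_fraction * n))      # number of material-filled elements
    order = np.argsort(q_values)[::-1]       # element indices, descending Q-value
    layout = np.zeros(n, dtype=int)
    layout[order[:k]] = 1                    # fill top-k elements with material
    return layout

# Toy example with four elements and a 50% volume constraint:
q = np.array([0.2, 0.9, 0.5, 0.1])
print(select_layout(q, 0.5).tolist())  # -> [0, 1, 1, 0]
```

Because the network only has to score individual elements rather than every feasible combination of elements, the action space grows linearly with the number of elements instead of combinatorially, which is what makes the approach tractable on fine meshes.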