Softmax plays a critical role in deep learning accelerators, but its hardware implementations often suffer from precision loss caused by inaccurate division. This paper presents a high-precision division unit for Softmax that uses a three-segment piecewise approximation to reduce approximation error. The proposed approach improves numerical accuracy over previous designs while preserving hardware efficiency, and the architecture remains free of both multipliers and lookup tables (LUTs).
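The abstract does not specify the segments themselves. As a generic illustration of how a piecewise approximation can stay multiplication-free and LUT-free, the Python sketch below assumes a log-domain formulation, a/b = 2^(log2 a − log2 b). This is one common route to hardware division, but it is not confirmed here as this paper's method. The sketch approximates the mantissa term log2(1 + f), f ∈ [0, 1), with three linear segments selected by the two leading fraction bits. The breakpoints, the coefficients, and the names `FRAC_BITS`, `log2_frac_3seg`, `log2_approx`, and `max_error` are all hypothetical. Every coefficient is a sum of powers of two, so each segment reduces to shifts and adds. Mitchell's single-segment approximation, log2(1 + f) ≈ f, serves as the baseline.

```python
import math

FRAC_BITS = 16          # hypothetical fixed-point precision of the fraction f
ONE = 1 << FRAC_BITS    # 1.0 in this fixed-point format


def log2_frac_3seg(f: int) -> int:
    """Approximate log2(1 + f) for f in [0, 1); input and output are fixed point.

    Hypothetical three-segment fit: the two leading fraction bits pick the
    segment, and every coefficient is a sum of powers of two, so only
    shifts and adds are needed (no multiplier, no lookup table).
    """
    if f < ONE >> 2:                        # segment 1: f in [0, 1/4)
        return f + (f >> 2) + (ONE >> 7)    # 1.25*f + 1/128
    if f < ONE >> 1:                        # segment 2: f in [1/4, 1/2)
        return f + (ONE >> 4) + (ONE >> 6)  # f + 5/64
    # segment 3: f in [1/2, 1), computes 0.8125*f + 3/16
    return f - (f >> 3) - (f >> 4) + (ONE >> 3) + (ONE >> 4)


def log2_approx(x: int) -> int:
    """Approximate log2(x) for a positive integer x, returned in fixed point."""
    k = x.bit_length() - 1            # leading-one position (integer part)
    rem = x - (1 << k)                # bits below the leading one
    f = (rem << FRAC_BITS) >> k       # align them as a fraction in [0, 1)
    return (k << FRAC_BITS) + log2_frac_3seg(f)


def mitchell(f: int) -> int:
    """Mitchell's baseline: log2(1 + f) is approximated by f itself."""
    return f


def max_error(approx, samples: int = 4096) -> float:
    """Worst absolute error of approx against math.log2 over a uniform grid."""
    worst = 0.0
    for i in range(samples):
        f = i * ONE // samples
        exact = math.log2(1 + f / ONE)
        worst = max(worst, abs(approx(f) / ONE - exact))
    return worst


print(f"Mitchell max error:      {max_error(mitchell):.4f}")
print(f"three-segment max error: {max_error(log2_frac_3seg):.4f}")
```

On this grid, Mitchell's approximation peaks at roughly 0.086 absolute error, its well-known worst case. The illustrative segments stay below about 0.011. Splitting the interval at 1/4 and 1/2 keeps segment selection to a two-bit check. The antilog step and the subtraction a complete divider would need are omitted.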