We present the submission of the AIP team to the WMT 2025 Unified MT Evaluation Shared Task, focusing on the span-level error detection subtask. Our system emphasizes response-format design to better harness the capabilities of OpenAI's o3, a state-of-the-art reasoning LLM. To this end, we introduce Tagged Span Annotation (TSA), an annotation scheme designed to extract span-level information from the LLM more accurately. On our refined version of the WMT24 ESA dataset, our reference-free method achieves an F1 score of approximately 27 for character-level label prediction, outperforming the reference-based XCOMET-XXL, which scores approximately 17.
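The sketch below illustrates what a character-level label F1 can look like. It assumes each error span is a `(start, end, severity)` triple over character offsets, that every character inherits the label of the span covering it, and that a character counts as a true positive only when the gold and predicted labels agree. The span representation, the `char_level_f1` function, and the handling of empty inputs are illustrative assumptions, not the official WMT scoring script.

```python
from typing import List, Tuple

# Hypothetical span format: (start offset, exclusive end offset, severity label).
Span = Tuple[int, int, str]


def char_labels(spans: List[Span], length: int) -> List[str]:
    """Expand spans into one label per character ("none" = no error)."""
    labels = ["none"] * length
    for start, end, label in spans:
        for i in range(max(0, start), min(end, length)):
            labels[i] = label
    return labels


def char_level_f1(gold: List[Span], pred: List[Span], length: int) -> float:
    """F1 over error-labeled characters; a hit requires matching labels."""
    g = char_labels(gold, length)
    p = char_labels(pred, length)
    tp = sum(1 for a, b in zip(g, p) if a != "none" and a == b)
    n_pred = sum(1 for b in p if b != "none")
    n_gold = sum(1 for a in g if a != "none")
    if n_pred == 0 or n_gold == 0:
        return 0.0  # assumption: no credit when either side has no errors
    precision = tp / n_pred
    recall = tp / n_gold
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with a gold `major` span over characters 0 to 4 and a prediction that marks characters 0 to 2 as `major` but 3 to 4 as `minor`, only three of five characters match, giving precision and recall of 0.6 and an F1 of 0.6 (reported as 60 on the 0 to 100 scale used above).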