As the field of robotics advances, Embodied Instruction Following (EIF) has emerged as a key challenge in artificial intelligence. EIF tasks require agents to interpret and execute natural language instructions by predicting and completing sequences of subgoals within a physical environment. Traditional evaluation metrics, such as Success Rate (SR), Goal-Condition success (GC), Path Length Weighted SR (PLWSR), and Path Length Weighted GC (PLWGC), primarily measure task success achieved through low-level control actions. However, these metrics inadequately assess the accuracy of high-level planning, which is critical to overall task performance. Existing approaches to evaluating high-level planning typically compare predicted plans against a single human-annotated ground-truth trajectory, implicitly assuming that only one correct solution exists. In practice, many instructions admit multiple valid trajectories that achieve the same goal. To address these limitations, we propose Relaxed HLP, a novel metric that evaluates high-level planning more flexibly by accounting for alternative valid plans. Relaxed HLP introduces three key relaxations: temporal agnosticism, spatial agnosticism, and interchangeable actions, enabling a more comprehensive assessment of high-level plan accuracy. We validate the effectiveness of Relaxed HLP through human evaluations, demonstrating that it aligns more closely with human judgment than traditional ground-truth-based metrics. Our results underscore the robustness of Relaxed HLP in capturing diverse, semantically equivalent plans, offering a more accurate assessment of high-level planning in EIF tasks.
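To make the three relaxations concrete, the following is a minimal illustrative sketch (not the authors' implementation; all names, the subgoal representation, and the example equivalence tables are hypothetical) of how a relaxed plan matcher might accept a predicted plan that differs from the annotated one only in subgoal order, receptacle choice within an equivalence class, or choice among interchangeable actions:

```python
from itertools import permutations

# Hypothetical equivalence tables; the real metric would derive these
# from the task and environment, not hard-code them.
INTERCHANGEABLE_ACTIONS = [
    frozenset({"PickupObject", "TakeObject"}),  # interchangeable actions
]
OBJECT_CLASSES = {
    "CounterTop": "surface",  # spatial agnosticism: equivalent placements
    "DiningTable": "surface",
}

def _actions_equal(a, b):
    return a == b or any({a, b} <= s for s in INTERCHANGEABLE_ACTIONS)

def _objects_equal(x, y):
    if x == y:
        return True
    cx, cy = OBJECT_CLASSES.get(x), OBJECT_CLASSES.get(y)
    return cx is not None and cx == cy

def _subgoals_equal(pred_sg, gold_sg):
    return (_actions_equal(pred_sg[0], gold_sg[0])
            and _objects_equal(pred_sg[1], gold_sg[1]))

def relaxed_plan_match(pred, gold):
    """True if pred matches gold under some reordering of gold's subgoals.

    Simplification: permutes the whole gold plan (temporal agnosticism
    applied to every subgoal); a real metric would only reorder subgoals
    that are genuinely order-independent.
    """
    if len(pred) != len(gold):
        return False
    return any(
        all(_subgoals_equal(p, g) for p, g in zip(pred, perm))
        for perm in permutations(gold)
    )
```

For example, a predicted plan `[("TakeObject", "Apple"), ("PutObject", "DiningTable")]` would match a gold plan `[("PickupObject", "Apple"), ("PutObject", "CounterTop")]` under these tables, whereas a strict exact-match comparison would reject it.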