As the field of robotics advances, Embodied Instruction Following (EIF) has emerged as a key challenge in artificial intelligence. EIF tasks require agents to interpret and execute natural language instructions by predicting and completing sequences of subgoals within a physical environment. Traditional evaluation metrics, such as Success Rate (SR), Goal-Condition success (GC), Path Length Weighted SR (PLWSR), and Path Length Weighted GC (PLWGC), primarily measure task success achieved through low-level control actions. However, these metrics inadequately assess the accuracy of high-level planning, which is critical to overall task performance. Existing approaches to evaluating high-level planning typically compare predicted plans against a single human-annotated ground-truth trajectory, implicitly assuming that only one correct solution exists. In practice, many instructions admit multiple valid trajectories that achieve the same goal. To address these limitations, we propose Relaxed HLP, a novel metric that evaluates high-level planning more flexibly by accounting for alternative valid plans. Relaxed HLP introduces three key relaxations: temporal agnosticism, spatial agnosticism, and interchangeable actions, enabling a more comprehensive assessment of high-level plan accuracy. We validate the effectiveness of Relaxed HLP through human evaluations, demonstrating that it aligns more closely with human judgment than traditional ground-truth-based metrics. Our results underscore the robustness of Relaxed HLP in capturing diverse, semantically equivalent plans, offering a more accurate assessment of high-level planning in EIF tasks.
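To make the three relaxations concrete, the following is a minimal illustrative sketch (not the authors' implementation; all names, the subgoal representation, and the example equivalence tables are hypothetical) of how a relaxed plan matcher might accept a predicted plan that differs from the annotated one only in subgoal order, receptacle choice within an equivalence class, or choice among interchangeable actions:

```python
from itertools import permutations

# Hypothetical equivalence tables; the real metric would derive these
# from the task and environment, not hard-code them.
INTERCHANGEABLE_ACTIONS = [
    frozenset({"PickupObject", "TakeObject"}),  # interchangeable actions
]
OBJECT_CLASSES = {
    "CounterTop": "surface",  # spatial agnosticism: equivalent placements
    "DiningTable": "surface",
}

def _actions_equal(a, b):
    return a == b or any({a, b} <= s for s in INTERCHANGEABLE_ACTIONS)

def _objects_equal(x, y):
    if x == y:
        return True
    cx, cy = OBJECT_CLASSES.get(x), OBJECT_CLASSES.get(y)
    return cx is not None and cx == cy

def _subgoals_equal(pred_sg, gold_sg):
    return (_actions_equal(pred_sg[0], gold_sg[0])
            and _objects_equal(pred_sg[1], gold_sg[1]))

def relaxed_plan_match(pred, gold):
    """True if pred matches gold under some reordering of gold's subgoals.

    Simplification: permutes the whole gold plan (temporal agnosticism
    applied to every subgoal); a real metric would only reorder subgoals
    that are genuinely order-independent.
    """
    if len(pred) != len(gold):
        return False
    return any(
        all(_subgoals_equal(p, g) for p, g in zip(pred, perm))
        for perm in permutations(gold)
    )
```

For example, a predicted plan `[("TakeObject", "Apple"), ("PutObject", "DiningTable")]` would match a gold plan `[("PickupObject", "Apple"), ("PutObject", "CounterTop")]` under these tables, whereas a strict exact-match comparison would reject it.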