Developing tactically proficient and robust artificial intelligence (AI) pilots for flight simulators requires systematic performance evaluation. The purpose of this study is to identify and verify performance measures specifically tailored to AI pilots in a tactical combat mission scenario. A 2v2 air combat scenario was selected as the operational context for deriving the measures. Feedback on the appropriateness of the measures was collected from human pilots through a questionnaire survey. Based on this feedback, seven performance measures were identified as appropriate for the 2v2 air combat scenario. This study provides an initial basis for assessing AI pilots in simulators and can support the development of optimal tactics and enhanced flight training programs.