Abstract
Introduction Polysomnography (PSG), the gold standard for diagnosing obstructive sleep apnea and other sleep disorders, requires manual scoring of its comprehensive physiological data by trained sleep technologists or physicians. However, manual scoring is costly, prone to inconsistencies among scorers, and often delays diagnosis and treatment initiation, potentially compromising patient outcomes. The purpose of this study was to validate the performance of SOMNUM, an artificial intelligence (AI)-based automatic PSG scoring system developed to overcome these limitations, using a cross-sectional experimental design on a representative subset of data recorded in the sleep laboratories of three university hospitals.
Methods This study used a retrospective dataset of 400 PSG recordings from three university hospitals: Soonchunhyang University Bucheon Hospital, Ajou University Hospital, and Chungnam University Hospital. Forty-eight full-night PSG studies were randomly selected from the dataset to evaluate SOMNUM. Performance was assessed by comparing SOMNUM's scoring results with those of three human experts for four scoring functions: sleep staging, arousals, respiratory events, and periodic limb movements (PLMs). Performance metrics were positive percentage agreement, negative percentage agreement, and overall percentage agreement, calculated using R statistical software with relevant packages.
Results The agreement rates between human scoring and SOMNUM for each sleep stage (W, N1, N2, N3, R) were 89.96%, 77.93%, 91.13%, 84.04%, and 84.62%, respectively. The agreement rate was 82.5% for arousals, 92.3% for respiratory events, and 94.1% for periodic limb movements.
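The three agreement metrics named in Methods can be illustrated with a minimal sketch. The study computed its metrics in R; the Python below is only an illustration. It assumes epoch-by-epoch labels, human scoring as the reference, and a one-versus-rest reduction per label. The abstract does not specify any of these choices, nor how the three experts' scores were combined. The function name `agreement` and the toy labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def agreement(reference: Sequence[str], predicted: Sequence[str], label: str) -> Dict[str, float]:
    """One-vs-rest percentage agreement for a single label (assumed reduction).

    reference: human-scored labels, one per epoch (treated as the reference)
    predicted: automatically scored labels for the same epochs
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")

    tp = fn = fp = tn = 0
    for ref, pred in zip(reference, predicted):
        if ref == label and pred == label:
            tp += 1  # both assign the label
        elif ref == label:
            fn += 1  # reference assigns it, prediction does not
        elif pred == label:
            fp += 1  # prediction assigns it, reference does not
        else:
            tn += 1  # neither assigns it

    def pct(numerator: int, denominator: int) -> float:
        # Return NaN rather than raising when a class is absent.
        return 100.0 * numerator / denominator if denominator else math.nan

    return {
        "ppa": pct(tp, tp + fn),                  # positive percentage agreement
        "npa": pct(tn, tn + fp),                  # negative percentage agreement
        "opa": pct(tp + tn, tp + fn + fp + tn),   # overall percentage agreement
    }


# Toy labels for eight epochs (illustrative only, not study data).
human = ["W", "N1", "N2", "N2", "N3", "R", "N2", "W"]
auto = ["W", "N2", "N2", "N2", "N3", "R", "N1", "W"]
n2 = agreement(human, auto, "N2")
print({name: round(value, 2) for name, value in n2.items()})
```

Calling `agreement` once per stage (W, N1, N2, N3, R) would give per-stage figures of the kind reported in Results, under the assumptions stated above.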
Conclusion This study demonstrated that SOMNUM is a valuable solution for automatic polysomnography interpretation. These results suggest that AI-based automated polysomnography scoring systems have the potential to advance sleep medicine by improving diagnostic efficiency, reducing costs, and enhancing the consistency of sleep disorder diagnoses.
Support (if any) This work was supported by the Seoul Business Agency (2023 Bio/Medical Technology Commercialization Supporting Project, BT230157) and by the Technology Development Program (RS-2023-00321754) funded by the Ministry of SMEs and Startups (MSS, Korea).