A Semi-Automatic Labeling Framework for PCB Defects via Deep Embeddings and Density-Aware Clustering
Sang-Jeong Lee, Sungbal Seo, You-Suk Bae
(1) Background. Printed circuit board (PCB) inspection is increasingly constrained by the cost and latency of obtaining reliable labels, owing to tiny, low-contrast defects embedded in complex backgrounds and to severe class imbalance. (2) Methods. We propose a semi-automatic labeling pipeline that converts anomaly detection proposals into class labels via small-margin cropping from images, interchangeable embeddings (HOG, ResNet-50, ViT-B/16), clustering (k-means/GMM/HDBSCAN), and cluster-level verification using representative montages. (3) Results. On 9354 cropped defects spanning 10 categories (imbalance ratio IR ≈ 1542, Gini ≈ 0.642), ResNet-50 + HDBSCAN achieved NMI ≈ 0.290, AMI ≈ 0.283, and purity ≈ 0.624 with ~47 clusters; ViT + HDBSCAN was comparable (NMI ≈ 0.281, AMI ≈ 0.274, ~44 clusters). With a fixed taxonomy, k-means (K = 10) yielded the strongest ARI (0.169 with ResNet-50; 0.158 with ViT). Macro-purity exceeded micro-purity, indicating many small, homogeneous clusters that an operator can accept or reject in one shot; this enables an upper-bound ~200× reduction in operator decisions relative to per-image labeling. (4) Conclusions. The workflow provides an auditable, resource-flexible path from normal-only localization to scalable supervision. It prioritizes labeling productivity over detector state-of-the-art and directly addresses the industrial bottleneck in the development lifecycle for PCB inspection.
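The purity figures and the ~200× bound can be illustrated with a minimal sketch. The abstract does not give the paper's exact definitions, so the code below assumes the common ones: micro-purity is the fraction of all items that match the majority class of their cluster, and macro-purity is the unweighted mean of per-cluster purities. The function names `cluster_purity` and `decision_reduction` and the toy labels are hypothetical. The decision-reduction ratio assumes one operator decision per cluster, which is consistent with 9354 crops and ~47 clusters.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_ids, class_labels):
    """Return (micro_purity, macro_purity) for a hard clustering.

    Assumed definitions (not confirmed by the abstract):
      micro = sum of majority-class counts over all clusters / total items
      macro = unweighted mean of per-cluster (majority count / cluster size)
    """
    if len(cluster_ids) != len(class_labels) or len(cluster_ids) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    # Count class occurrences inside each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1

    majority = [max(counts.values()) for counts in members.values()]
    sizes = [sum(counts.values()) for counts in members.values()]

    micro = sum(majority) / sum(sizes)
    macro = sum(m / s for m, s in zip(majority, sizes)) / len(sizes)
    return micro, macro


def decision_reduction(n_items, n_clusters):
    """Upper-bound ratio of per-image decisions to per-cluster decisions."""
    return n_items / n_clusters


# Toy data: one large mixed cluster and two small homogeneous ones.
clusters = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
labels = ["short", "short", "short", "open", "open", "spur",
          "spur", "spur", "open", "open"]

micro, macro = cluster_purity(clusters, labels)
print(f"micro-purity = {micro:.3f}, macro-purity = {macro:.3f}")

# Counts reported in the abstract: 9354 crops, ~47 HDBSCAN clusters.
print(f"decision reduction ~ {decision_reduction(9354, 47):.0f}x")
```

In the toy data the small clusters are pure while the large one is mixed, so macro-purity exceeds micro-purity, the same pattern the abstract reports. The ratio is an upper bound because any cluster the operator rejects or splits still needs further decisions.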
https://doi.org/10.3390/s25206470
Bottleneck
Cluster analysis
Margin (machine learning)
Workflow
Pipeline (software)
Scalability
Printed circuit board
Anomaly detection
Spectral clustering