Hybrid Deep-Ensemble Network with VAE-Based Augmentation for Imbalanced Tabular Data Classification
Sang-Jeong Lee, You-Suk Bae
Applied Sciences (IF 2.5)
Abstract

Background: Severe class imbalance limits reliable tabular AI in manufacturing, finance, and healthcare.

Methods: We built a modular pipeline comprising correlation-aware seriation; a hybrid convolutional neural network (CNN)–transformer–Bidirectional Long Short-Term Memory (BiLSTM) encoder; variational autoencoder (VAE)-based minority augmentation; and deep/tree ensemble heads (XGBoost and Support Vector Machine, SVM). We benchmarked the Synthetic Minority Oversampling Technique (SMOTE) and ADASYN under identical protocols. Focal loss and ensemble weights were tuned per dataset. The primary metric was the Area Under the Precision–Recall Curve (AUPRC), with receiver operating characteristic area under the curve (ROC AUC) as a complementary metric. Synthetic-data fidelity was quantified by train-on-synthetic/test-on-real (TSTR) utility, two-sample discriminability (ROC AUC of a real-vs-synthetic classifier), and Maximum Mean Discrepancy (MMD²).

Results: Across five datasets (SECOM, CREDIT, THYROID, APS, and UCI), augmentation was data-dependent: VAE led on APS (+3.66 pp AUPRC vs. SMOTE) and was competitive on CREDIT (+0.10 pp vs. no augmentation); SMOTE dominated on SECOM; no augmentation performed best on THYROID and UCI. Positional embedding (PE) with seriation helped when strong local correlations were present. Ensembles typically favored XGBoost while benefiting from the hybrid encoder. Efficiency profiling and a slim variant supported latency-sensitive use.

Conclusions: A data-aware recipe emerged: prefer VAE when fidelity is high, SMOTE on smoother minority manifolds, and no augmentation when baselines suffice; apply PE/seriation selectively and tune per dataset for robust, reproducible deployment.

Keywords
Pattern recognition (psychology) · Embedding · Oversampling · Autoencoder · Support vector machine · Receiver operating characteristic · Artificial neural network · Metric (unit)
Type
article
IF / Citations
2.5 / 1
Publication year
2025