Hybrid Deep-Ensemble Network with VAE-Based Augmentation for Imbalanced Tabular Data Classification
Sang-Jeong Lee, You-Suk Bae
Applied Sciences (IF 2.5)
Abstract

Background: Severe class imbalance limits reliable tabular AI in manufacturing, finance, and healthcare.

Methods: We built a modular pipeline comprising correlation-aware seriation; a hybrid convolutional neural network (CNN)–transformer–Bidirectional Long Short-Term Memory (BiLSTM) encoder; variational autoencoder (VAE)-based minority augmentation; and deep/tree ensemble heads (XGBoost and Support Vector Machine, SVM). We benchmarked the Synthetic Minority Oversampling Technique (SMOTE) and ADASYN under identical protocols. Focal loss and ensemble weights were tuned per dataset. The primary metric was the Area Under the Precision–Recall Curve (AUPRC), with receiver operating characteristic area under the curve (ROC AUC) as a complementary metric. Synthetic-data fidelity was quantified by train-on-synthetic/test-on-real (TSTR) utility, two-sample discriminability (ROC AUC of a real-vs-synthetic classifier), and Maximum Mean Discrepancy (MMD²).

Results: Across five datasets (SECOM, CREDIT, THYROID, APS, and UCI), augmentation was data-dependent: VAE led on APS (+3.66 pp AUPRC vs. SMOTE) and was competitive on CREDIT (+0.10 pp vs. no augmentation); SMOTE dominated on SECOM; no augmentation performed best on THYROID and UCI. Positional embedding (PE) with seriation helped when strong local correlations were present. Ensembles typically favored XGBoost while benefiting from the hybrid encoder. Efficiency profiling and a slim variant supported latency-sensitive use.

Conclusions: A data-aware recipe emerged: prefer VAE when fidelity is high, SMOTE on smoother minority manifolds, and no augmentation when baselines suffice; apply PE/seriation selectively and tune per dataset for robust, reproducible deployment.

Keywords
Pattern recognition (psychology) · Embedding · Oversampling · Autoencoder · Support vector machine · Receiver operating characteristic · Artificial neural network · Metric (unit)
Type
article
IF / Citations
2.5 / 1
Publication year
2025