A Study on Performance Improvement of Vision Transformer through the Combination of Spatial-Cross Channel Attention
Minsung Choi, Hae-Won Lee, So Yoon Kim, Seong-Geon Bae
Abstract

Recent deep learning-based image classification studies show that the Vision Transformer (ViT) performs well and may replace Convolutional Neural Networks. This study compares several ViT-based attention techniques and proposes a new MultiAttention method. In experiments examining learning stability and overfitting, the model using the proposed MultiAttention technique showed a more stable learning process and achieved better performance than the existing attention techniques. In particular, the gap between training loss and validation loss remained small, indicating that overfitting was effectively alleviated. In terms of classification accuracy, the MultiAttention technique recorded the best performance, achieving 28% accuracy, 2-5% higher than the basic ViT model. Unlike existing techniques that improve performance only for specific classes such as airplane and deer, MultiAttention improves performance evenly across classes, confirming its effectiveness for varied object recognition problems. The proposed technique improves ViT learning stability and generalization, showing balanced performance in object classification and strong real-world applicability. Future research could explore additional attention structures and evaluate their performance; with further development, the approach could be applied in areas such as medical image analysis and autonomous driving. Since spatial attention proved crucial in this study, future work will enhance the model's ability to learn spatial information for greater practicality.
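The abstract does not give the exact MultiAttention formulation, but the general idea of combining spatial attention (weighting positions) with channel attention (weighting feature channels) can be sketched as follows. This is a minimal illustrative toy, not the paper's method: the function name `multi_attention`, the mean-based scoring, and the multiplicative combination are all assumptions for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def multi_attention(feat):
    """Toy sketch (not the paper's formulation): `feat` is a list of
    spatial positions, each a list of channel values. Spatial attention
    weights positions, channel attention weights channels, and the two
    weight maps are combined multiplicatively to reweight the features."""
    n_pos, n_ch = len(feat), len(feat[0])
    # Spatial scores: mean activation at each position, softmax-normalized.
    spatial = softmax([sum(p) / n_ch for p in feat])
    # Channel scores: mean activation of each channel across positions.
    channel = softmax([sum(p[c] for p in feat) / n_pos for c in range(n_ch)])
    # Each value is scaled by its position weight and its channel weight.
    return [[feat[i][c] * spatial[i] * channel[c] for c in range(n_ch)]
            for i in range(n_pos)]
```

In a real ViT the spatial and channel scores would be learned projections over token embeddings rather than simple means, but the combination pattern is the same.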

Keywords
Overfitting, Transformer, Convolutional neural network, Stability (learning theory), Performance improvement, Deep learning, Pattern recognition (psychology), Process (computing)
Type
article
IF / Citations
- / 0
Publication year
2025