Recent deep learning studies on image classification show that the Vision Transformer (ViT) performs well and may replace Convolutional Neural Networks. This study compares several ViT-based attention techniques and proposes a new MultiAttention method. In an examination of learning stability and overfitting, the model using the proposed MultiAttention technique showed a more stable training process and achieved better performance than the existing attention techniques. In particular, the gap between the training loss and the validation loss remained small, indicating that overfitting was effectively mitigated. In terms of classification accuracy, the MultiAttention technique recorded the best performance, achieving 28% accuracy, 2-5% higher than the baseline ViT model. Unlike existing techniques that improve performance only for specific classes such as airplane and deer, MultiAttention improves performance evenly across classes, confirming its effectiveness for diverse object recognition problems. The proposed MultiAttention technique improves the learning stability and generalization of ViT, showing balanced performance in object classification and strong potential for real-world use. Future research could explore additional attention structures and evaluate their performance; with further development, the approach could be applied in areas such as medical image analysis and autonomous driving. Since spatial attention proved crucial in this study, future work will strengthen the model's ability to learn spatial information for greater practicality.
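The text above does not specify the internals of the MultiAttention block. Purely as an illustration of how such a combination might be composed, the following minimal NumPy sketch pairs standard scaled dot-product self-attention over patch tokens with a CBAM-style spatial gate and a residual connection; the composition, the gate design, and all function names here are assumptions, not the authors' specification.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    """Standard scaled dot-product self-attention over patch tokens."""
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = softmax(q @ k.T / np.sqrt(d), axis=-1)
    return scores @ v

def spatial_attention(x):
    """Spatial gate: a per-token sigmoid weight computed from mean/max
    channel statistics (a CBAM-style design, assumed for illustration)."""
    stats = np.stack([x.mean(axis=-1), x.max(axis=-1)], axis=-1)  # (tokens, 2)
    gate = 1.0 / (1.0 + np.exp(-stats.mean(axis=-1)))             # (tokens,)
    return x * gate[:, None]

def multi_attention_block(x, wq, wk, wv):
    """Hypothetical MultiAttention block: self-attention followed by a
    spatial gate, with a residual connection (assumed composition)."""
    return x + spatial_attention(self_attention(x, wq, wk, wv))

# Toy usage on a few random patch tokens.
rng = np.random.default_rng(0)
tokens, dim = 4, 8
x = rng.standard_normal((tokens, dim))
wq, wk, wv = (rng.standard_normal((dim, dim)) * 0.1 for _ in range(3))
out = multi_attention_block(x, wq, wk, wv)
print(out.shape)  # → (4, 8)
```

The gate leaves the token shape unchanged, so the block can be stacked like an ordinary ViT encoder layer.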