WINter-ViT : Window Interaction Vision Transformer with Head-Aware Attention
Jihyeok Kim, Jaehyeok Kim, So-Yun Park, Jinwoo Yoo
The Transactions of The Korean Institute of Electrical Engineers
Abstract

While the Swin Transformer effectively reduces computational cost using window-based attention, it struggles to model global dependencies across windows. Prior work, such as the Refined Transformer, attempts to overcome this limitation by incorporating CBAM-style channel and spatial attention mechanisms. However, these sequential attention operations often introduce representational bias by overemphasizing specific features. To address this, we propose two key components: (1) the Efficient Head Self-Attention (EHSA) module, which dynamically calibrates the relative contribution of each attention head within a window, and (2) the Hierarchical Local-to-Global Spatial Attention (HLSA) module, which captures long-range interactions across windows in a hierarchical manner. By integrating these into a Swin-T backbone, our architecture improves both local detail modeling and global context aggregation. Experiments on ImageNet-1K and ImageNet100 demonstrate that our model surpasses the Refined Transformer and other window-based approaches in accuracy, while maintaining a comparable level of computational efficiency. These results validate the effectiveness of our design in enhancing local-global interactions within Vision Transformers.
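The abstract does not give the exact formulation of the Efficient Head Self-Attention (EHSA) module, but the general idea it describes, dynamically calibrating the relative contribution of each attention head within a window, can be sketched as follows. This is a minimal NumPy illustration under assumed details: `head_aware_reweight` and `w_gate` are hypothetical names, and the pooling-plus-softmax gate stands in for whatever learned calibration the paper actually uses.

```python
import numpy as np

def head_aware_reweight(head_outputs, w_gate):
    """Hypothetical sketch of head-wise calibration (EHSA-like idea).

    head_outputs: (H, N, d) per-head attention outputs for one window
                  (H heads, N tokens in the window, d dims per head).
    w_gate:       (d,) stand-in for a learned gating projection.

    Each head's output is pooled into a summary vector, scored by the
    gate, and the softmax-normalized scores rescale the heads before
    they are concatenated back to (N, H*d).
    """
    H, N, d = head_outputs.shape
    pooled = head_outputs.mean(axis=1)        # (H, d) per-head summary
    scores = pooled @ w_gate                  # (H,) head-importance logits
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights = weights / weights.sum()         # relative head contributions
    calibrated = head_outputs * weights[:, None, None]
    return calibrated.transpose(1, 0, 2).reshape(N, H * d), weights

# Example: 4 heads over a 7x7 window (49 tokens), 8 dims per head.
rng = np.random.default_rng(0)
out, w = head_aware_reweight(rng.standard_normal((4, 49, 8)),
                             rng.standard_normal(8))
```

The key property is that the head weights sum to one, so calibration redistributes emphasis across heads rather than changing the overall scale, which is one plausible way to avoid the over-emphasis bias the authors attribute to sequential CBAM-style attention.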

Keywords
Transformer · Architecture · Key (lock) · Spatial contextual awareness · Spatial configuration · Computational model
Type
article
IF / Citations
- / 0
Publication year
2025
