Article | Citations: 0 · 2025
A Study on Improved ViT Using Adaptive Patch Sizing
Si-Eun Park, So Yoon Kim, Seong-Geon Bae
Abstract

In this study, we propose an edge density-based adaptive inference method for the Vision Transformer (ViT) that aims to improve both accuracy and computational efficiency in image recognition tasks. Existing ViT models use a fixed patch size for all images. The proposed method instead uses a Canny edge detector to compute the edge density of the input image and, based on that density, dynamically selects the pre-trained ViT model with the most suitable patch size. Complex images with high edge density are processed with small patches ($8 \times 8$), and simple images with low edge density are processed with large patches ($32 \times 32$). This adaptive selection mechanism maintains high recognition performance without changing the ViT architecture or requiring additional training. We evaluated the proposed technique on three datasets: CIFAR-10, STL-10, and ImageNet. The method consistently improved Top-1 and Top-5 accuracy, by up to 5 percentage points (%p) over existing fixed-patch ViT models. Despite the improved accuracy, the increase in inference latency was insignificant, and throughput, measured as the number of images inferred per second (FPS), remained competitive. These results suggest that edge-based adaptive patch selection is a practical way to improve ViT inference performance and computational efficiency together. Furthermore, because the method requires no structural changes, it is compatible with existing pre-trained models and can be applied efficiently even in resource-constrained environments.
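The selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a binary edge map has already been produced by a Canny edge detector (for example with OpenCV's `cv2.Canny`) and is passed in as rows of 0/nonzero values. The function names, the two density thresholds, and the intermediate patch size of 16 are hypothetical, since the abstract gives only the $8 \times 8$ and $32 \times 32$ cases and no numeric cut-offs.

```python
from typing import Sequence

# Hypothetical thresholds: the abstract does not state numeric cut-offs.
LOW_DENSITY = 0.05   # below this, the image is treated as "simple"
HIGH_DENSITY = 0.15  # at or above this, the image is treated as "complex"


def edge_density(edge_map: Sequence[Sequence[int]]) -> float:
    """Return the fraction of pixels marked as edges (nonzero) in a binary edge map."""
    total = sum(len(row) for row in edge_map)
    if total == 0:
        raise ValueError("edge map is empty")
    edges = sum(1 for row in edge_map for px in row if px)
    return edges / total


def select_patch_size(density: float) -> int:
    """Map an edge density to a ViT patch size.

    High density -> 8 (small patches), low density -> 32 (large patches),
    as in the abstract. The middle value 16 is an assumption of this sketch.
    """
    if density >= HIGH_DENSITY:
        return 8
    if density < LOW_DENSITY:
        return 32
    return 16


if __name__ == "__main__":
    # Toy 4x4 edge map with half of its pixels marked as edges.
    busy = [[255, 0, 255, 0], [0, 255, 0, 255], [255, 0, 255, 0], [0, 255, 0, 255]]
    print(select_patch_size(edge_density(busy)))
```

In a full pipeline, the returned patch size would be used to pick one of several pre-trained ViT models, one per patch size, so no model is modified or retrained. The thresholds would need to be tuned per dataset.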

Keywords
Inference, Canny edge detector, Edge detection, Enhanced Data Rates for GSM Evolution, Selection (genetic algorithm), Sizing, Pattern recognition (psychology)
Type
Article
IF / Citations
- / 0
Publication year
2025