Vision Transformers have been applied across a wide range of domains, but their lack of inductive bias limits training stability and performance on small-scale datasets. While previous studies have addressed this issue through structural modifications, this study proposes a normalization method that reduces the self-similarity values in self-attention, thereby strengthening token-to-token interactions. The approach introduces an inductive bias without altering the architecture or increasing computational complexity. Experiments on the CIFAR-10 dataset show that the proposed method improves final validation accuracy (raw) by 2.62% and lowers the loss, thereby enhancing training stability.
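The abstract does not specify the exact normalization, so the following is only a minimal NumPy sketch of one plausible instantiation: penalizing the diagonal (self-similarity) logits of the attention matrix before the softmax, which shifts attention mass from each token onto the other tokens. The function name `attention` and the parameter `self_penalty` are hypothetical illustrations, not the paper's method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, self_penalty=0.0):
    """Scaled dot-product attention with an optional penalty on the
    diagonal (self-similarity) logits; self_penalty > 0 reduces each
    token's attention to itself, boosting token-to-token interaction."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # (n, n) similarity logits
    scores -= self_penalty * np.eye(scores.shape[0])  # damp self-similarity
    w = softmax(scores, axis=-1)                      # rows sum to 1
    return w @ v, w

rng = np.random.default_rng(0)
n, d = 4, 8
q = k = v = rng.normal(size=(n, d))

_, w_base = attention(q, k, v, self_penalty=0.0)
_, w_norm = attention(q, k, v, self_penalty=2.0)

# Mean diagonal mass (self-attention) drops under the penalty.
print(np.trace(w_base) / n, np.trace(w_norm) / n)
```

In this sketch the mean of the attention matrix's diagonal strictly decreases for any positive penalty, while each row still sums to one, so the removed self-attention mass is redistributed to other tokens.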