This study proposes an efficient framework for improving the classification performance of Distillation with No Labels (DINO)-based Vision Transformer models in small labeled-dataset settings. The proposed method employs strong multi-crop data augmentation together with visual transformations such as AutoAugment and Gaussian blur to maximize the diversity of receptive fields and local features in the input image. In addition, a gradual unfreezing strategy that keeps the backbone network frozen for the first 1 to 3 epochs and trains only the Multi-Layer Perceptron (MLP) classification head mitigates overfitting during fine-tuning while preserving the stable, generalized visual representations learned during pre-training. A lightweight MLP head, the AdamW optimizer, and cosine annealing learning-rate scheduling are also applied to improve both training stability and convergence speed. In extensive experiments on the STL-10 dataset, the proposed framework achieved significant improvements in Top-1 accuracy, K-Nearest Neighbors (KNN) accuracy, and linear-probe results over existing DINO models, together with more than 50% faster inference. Although the MLP head increases the parameter count, overall execution efficiency was nonetheless improved thanks to parallel-processing optimization and the adjusted training schedule. This work advances the transfer-learning efficiency and versatility of the self-supervised DINO Transformer model, substantially broadening its potential for field applications. Future research aims to improve generalization and practical applicability by optimizing learning rates, diversifying augmentation policies, and automating backbone unfreezing.
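The two scheduling ideas in the abstract, cosine annealing of the learning rate and epoch-based gradual unfreezing, can be sketched in framework-agnostic form. This is a minimal illustration, not the authors' implementation: the function names, the 3-epoch freeze window, and the learning-rate bounds are assumptions for demonstration.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max=1e-3, lr_min=1e-6):
    """Cosine annealing: decay the learning rate from lr_max to lr_min
    over total_steps following half a cosine period (assumed bounds)."""
    cos_term = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cos_term)

def backbone_trainable(epoch, freeze_epochs=3):
    """Gradual unfreezing: keep the backbone frozen (not trainable) for
    the first `freeze_epochs` epochs, training only the MLP head, then
    unfreeze it for full fine-tuning. The 3-epoch default follows the
    1-to-3-epoch window mentioned in the text."""
    return epoch >= freeze_epochs
```

In a training loop, `backbone_trainable(epoch)` would gate the `requires_grad` flag of the backbone parameters, while `cosine_annealing_lr` would set the optimizer's learning rate each step; deep-learning frameworks provide equivalent built-in schedulers.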