In this study, a model is proposed that generates descriptive sentences for input images to assist visually impaired individuals. For this purpose, a novel image captioning approach is introduced that integrates principles of human visual understanding with the Vision Transformer (ViT) architecture, further enhanced by deep reinforcement learning. First, features are extracted from the image based on human visual perception. Second, the image features are encoded by the ViT encoder block and passed to a long short-term memory (LSTM) network to generate captions for the image. Finally, reinforcement learning is employed to further optimize the accuracy of the generated captions. Evaluations were performed on the MSR-VTT benchmark dataset, which is widely used for captioning tasks. Experimental results demonstrate that the proposed model achieves BLEU-4 of 43.0, METEOR of 29.1, ROUGE-L of 62.7, and CIDEr-D of 54.9, surpassing state-of-the-art baseline models across all evaluation metrics. The proposed model can be applied to video annotation applications for the visually impaired. In contrast to prior works that rely primarily on conventional convolutional architectures, the proposed model incorporates human-inspired visual perception principles and Vision Transformer-based global encoding, offering a novel and interpretable framework tailored for assistive image captioning.
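To make the encoder-decoder pipeline described above concrete, the following is a minimal, hedged sketch of a ViT-style encoder feeding an LSTM caption decoder. It is not the authors' implementation; the class name, hyperparameters, and pooling strategy are illustrative assumptions, and the human-perception feature extraction and reinforcement learning stages are omitted.

```python
# Illustrative sketch only (assumed names and settings, not the paper's code):
# a ViT-style patch encoder whose pooled output initializes an LSTM decoder
# that is trained with teacher forcing to produce caption tokens.
import torch
import torch.nn as nn

class ViTLSTMCaptioner(nn.Module):
    def __init__(self, vocab_size, d_model=768, hidden=512, num_layers=4, num_heads=8):
        super().__init__()
        # Split the image into 16x16 patches and project each patch to d_model.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        encoder_layer = nn.TransformerEncoderLayer(d_model, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers)
        # Word embedding and LSTM decoder initialized from the pooled image feature.
        self.embed = nn.Embedding(vocab_size, hidden)
        self.init_h = nn.Linear(d_model, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, 224, 224); captions: (B, T) token ids
        patches = self.patch_embed(images).flatten(2).transpose(1, 2)  # (B, N, d_model)
        feats = self.encoder(patches).mean(dim=1)                      # pooled image code
        h0 = torch.tanh(self.init_h(feats)).unsqueeze(0)               # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        dec, _ = self.lstm(self.embed(captions), (h0, c0))             # teacher forcing
        return self.out(dec)                                           # (B, T, vocab_size)

# Usage example with random tensors:
model = ViTLSTMCaptioner(vocab_size=10000)
scores = model(torch.randn(2, 3, 224, 224), torch.randint(0, 10000, (2, 12)))
```

In the full method, the cross-entropy training implied by this sketch would be followed by a reinforcement learning stage that directly optimizes caption-level metrics; that stage is not shown here.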