MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model | 심현철 교수 연구실 | 한국과학기술원 전기및전자공학부

심현철 교수 연구실

서비스 플랜

연구실 검색

프로젝트 공고

정부 과제 추천

AI 기반 기업 서칭

홈

기본 정보

연구 분야

프로젝트

발행물

구성원

article|

인용수 2

·2025

MonoDINO-DETR: Depth-Enhanced Monocular 3D Object Detection Using a Vision Foundation Model

Jihyeok Kim, Seong-Woo Moon, Sungwon Nahl, David Hyunchul Shim

초록

This paper proposes novel methods to enhance the performance of monocular 3D object detection models by lever-aging the generalized feature extraction capabilities of a vision foundation model. Unlike traditional CNN-based approaches, which often suffer from inaccurate depth estimation and rely on multi-stage object detection pipelines, this study employs a Vision Transformer (ViT)-based foundation model as the backbone, which excels at capturing global features for depth estimation. It integrates a detection transformer (DETR) archi-tecture to improve both depth estimation and object detection performance in a one-stage manner. Specifically, a hierarchical feature fusion block is introduced to extract richer visual features from the foundation model, further enhancing feature extraction capabilities. Depth estimation accuracy is further improved by incorporating a relative depth estimation model trained on large-scale data and fine-tuning it through transfer learning. Additionally, the use of queries in the transformer's decoder, which consider reference points and the dimensions of 2D bounding boxes, enhances recognition performance. The proposed model outperforms recent state-of-the-art methods, as demonstrated through quantitative and qualitative evaluations on the KITTI 3D benchmark and a custom dataset collected from high-elevation racing environments. Code is available at https://github.com/JihyeokKim/MonoDINO-DETR.

키워드

Computer visionArtificial intelligenceMonocularComputer scienceFoundation (evidence)Object detectionMonocular visionObject (grammar)Pattern recognition (psychology)Geography

타입

article

IF / 인용수

- / 2

원문

https://doi.org/10.1109/iv64158.2025.11097713

게재 연도

2025