Efficient Occupancy Prediction with Instance-Level Attention
Sungjin Park, Jaeha Song, Soonmin Hwang
Occupancy prediction is a critical task in autonomous driving, enabling better understanding of 3D environments for downstream tasks. Previous methods often rely on dense back-projection to extract 3D features from 2D images by distributing information across all voxels. While effective, these approaches are computationally expensive and inefficient due to the dense nature of 3D voxel representations. Inspired by recent works, we address this challenge with instance-level attention, which utilizes representative queries for groups of voxels, reducing computational cost while maintaining competitive performance. By applying attention mechanisms to this instance-level representation, we achieve an mIoU of 35.26 with a latency of 0.04 s on the Occ3D dataset on an RTX 4090. These results demonstrate that focusing on instance-level representations provides an efficient and practical solution for real-time occupancy prediction.
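The core idea of the abstract above — attending over a few representative queries instead of every voxel — can be illustrated with a minimal sketch. This is not the paper's architecture; the mean pooling, the single attention head, and all shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def instance_level_attention(voxel_feats, assign, num_queries, d):
    """Illustrative sketch: pool voxels into representative queries,
    run attention at the query level, then scatter results back.

    voxel_feats: (N, d) features for N voxels
    assign:      (N,) index of the query/instance each voxel belongs to
    """
    # 1) Pool each voxel group into one representative query (mean pooling
    #    here, a stand-in for however groups are actually formed).
    queries = np.zeros((num_queries, d))
    counts = np.bincount(assign, minlength=num_queries).clip(min=1)
    np.add.at(queries, assign, voxel_feats)
    queries /= counts[:, None]

    # 2) Self-attention among the (few) queries instead of all voxels:
    #    cost is O(Q^2) rather than O(N^2), which is the efficiency gain.
    attn = softmax(queries @ queries.T / np.sqrt(d))
    updated = attn @ queries

    # 3) Broadcast each updated query back to its member voxels.
    return voxel_feats + updated[assign]
```

The payoff is in step 2: with a few thousand voxel groups replaced by a handful of queries, the quadratic attention cost shrinks accordingly.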
Jeongmin Shin, Jiwon Kim, S. M. Kwon, Namil Kim, Soonmin Hwang, Yukyung Choi
IF 4.5
IEEE Sensors Journal
Multispectral image alignment plays a crucial role in exploiting complementary information between different spectral images. Homography-based image alignment can be a practical solution considering the tradeoff between runtime and accuracy. Existing methods, however, have difficulty with multispectral images due to the additional spectral gap, or require expensive human labels to train models. To solve these problems, this paper presents a comprehensive study on multispectral homography estimation in an unsupervised manner. We propose curriculum data augmentation, an effective solution for models learning spectrum-agnostic representations by providing diverse input pairs. We also propose to use a phase congruency loss that explicitly evaluates the reconstruction between images based on low-level structural information in the frequency domain. To encourage multispectral alignment research, we introduce a novel FLIR correspondence dataset that has manually labeled local correspondences between multispectral images. Our model achieves state-of-the-art alignment performance on the proposed FLIR correspondence dataset among supervised and unsupervised methods while running at 151 FPS. Furthermore, our model shows good generalization ability on the M3FD dataset without finetuning.
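The intuition behind a frequency-domain structural loss like the one described above can be sketched with a toy example: phase spectra carry edge and structure information while being largely insensitive to intensity differences between spectra. This is a simplified phase-comparison illustration, not the paper's actual phase congruency formulation.

```python
import numpy as np

def phase_loss(img_a, img_b):
    """Toy frequency-domain loss: compare only the phase of each image's
    spectrum. Because the magnitude is normalized away, a global intensity
    change (e.g. a spectrum-dependent brightness shift) barely affects it.
    Simplified stand-in for phase congruency, for illustration only.
    """
    fa, fb = np.fft.fft2(img_a), np.fft.fft2(img_b)
    # Normalize each frequency bin to unit magnitude: only phase remains.
    pa = fa / (np.abs(fa) + 1e-8)
    pb = fb / (np.abs(fb) + 1e-8)
    return float(np.mean(np.abs(pa - pb)))
```

For identical structure the loss is zero regardless of contrast, which is the property that makes phase-based comparisons attractive across spectral gaps.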
TransDSSL: Transformer Based Depth Estimation via Self-Supervised Learning
Daechan Han, Jeongmin Shin, Namil Kim, Soonmin Hwang, Yukyung Choi
IF 5.3
IEEE Robotics and Automation Letters
Recently, transformers have been widely adopted for various computer vision tasks and show promising results due to their ability to encode long-range spatial dependencies in an image effectively. However, very few studies on adopting transformers in self-supervised depth estimation have been conducted. When replacing the CNN architecture with a transformer in self-supervised learning of depth, we encounter several problems, such as the multi-scale photometric loss becoming problematic when used with transformers and an insufficient ability to capture local details. In this letter, we propose an attention-based decoder module, Pixel-Wise Skip Attention (PWSA), to enhance fine details in feature maps while keeping global context from transformers. In addition, we propose utilizing a self-distillation loss with a single-scale photometric loss to alleviate the instability of transformer training by providing correct training signals. We demonstrate that the proposed model performs accurate predictions on large objects and thin structures that require global context and local details. Our model achieves state-of-the-art performance among self-supervised monocular depth estimation methods on the KITTI and DDAD benchmarks.
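The idea of an attention-gated skip connection, as in the PWSA module described above, can be sketched minimally. This is in the spirit of the module rather than its exact design: the scalar per-pixel gate and the weight vector `w` are hypothetical simplifications.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pixel_wise_skip_attention(decoder_feat, skip_feat, w):
    """Sketch of an attention-gated skip connection: a per-pixel gate
    computed from the decoder feature decides how much local detail from
    the encoder skip feature passes through, while the transformer's
    global context in decoder_feat is preserved via the residual path.

    decoder_feat, skip_feat: (C, H, W); w: (C,) hypothetical gate weights.
    """
    # Per-pixel scalar gate in (0, 1), a stand-in for the learned attention.
    gate = sigmoid(np.einsum('c,chw->hw', w, decoder_feat))
    # Residual path keeps global context; gated path adds local detail.
    return decoder_feat + gate[None] * skip_feat
```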
Boosting Cross-Spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation
S. M. Kwon, Jeongmin Shin, Namil Kim, Soonmin Hwang, Yukyung Choi
In autonomous driving, thermal image semantic segmentation has emerged as a critical research area, owing to its ability to provide robust scene understanding under adverse visual conditions. In particular, unsupervised domain adaptation (UDA) for thermal image segmentation can be an efficient solution to address the lack of labeled thermal datasets. Nevertheless, since these methods do not effectively utilize the complementary information between RGB and thermal images, they suffer significant performance drops during domain adaptation. In this paper, we present a comprehensive study on cross-spectral UDA for thermal image semantic segmentation. We first propose a novel masked mutual learning strategy that promotes complementary information exchange by selectively transferring results between each spectral model while masking out uncertain regions. Additionally, we introduce a novel prototypical self-supervised loss designed to enhance the performance of the thermal segmentation model in nighttime scenarios. This approach addresses the limitations of RGB pre-trained networks, which cannot effectively transfer knowledge under low illumination due to the inherent constraints of RGB sensors. In experiments, our method outperforms previous UDA methods and achieves performance comparable to state-of-the-art supervised methods.
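The masking step in the mutual learning strategy described above can be sketched as follows. The confidence threshold and the ignore-label convention are illustrative assumptions, not the paper's values.

```python
import numpy as np

def masked_pseudo_labels(probs, conf_thresh=0.9):
    """Sketch of the masking step in masked mutual learning: one spectral
    model's softmax output (H, W, K) becomes pseudo-labels for the other
    model, but pixels whose maximum probability falls below a confidence
    threshold are masked out (label -1 = ignore), so only reliable
    predictions are exchanged between the RGB and thermal branches.
    """
    conf = probs.max(axis=-1)          # per-pixel confidence
    labels = probs.argmax(axis=-1)     # per-pixel class prediction
    labels[conf < conf_thresh] = -1    # mask uncertain regions
    return labels
```

Each branch would train on the other branch's masked labels, exchanging complementary information while uncertain regions contribute no gradient.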
DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation
Jisoo Kim, Jungbin Cho, Joonho Park, Soonmin Hwang, Da Eun Kim, Geon Kim, Youngjae Yu
Proceedings of the AAAI Conference on Artificial Intelligence
Speech-driven 3D facial animation has garnered lots of attention thanks to its broad range of applications. Despite recent advancements in achieving realistic lip motion, current methods fail to capture the nuanced emotional undertones conveyed through speech and produce monotonous facial motion. These limitations result in blunt and repetitive facial animations, reducing user engagement and hindering their applicability. To address these challenges, we introduce DEEPTalk, a novel approach that generates diverse and emotionally rich 3D facial expressions directly from speech inputs. To achieve this, we first train DEE (Dynamic Emotion Embedding), which employs probabilistic contrastive learning to forge a joint emotion embedding space for both speech and facial motion. This probabilistic framework captures the uncertainty in interpreting emotions from speech and facial motion, enabling the derivation of emotion vectors from its multifaceted space. Moreover, to generate dynamic facial motion, we design TH-VQVAE (Temporally Hierarchical VQ-VAE) as an expressive and robust motion prior overcoming limitations of VAEs and VQ-VAEs. Utilizing these strong priors, we develop DEEPTalk, a talking head generator that non-autoregressively predicts codebook indices to create dynamic facial motion, incorporating a novel emotion consistency loss. Extensive experiments on various datasets demonstrate the effectiveness of our approach in creating diverse, emotionally expressive talking faces that maintain accurate lip-sync. Our project page is available at https://whwjdqls.github.io/deeptalk.github.io/.
Boosting Cross-spectral Unsupervised Domain Adaptation for Thermal Semantic Segmentation
S. M. Kwon, Jeongmin Shin, Namil Kim, Soonmin Hwang, Yukyung Choi
ArXiv.org
RoCaRS: Robust Camera-Radar BEV Segmentation for Sensor Failure Scenarios
B. Park, Jeongtae Kim, Yunseol Cho, Soonmin Hwang
While camera–radar fusion has led to notable progress in autonomous driving, many existing approaches overlook the risk of sensor failures, which can critically compromise system safety. To address this limitation, we propose RoCaRS, a robust camera–radar fusion model designed for bird’s-eye view (BEV) segmentation under sensor failure scenarios. RoCaRS incorporates two key components—Radar-aware Backbone (RB) and Feature Spreading (FS)—to enhance BEV feature representation, along with a Dynamic Input Dropout Strategy (DIDS) and Bidirectional Feature Refinement (BFR) to address missing sensor inputs. Experiments on the nuScenes benchmark show that RoCaRS not only outperforms state-of-the-art fusion models under normal conditions but also maintains high performance under various sensor failure settings. Notably, in the complete absence of camera input, RoCaRS exceeds the baseline by +23.2 mIoU for map and +30.0 IoU for vehicle. Furthermore, it retains 99% of the radar-only model’s performance and achieves 103% of the camera-only model’s performance when either all cameras or all radars are disabled—without any retraining. These results highlight the potential of intermediate fusion to match the robustness of late fusion, while more effectively leveraging complementary modalities.
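A Dynamic Input Dropout Strategy as described above can be sketched minimally: during training, one modality's features are randomly zeroed so the fusion network learns to cope with missing sensors. The drop probabilities and the zeroing scheme are illustrative assumptions, not the paper's exact DIDS.

```python
import numpy as np

def dynamic_input_dropout(cam_feat, radar_feat, rng, p_drop=0.25):
    """Sketch of modality-level input dropout for robust fusion training:
    with probability p_drop drop the camera features, with probability
    p_drop drop the radar features, otherwise keep both. Simulates sensor
    failure at train time so inference tolerates a missing modality.
    """
    r = rng.random()
    if r < p_drop:                      # simulate full camera failure
        cam_feat = np.zeros_like(cam_feat)
    elif r < 2 * p_drop:                # simulate full radar failure
        radar_feat = np.zeros_like(radar_feat)
    # otherwise: both modalities pass through unchanged
    return cam_feat, radar_feat
```

Training against such simulated failures is what lets an intermediate-fusion model retain single-sensor performance without retraining when a modality drops out.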
Kyeong-Ju Cha, Hyunwoo Park, W. Choe, Soonmin Hwang, Sunwoo Kim
This paper proposes an end-to-end radio simultaneous localization and mapping (SLAM) algorithm that directly leverages channel impulse response (CIR) to overcome fundamental limitations in existing approaches. Traditional radio SLAM algorithms assume pre-estimated channel parameters, making performance highly sensitive to estimation accuracy, while recent end-to-end methods jointly perform parameter estimation and SLAM but suffer from high computational complexity and model mismatch vulnerability. The proposed algorithm minimizes information loss by operating directly on raw CIR measurements and utilizes end-to-end learning for enhanced robustness. Simulation results in the 3GPP TR 38.857 indoor factory scenario demonstrate that the proposed algorithm achieves comparable performance to conventional radio SLAM while reducing computational time by less than 2%, confirming its strong potential for practical deployment.
Leveraging Camera-Based Methods for Enhanced Feature-to-World Mapping
Jaeha Song, Sungjin Park, Soonmin Hwang
Scene representation in autonomous driving relies heavily on extracting meaningful features from images and accurately mapping them to 3D world coordinates. Traditional methods, such as ResNet-based backbones pretrained on ImageNet, provide a robust foundation for feature extraction but are increasingly viewed as limited when it comes to aligning features with the 3D world. This paper explores the integration of advanced segmentation models as backbones, focusing on how feature quality at the extraction stage directly impacts downstream scene representation tasks. Preliminary experiments demonstrate the potential for improved feature alignment and semantic consistency, highlighting the importance of robust backbone design in modern 3D perception pipelines.
SafeShift: Safety-Informed Distribution Shifts for Robust Trajectory Prediction in Autonomous Driving
Benjamin Stoler, Ingrid Navarro, Meghdeep Jana, Soonmin Hwang, Jonathan Francis, Jean Oh
As autonomous driving technology matures, the safety and robustness of its key components, including trajectory prediction, are vital. Although real-world datasets such as Waymo Open Motion provide recorded real scenarios, the majority of the scenes appear benign, often lacking the diverse safety-critical situations that are essential for developing robust models against nuanced risks. However, generating safety-critical data using simulation faces a severe simulation-to-real gap, and using real-world environments is even less desirable due to safety risks. In this context, we propose an approach that utilizes existing real-world datasets by identifying safety-relevant scenarios that are naively overlooked, e.g., near misses and proactive maneuvers. Our approach expands the spectrum of safety-relevance, allowing us to study trajectory prediction models under a safety-informed, distribution-shift setting. We contribute a versatile scenario characterization method, a novel scoring scheme that re-evaluates a scene using counterfactual scenarios to find hidden risky scenarios, and an evaluation of trajectory prediction models in this setting. We further contribute a remediation strategy, achieving a 10% average reduction in predicted trajectories' collision rates. To facilitate future research, we release our code for the overall SafeShift framework to the public: github.com/cmubig/SafeShift