Key Publications (5)
*Impact Factors are shown only for papers published within the last 6 years, as of 2026.
1. article | Citations: 0 · 2025
Sensor Fusion‐Based Autoencoder Feature Distillation for 3D Object Detection
Junmin Lee, Wonjun Hwang
IF 0.7 (2025) · Electronics Letters
Abstract: Knowledge distillation is a widely adopted model compression method aimed at narrowing the performance gap between a high‐capacity teacher network and a lightweight student network. However, in the context of sensor fusion‐based 3D object detection, existing distillation methods predominantly emphasize accuracy enhancement through the introduction of multiple loss functions, which often leads to overly complex training procedures. To address this limitation, we propose a sensor fusion‐based feature distillation framework tailored for camera and radar modalities. Our proposed method utilizes an autoencoder to facilitate efficient knowledge transfer from the teacher to the student model. Additionally, we introduce image‐context and radar‐context knowledge distillation strategies to capture and transfer modality‐specific features effectively. We demonstrate the effectiveness of the proposed method on the nuScenes dataset using a ResNet‐based architecture.
https://doi.org/10.1049/ell2.70295
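The paper does not include code here; as a rough illustration only, the autoencoder-mediated knowledge transfer described in the abstract might be sketched as below. All shapes, the linear encoder/decoder, and the plain MSE losses are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_enc):
    # Linear encoder: project the teacher feature into a compact latent space.
    return x @ W_enc

def decode(z, W_dec):
    # Linear decoder: reconstruct the teacher feature from the latent code.
    return z @ W_dec

# Hypothetical dimensions: teacher channels 64, student channels 16, latent 16.
C_t, C_s, C_z = 64, 16, 16
W_enc = rng.normal(size=(C_t, C_z)) * 0.1
W_dec = rng.normal(size=(C_z, C_t)) * 0.1

teacher_feat = rng.normal(size=(8, C_t))   # batch of teacher features
student_feat = rng.normal(size=(8, C_s))   # batch of student features

# Reconstruction loss trains the autoencoder to summarize teacher knowledge.
z = encode(teacher_feat, W_enc)
recon_loss = np.mean((decode(z, W_dec) - teacher_feat) ** 2)

# Distillation loss: the student mimics the compact teacher latent
# (here the student and latent dimensions are chosen to match).
distill_loss = np.mean((student_feat - z) ** 2)

total_loss = recon_loss + distill_loss
```

The point of the sketch is the structure: the student matches a compressed latent rather than the full teacher feature, which is what keeps the training objective simple.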
2. article | Citations: 1 · 2025
Knowledge tailoring: Bridging the teacher-student gap in semantic segmentation
Seokhwa Cheung, Seung-Beom Woo, Taehoon Kim, Wonjun Hwang
IF 7.6 (2025) · Pattern Recognition
https://doi.org/10.1016/j.patcog.2025.112399
3. article | Citations: 7 · 2025
Bridging domain spaces for unsupervised domain adaptation
Jaemin Na, Heechul Jung, Hyung Jin Chang, Wonjun Hwang
IF 7.6 (2025) · Pattern Recognition
https://doi.org/10.1016/j.patcog.2025.111537
4. article | Citations: 0 · 2024
Channel and Spatial Enhancement Network for human parsing
Kunliang Liu, Rize Jin, Yuelong Li, Jianming Wang, Wonjun Hwang
IF 4.2 (2024) · Image and Vision Computing
The dominant backbones of neural networks for scene parsing consist of multiple stages, where feature maps at different stages contain varying levels of spatial and semantic information. High-level features convey more semantics and fewer spatial details, while low-level features carry fewer semantics and more spatial details. Consequently, semantic-spatial gaps exist among features at different levels, particularly in human parsing tasks. Many existing approaches directly upsample multi-stage features and aggregate them through addition or concatenation without addressing these gaps. This inevitably leads to spatial misalignment, semantic mismatch, and ultimately misclassification, especially in human parsing, which demands richer semantics and finer feature-map detail owing to intricate textures, diverse clothing styles, and heavy scale variability across body parts. In this paper, we alleviate the long-standing challenge of semantic-spatial gaps between features from different stages by using subtraction and addition operations to recognize the semantic and spatial differences and compensate for them. Based on these principles, we propose the Channel and Spatial Enhancement Network (CSENet) for parsing, a straightforward and intuitive solution that injects high-semantic information into lower-stage features and, conversely, introduces fine details into higher-stage features. Extensive experiments on three dense prediction tasks demonstrate the efficacy of our method: it achieves the best performance on the LIP and CIHP datasets, and we also verify its generality on the ADE20K dataset.
• We propose CSENet, which effectively addresses the challenge of semantic and spatial gaps between feature maps from different stages in human parsing. By using subtraction and addition to compute and compensate for feature differences, CSENet reduces the semantic gaps and introduces high-semantic information into low-level features and fine details into high-level features, benefiting the recognition of large objects and inconspicuous parts, especially in human parsing.
• We introduce CEM and SEM as the main components of CSENet. CEM employs average pooling, subtraction, and addition to compute and compensate for semantic differences, while SEM uses similar operations to compute and compensate for spatial differences. These modules enhance the discriminative ability of feature representations, improving the recognition of fine details, inner patterns, and accurate spatial locations of human parts.
• CSENet is effective and efficient at improving existing backbones. Our modules are general and can be easily integrated into existing architectures, enabling the effective assembly of feature maps from deep to shallow layers. Our method achieves state-of-the-art performance on the LIP and CIHP datasets without using pose information or the class hierarchy of the scene, and we validate its generality with a transformer backbone on the ADE20K scene-parsing dataset.
https://doi.org/10.1016/j.imavis.2024.105332
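The subtract-then-add compensation that the abstract attributes to CEM can be sketched in a few lines of NumPy. This is a loose illustration under stated assumptions (nearest-neighbour upsampling, global average pooling as the channel descriptor, a single 2x stage gap), not the published CSENet module:

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour upsampling along the spatial axes (H, W).
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

def global_avg_pool(x):
    # Channel descriptor: average over spatial dims, keep dims for broadcasting.
    return x.mean(axis=(-2, -1), keepdims=True)

def semantic_compensate(low, high):
    """CEM-style idea: measure the channel-wise semantic difference between
    stages by subtracting pooled descriptors, then add the difference back
    to the low-level feature so it gains high-stage semantics."""
    diff = global_avg_pool(upsample2x(high)) - global_avg_pool(low)
    return low + diff  # broadcasts the per-channel correction over all pixels

rng = np.random.default_rng(0)
low = rng.normal(size=(1, 8, 8, 8))    # low-stage feature map: (N, C, H, W)
high = rng.normal(size=(1, 8, 4, 4))   # high-stage feature map, half resolution

enhanced = semantic_compensate(low, high)
```

After compensation, each channel of `enhanced` carries the high-stage channel statistics while keeping the low-stage spatial detail, which is the gap-bridging effect the abstract describes.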
5. article | Citations: 2 · 2023
Class relationship‐based knowledge distillation for efficient human parsing
Yuqi Lang, Kunliang Liu, Jianming Wang, Wonjun Hwang
IF 0.7 (2023) · Electronics Letters
Abstract: In computer vision, human parsing is challenging because it demands accurate human-region localization and semantic partitioning. This dense prediction task requires powerful computation and high-precision models. To enable real-time parsing on resource-limited devices, the authors introduce a lightweight model using ResNet18 as the core network. They simplify the pyramid module, improving context clarity and reducing complexity, and integrate a spatial attention fusion strategy to counter the precision loss incurred by light-weighting. Traditional models, despite their segmentation precision, are limited by computational complexity and extensive parameters, so the authors apply knowledge distillation (KD) to enhance the lightweight network's accuracy. Because conventional distillation can fail to transfer useful knowledge when teacher and student networks differ significantly, the authors use a novel distillation approach based on inter‐class and intra‐class relations in prediction outcomes, noticeably improving parsing accuracy. Experiments on the Look into Person (LIP) dataset show that the lightweight model significantly reduces parameters while maintaining parsing precision and improving inference speed.
https://doi.org/10.1049/ell2.12900
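One common way to realize the inter-class relation idea the abstract mentions is to distill a class-similarity matrix computed from prediction outputs. The following NumPy sketch is an assumed formulation for illustration (cosine similarity between per-class probability maps, matched with MSE); the paper's exact relation terms may differ:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def class_relation(logits):
    """Inter-class relation matrix: cosine similarity between per-class
    probability maps, flattened over pixels. Shape: (C, C)."""
    p = softmax(logits, axis=0)                            # (C, H*W) probabilities
    p = p / (np.linalg.norm(p, axis=1, keepdims=True) + 1e-8)
    return p @ p.T

rng = np.random.default_rng(0)
C, HW = 5, 64                                  # 5 classes, 8x8 pixels flattened
teacher_logits = rng.normal(size=(C, HW))
student_logits = teacher_logits + 0.1 * rng.normal(size=(C, HW))

# Relation-distillation loss: the student mimics how the teacher's classes
# relate to one another, rather than matching raw logits pixel by pixel.
loss = np.mean((class_relation(student_logits) - class_relation(teacher_logits)) ** 2)
```

Matching relations instead of raw outputs is what makes this kind of distillation tolerant of large teacher-student capacity gaps, which is the failure mode the abstract calls out.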