Audio-guided implicit neural representation for local image stylization
Seung Hyun Lee, Sieun Kim, Wonmin Byeon, Gyeongrok Oh, Sumin In, Hyeongcheol Park, Sang Ho Yoon, Sunghee Hong, Jinkyu Kim, Sangpil Kim
IF 18.3
Computational Visual Media
We present a novel framework for audio-guided localized image stylization. Sound often provides information about the specific context of a scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a particular part of the image based on audio input is natural but challenging. This work proposes a framework in which a user provides an audio input to localize the target in the input image and another to locally stylize the target object or scene. We first produce a fine localization map using an audio-visual localization network leveraging CLIP embedding space. We then utilize an implicit neural representation (INR) along with the predicted localization map to stylize the target based on sound information. The INR manipulates local pixel values to be semantically consistent with the provided audio input. Our experiments show that the proposed framework outperforms other audio-guided stylization methods. Moreover, we observe that our method constructs concise localization maps and naturally manipulates the target object or scene in accordance with the given audio input.
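The snippet below is a minimal sketch of the pipeline described in the abstract, not the authors' implementation: a coordinate MLP (a generic ReLU network standing in for the INR) predicts a pixel residual that is blended into the image only where the localization map is active, and the edit is optimized for semantic consistency with an audio embedding in a shared CLIP-like space. The names `CoordinateINR`, `embed_fn`, and `audio_emb` are hypothetical placeholders for the paper's components.

```python
# Minimal sketch (not the authors' code) of audio-guided localized stylization,
# assuming a precomputed soft localization map and embeddings in a shared space.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordinateINR(nn.Module):
    """Coordinate MLP: (x, y) -> RGB residual used to restyle the masked region."""
    def __init__(self, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Tanh(),
        )

    def forward(self, coords):                      # coords: (H*W, 2) in [-1, 1]
        return self.net(coords)                     # (H*W, 3) pixel residuals

def localized_stylization_step(inr, image, mask, coords, embed_fn, audio_emb, opt):
    """One optimization step: edit only where mask > 0, keep the rest intact.
    `embed_fn` stands in for a CLIP-space image encoder and `audio_emb` for the
    audio embedding in the same space (both hypothetical placeholders)."""
    _, _, h, w = image.shape                        # image: (1, 3, H, W)
    residual = inr(coords).t().reshape(1, 3, h, w)  # predicted local edit
    stylized = image + mask * residual              # mask: (1, 1, H, W) localization map
    sim = F.cosine_similarity(embed_fn(stylized), audio_emb, dim=-1)
    loss = (1.0 - sim).mean() + 0.1 * (mask * residual).abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return stylized.detach(), loss.item()
```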
In vitro assessment of the anti-inflammatory and skin-moisturizing effects of Filipendula palmata (Pall.) Maxim. on human keratinocytes and identification of its bioactive phytochemicals
Xiao-jie Mi, Jinkyu Kim, Sanghyun Lee, Sung‐Kwon Moon, Yeon-Ju Kim, Hoon Kim
Multi-Modal Locomotion Mode Recognition in the Real World for Robotic Hip Complex Exoskeletons
Hyesoo Shin, Sangdo Kim, Sunwoo Kim, Jongwon Lee, Jinkyu Kim, KangGeon Kim
IF 5.3
IEEE Robotics and Automation Letters
Lower limb exoskeletons assist users by supporting joint movements. Since joint motion patterns vary depending on how the user moves, accurately recognizing the type of movement (locomotion mode) is crucial for controlling the exoskeleton and ensuring user safety. Inspired by how humans use multiple types of sensory information to control movement, we developed a multi-modal locomotion mode recognition (LMR) system that uses both mechanical and visual sensor data to identify locomotion modes. Our approach utilizes two fusion methods: intermediate fusion, which combines the data in the form of features, and late fusion, which integrates the sensor data by averaging the recognition results from each sensor. Fusing these two modalities improved prediction accuracy by an average of 11.7% on the test data. Through comparisons with uni-modal LMR systems that rely on a single type of sensor data, we found that the improved performance of the multi-modal LMR system stems from the visual information's ability to generalize across users' different gait patterns and the mechanical sensor data's consistency within each class.
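Below is a minimal sketch of the two fusion schemes named in the abstract, assuming simple per-modality encoders over precomputed feature vectors; `IntermediateFusionLMR` and `late_fusion` are illustrative names, not the paper's modules.

```python
# Illustrative sketch only (not the paper's model): two fusion schemes for
# locomotion mode recognition from a mechanical-sensor branch and a vision branch.
import torch
import torch.nn as nn

class IntermediateFusionLMR(nn.Module):
    """Intermediate fusion: concatenate per-modality features, then classify jointly."""
    def __init__(self, mech_dim, vis_dim, n_modes, hidden=128):
        super().__init__()
        self.mech_enc = nn.Sequential(nn.Linear(mech_dim, hidden), nn.ReLU())
        self.vis_enc = nn.Sequential(nn.Linear(vis_dim, hidden), nn.ReLU())
        self.head = nn.Linear(2 * hidden, n_modes)

    def forward(self, mech, vis):
        feats = torch.cat([self.mech_enc(mech), self.vis_enc(vis)], dim=-1)
        return self.head(feats)                      # logits over locomotion modes

def late_fusion(mech_logits, vis_logits):
    """Late fusion: average per-sensor class probabilities, then take the argmax."""
    probs = (mech_logits.softmax(-1) + vis_logits.softmax(-1)) / 2
    return probs.argmax(-1)                          # predicted locomotion mode
```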
How Do Young Entrepreneurs Grow into Social Entrepreneurs? - Focusing on Learning Growth Cases -
Jinkyu Kim, J. Daniel Kim
Journal of Public Society
This study explores the context and process through which young entrepreneurs grew into social entrepreneurs. To this end, three young CEOs running social enterprises were selected as study participants and in-depth interviews were conducted. The contexts in which the young entrepreneurs became social entrepreneurs were: recognizing the need for social responsibility based on their own experience as members of vulnerable groups, using the social enterprise model as a corporate branding and growth strategy, and learning about social enterprise through entrepreneur networks. We also found that their learning growth as social entrepreneurs consisted of finding breakthroughs for business expansion, building networks of loyal customers as an engine of growth, and developing into social entrepreneurs by participating in social enterprise incubation programs. As a strategy for sustainable business survival, the young entrepreneurs attempted to transform their companies into social enterprises rooted in their local communities. This study is meaningful in that it identifies the growth process of young social entrepreneurs, which previous research on social enterprises had not addressed.
3D Occupancy Prediction with Low-Resolution Queries via Prototype-aware View Transformation
Gyeongrok Oh, Sungjune Kim, H. S. Ko, Hyung‐gun Chi, Jinkyu Kim, Dongwook Lee, Daehyun Ji, Sungjoon Choi, Sujin Jang, Sangpil Kim
The resolution of voxel queries significantly influences the quality of view transformation in camera-based 3D occupancy prediction. However, computational constraints and the practical necessity for real-time deployment require smaller query resolutions, which inevitably leads to information loss. Therefore, it is essential to encode and preserve rich visual details within limited query sizes while ensuring a comprehensive representation of 3D occupancy. To this end, we introduce ProtoOcc, a novel occupancy network that leverages prototypes of clustered image segments in view transformation to enhance low-resolution context. In particular, mapping 2D prototypes onto 3D voxel queries encodes high-level visual geometries and compensates for the loss of spatial information from reduced query resolutions. Additionally, we design a multi-perspective decoding strategy to efficiently disentangle the densely compressed visual cues into a high-dimensional 3D occupancy scene. Experimental results on both the Occ3D and SemanticKITTI benchmarks demonstrate the effectiveness of the proposed method, showing clear improvements over the baselines. More importantly, ProtoOcc achieves competitive performance against the baselines even with the voxel query resolution reduced by 75%. Project page: https://kuai-lab.github.io/cvpr2025protoocc.
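The following is a rough sketch, under our own assumptions, of the prototype-injection idea: 2D image features are averaged per clustered segment to form prototypes, and each low-resolution voxel query is enriched with the prototype of the segment it projects onto. The cluster assignments and projection indices are assumed precomputed from the image backbone and camera geometry; this is not the released ProtoOcc code.

```python
# Rough sketch (not the ProtoOcc implementation) of enriching low-resolution
# 3D voxel queries with prototypes of clustered 2D image segments.
import torch

def make_prototypes(pix_feats, assignments, k):
    """pix_feats: (HW, C) image features; assignments: (HW,) cluster ids in [0, k)."""
    protos = torch.zeros(k, pix_feats.size(1))
    counts = torch.zeros(k)
    protos.index_add_(0, assignments, pix_feats)
    counts.index_add_(0, assignments, torch.ones_like(assignments, dtype=torch.float))
    return protos / counts.clamp(min=1).unsqueeze(-1)    # (K, C) prototype per segment

def enrich_voxel_queries(voxel_queries, proj_pix_idx, assignments, protos):
    """voxel_queries: (V, C) low-res queries; proj_pix_idx: (V,) pixel index each
    voxel projects onto (assumed precomputed from the camera geometry)."""
    voxel_protos = protos[assignments[proj_pix_idx]]      # (V, C) prototype per voxel
    return voxel_queries + voxel_protos                   # inject high-level 2D context
```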
Stream and Query-guided Feature Aggregation for Efficient and Effective 3D Occupancy Prediction
Seokha Moon, Janghyun Baek, G.H. Kim, Jinkyu Kim, SeonKyung Choi
ArXiv.org
3D occupancy prediction has become a key perception task in autonomous driving, as it enables comprehensive scene understanding. Recent methods enhance this understanding by incorporating spatiotemporal information through multi-frame fusion, but they suffer from a trade-off: dense voxel-based representations provide high accuracy at significant computational cost, whereas sparse representations improve efficiency but lose spatial detail. To mitigate this trade-off, we introduce DuOcc, which employs a dual aggregation strategy that retains dense voxel representations to preserve spatial fidelity while maintaining high efficiency. DuOcc consists of two key components: (i) Stream-based Voxel Aggregation, which recurrently accumulates voxel features over time and refines them to suppress warping-induced distortions, preserving a clear separation between occupied and free space; and (ii) Query-guided Aggregation, which compensates for the limitations of voxel accumulation by selectively injecting instance-level query features into the voxel regions occupied by dynamic objects. Experiments on the widely used Occ3D-nuScenes and SurroundOcc datasets demonstrate that DuOcc achieves state-of-the-art performance in real-time settings, while reducing memory usage by over 40% compared to prior methods.
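Below is an illustrative sketch of the two aggregation components, not the DuOcc implementation: a recurrent 3D-convolutional refinement that fuses the warped previous voxel state with the current frame's features, and a query-guided injection that adds instance query features to the voxels of dynamic objects. Ego-motion warping and instance mask prediction are assumed to be handled elsewhere, and all names here are hypothetical.

```python
# Minimal illustrative sketch (not the DuOcc code): recurrent voxel accumulation
# with a learned refinement, plus query injection into dynamic-object regions.
import torch
import torch.nn as nn

class StreamVoxelAggregator(nn.Module):
    """Keep a running voxel state; fuse the warped previous state with current features."""
    def __init__(self, c):
        super().__init__()
        self.refine = nn.Conv3d(2 * c, c, kernel_size=3, padding=1)

    def forward(self, prev_state_warped, curr_feats):
        # Both inputs: (B, C, X, Y, Z); the previous state is assumed already
        # warped into the current ego frame (warping omitted here).
        return self.refine(torch.cat([prev_state_warped, curr_feats], dim=1))

def query_guided_injection(voxel_feats, obj_queries, obj_masks):
    """Add each instance query's feature to the voxels it occupies.
    voxel_feats: (B, C, X, Y, Z); obj_queries: (N, C); obj_masks: (N, X, Y, Z) bool."""
    out = voxel_feats.clone()
    for q, m in zip(obj_queries, obj_masks):
        out[:, :, m] = out[:, :, m] + q.view(1, -1, 1)   # broadcast over masked voxels
    return out
```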