Lightweight Hand Gesture Recognition Using FMCW RADAR With Multibranch Temporal Convolutional Networks and Channel Attention
Taeyoung Kim, Yunho Jung, Seongjoo Lee
IF 4.5
IEEE Sensors Journal
This paper proposes a novel lightweight hand gesture recognition approach based on Frequency-Modulated Continuous-Wave (FMCW) RADAR that aims to minimize computational complexity and memory usage while maintaining high recognition performance. Most existing methods use two-dimensional (2D) or three-dimensional (3D) features combined with complex neural network structures, resulting in high computational costs. In contrast, the proposed approach extracts four components (range, Doppler, azimuth, and elevation) as one-dimensional (1D) time-series features. These features are fed into a neural network comprising a multi-branch temporal convolutional network, depthwise separable convolutions, and a channel attention mechanism to enhance classification performance. Experiments were conducted with nine hand gestures collected from nine participants. The proposed system achieved a high accuracy of 99.38% with only 44.6K parameters and 1.84M FLOPs. Extensive ablation studies and comparative experiments against existing models demonstrated that the proposed method effectively balances performance and computational efficiency. This study validates the expressive capability of 1D features for hand gesture recognition and suggests practical applicability in resource-constrained environments such as embedded systems.
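As a rough illustration of the channel attention component described above, the sketch below applies a squeeze-and-excitation-style gate over 1D time-series feature channels (here, four channels standing in for range, Doppler, azimuth, and elevation). The layer sizes and random weights are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def channel_attention(x, w1, w2):
    """x: (channels, time). Returns channel-reweighted features."""
    squeeze = x.mean(axis=1)                       # global average pool over time
    hidden = np.maximum(w1 @ squeeze, 0.0)         # bottleneck + ReLU
    scale = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # sigmoid gate, one weight per channel
    return x * scale[:, None]                      # rescale each channel

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 32))    # 4 feature channels over 32 time steps
w1 = rng.standard_normal((2, 4)) * 0.5  # squeeze: 4 -> 2 (reduction ratio assumed)
w2 = rng.standard_normal((4, 2)) * 0.5  # excite:  2 -> 4
y = channel_attention(x, w1, w2)
print(y.shape)  # (4, 32)
```

Because the sigmoid gate lies in (0, 1), attention can only attenuate channels here; a trained model would learn which of the four 1D features to emphasize per gesture.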
Microphone Pair Training for Robust Sound Source Localization With Diverse Array Configurations
Inkyu An, Guoyuan An, Taeyoung Kim, Sung‐Eui Yoon
IF 5.3
IEEE Robotics and Automation Letters
We present a novel sound source localization method that leverages microphone pair training, designed to deliver robust performance in various real-world environments. Existing deep learning (DL)-based approaches face scalability issues when dealing with different types of microphone arrays. To address these issues, our approach is structured into two training steps: the first focuses on microphone pair training, while the second performs array geometry-aware training. The first training step enables our model to learn from multiple datasets covering various real-world situations, allowing it to robustly estimate the time difference of arrival (TDoA). Our robust-TDoA model incorporates a Mel-scale learnable filter bank (MLFB) and a hierarchical frequency-to-time attention network (HiFTA-net), which allow it to learn effectively across datasets, including scenarios with simultaneous sources and diverse sound events. The second training step enables our approach to estimate the direction of arrival (DoA) of sound from the TDoA information computed by our robust-TDoA model, starting from the parameters acquired in the first step. During this process, our approach can be trained to accommodate the geometry of the target microphone array, which can span diverse array types. As a result, our method demonstrates robust performance across two DoA estimation tasks using three different types of arrays.
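For intuition on how a single microphone pair's TDoA constrains the DoA, the sketch below uses the standard far-field relation theta = arcsin(c * tdoa / d). It replaces the paper's learned TDoA model (MLFB and HiFTA-net) with a known ground-truth delay; the spacing and angle values are illustrative assumptions.

```python
import numpy as np

def pair_doa(tdoa, d, c=343.0):
    """Angle (degrees) from broadside for mic spacing d (m), TDoA (s), speed of sound c (m/s)."""
    return np.degrees(np.arcsin(np.clip(c * tdoa / d, -1.0, 1.0)))

d = 0.1                       # 10 cm microphone spacing (illustrative)
theta_true = 30.0             # degrees from broadside
tdoa = d * np.sin(np.radians(theta_true)) / 343.0  # ideal far-field delay
print(round(pair_doa(tdoa, d), 1))  # 30.0
```

A single pair only constrains the angle to a cone; combining many pair-wise TDoA estimates with the array's geometry, as the second training step does, resolves the full DoA.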
Intelligent Exercise and Feedback System for Social Healthcare using LLMOps
Yeongrak Choi, Taeyoung Kim, Hae-Ra Han
arXiv.org
This study addresses the growing demand for personalized feedback in healthcare platforms and social communities by introducing an LLMOps-based system for automated exercise analysis and personalized recommendations. Current healthcare platforms rely heavily on manual analysis and generic health advice, limiting user engagement and the effectiveness of health promotion. We developed a system that leverages large language models (LLMs) to automatically analyze user activity data from the "Ounwan" exercise recording community. The system integrates LLMOps with LLM APIs, containerized infrastructure, and CI/CD practices to efficiently process large-scale user activity data, identify patterns, and generate personalized recommendations. The architecture ensures scalability, reliability, and security for large-scale healthcare communities. Evaluation results demonstrate the system's effectiveness on three key metrics: exercise classification, duration prediction, and caloric expenditure estimation. This approach improves the efficiency of community management while providing more accurate and personalized feedback to users, addressing the limitations of traditional manual analysis methods.
Recent advances in generative AI have made it possible to generate art pieces in the style of traditional Korean painting. Visual image generation with generative AI can occur autonomously or through co-creation with human creators. This technological progress has opened up possibilities for using generative AI in art therapy, with the aim of achieving psychological well-being during the creation or appreciation of artworks or improving the therapeutic process. However, the psychological improvement resulting from a collaborative creation process with generative AI has not yet been measured. The purpose of this study is therefore to experimentally ascertain whether creating visual images with generative AI leads to psychological improvement for creators or viewers. Forty participants used a web-based generative-AI traditional Korean painting experience developed by ALLBIGDAT Inc., a generative AI company in Korea, and completed a survey on their psychological state before and after the experience. The results show that imperfection and serendipity had a significant effect on enhancing satisfaction with the collaborative creations. In addition, age, included as a control variable, was statistically significant: older participants reported higher satisfaction.
Cluster-Based Sampling in Hindsight Experience Replay for Robotic Tasks (Student Abstract)
Taeyoung Kim, Dongsoo Har
Proceedings of the AAAI Conference on Artificial Intelligence
In multi-goal reinforcement learning with a sparse binary reward, training agents is particularly challenging due to a lack of successful experiences. To address this problem, hindsight experience replay (HER) generates successful experiences even from unsuccessful ones. However, generating successful experiences from uniformly sampled ones is inefficient. In this paper, the impact of exploiting the properties of achieved goals when generating successful experiences is investigated, and a novel cluster-based sampling strategy is proposed. The proposed strategy groups episodes with different achieved goals using a cluster model and samples experiences in the manner of HER to create the training batch. The proposed method is validated in experiments on three robotic control tasks from the OpenAI Gym. The results demonstrate that the proposed method is substantially more sample-efficient and achieves better performance than baseline approaches.
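A minimal sketch of the cluster-based sampling idea, under illustrative assumptions (toy 2D achieved goals, a simple deterministic k-means in place of the paper's unspecified cluster model): episodes are grouped by their achieved goals, and the training batch is drawn evenly across clusters rather than uniformly over all episodes.

```python
import numpy as np

def kmeans(points, k, iters=10):
    """Toy k-means with deterministic, spread-out initial centers."""
    centers = points[np.linspace(0, len(points) - 1, k).astype(int)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def cluster_balanced_sample(labels, per_cluster, seed=0):
    """Draw the same number of episode indices from every cluster."""
    rng = np.random.default_rng(seed)
    batch = []
    for j in np.unique(labels):
        idx = np.flatnonzero(labels == j)
        batch.extend(rng.choice(idx, per_cluster, replace=True))
    return np.array(batch)

rng = np.random.default_rng(1)
# 40 episodes whose achieved goals fall into two distinct regions
goals = np.concatenate([rng.normal(0, 0.1, (20, 2)), rng.normal(3, 0.1, (20, 2))])
labels = kmeans(goals, k=2)
batch = cluster_balanced_sample(labels, per_cluster=4)
print(len(batch))  # 8
```

The sampled indices would then be fed into ordinary HER goal relabeling; the balancing step only changes which episodes are drawn.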
Kick-motion Training with DQN in AI Soccer Environment
Bumgeun Park, Jihui Lee, Taeyoung Kim, Dongsoo Har
This paper presents a technique for training a robot to perform a kick motion in AI soccer using reinforcement learning (RL). In RL, an agent interacts with an environment and learns to choose an action in each state at every step. When training RL algorithms, a problem called the curse of dimensionality (COD) can occur if the dimension of the state is high and the amount of training data is small. The COD often degrades the performance of RL models. When kicking the ball, the robot chooses an action based on information obtained from the soccer field as the ball approaches. To avoid the COD, the training data, which in RL are the agent's experiences, would have to be collected evenly from all areas of the soccer field over a (theoretically infinite) time. Instead, in this paper we use a relative coordinate system (RCS) as the state for training the robot agent's kick motion, in place of the absolute coordinate system (ACS). Using the RCS removes the need for the agent to know the state of the entire soccer field, reduces the dimension of the state required to perform the kick motion, and consequently alleviates the COD. Training based on the RCS is performed with the widely used deep Q-network (DQN) and tested in an AI Soccer environment implemented with the Webots simulation software.
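The ACS-to-RCS change of state described above can be sketched as a translation and rotation into the robot's frame; the helper below and its values are illustrative, not the paper's exact state definition.

```python
import math

def to_relative(ball_xy, robot_xy, robot_heading):
    """Express the ball's absolute (field) position in the robot's own frame."""
    dx = ball_xy[0] - robot_xy[0]          # translate: robot becomes the origin
    dy = ball_xy[1] - robot_xy[1]
    c, s = math.cos(-robot_heading), math.sin(-robot_heading)
    return (c * dx - s * dy, s * dx + c * dy)  # rotate by -heading

# The same ball-robot geometry at two different field locations maps to the
# same relative state, so experiences transfer across the field:
a = to_relative((1.0, 1.0), (0.0, 1.0), 0.0)
b = to_relative((4.0, -2.0), (3.0, -2.0), 0.0)
print(a == b)  # True
```

This invariance is why the RCS shrinks the effective state space: the DQN never has to see every absolute field position, only every ball-robot configuration.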