Tutorial on Matching-based Causal Analysis of Human Behaviors Using Smartphone Sensor Data
Gyuwon Jung, Sangjun Park, Eun-Yeol Ma, Heeyoung Kim, Uichin Lee
IF 28
ACM Computing Surveys
Smartphones can unobtrusively capture human behavior and contextual data such as user interaction and mobility. Thus far, smartphone sensor data have primarily been used to gain behavioral insights through correlation analysis. This article provides a tutorial on the causal analysis of human behavior using smartphone sensor data by reviewing well-known matching methods. The key steps of the causal inference pipeline employing matching methods are illustrated using a concrete scenario involving the identification of a causal relationship between phone usage and physical activity. Several practical considerations for conducting causal inferences about human behaviors using smartphone sensor data are also discussed.
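The matching pipeline the abstract outlines can be made concrete with a toy simulation. The sketch below (every variable, number, and scenario detail is invented for illustration, not taken from the paper) generates a confounded "phone usage vs. step count" dataset, estimates propensity scores with a hand-rolled logistic fit, and computes a 1-nearest-neighbor matched estimate of the treatment effect:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical confounder: baseline activity level per user-day.
baseline = rng.normal(0.0, 1.0, n)
# Treatment: heavy phone usage, more likely for low-activity users.
treated = (rng.random(n) < 1 / (1 + np.exp(baseline))).astype(int)
# Outcome: daily steps, driven by the confounder plus a true effect of -500.
steps = 8000 + 1500 * baseline - 500 * treated + rng.normal(0, 300, n)

# Estimate propensity scores P(treated | baseline) by logistic gradient ascent.
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(w * baseline + b)))
    w += 0.1 * np.mean((treated - p) * baseline)
    b += 0.1 * np.mean(treated - p)
ps = 1 / (1 + np.exp(-(w * baseline + b)))

# 1-nearest-neighbor matching on the propensity score (with replacement).
t_idx = np.where(treated == 1)[0]
c_idx = np.where(treated == 0)[0]
effects = []
for i in t_idx:
    j = c_idx[np.argmin(np.abs(ps[c_idx] - ps[i]))]
    effects.append(steps[i] - steps[j])
att = np.mean(effects)  # average treatment effect on the treated
print(round(att))       # should land near the true effect of -500
```

A naive difference in means would be strongly biased here (treated users have lower baseline activity); matching on the propensity score recovers an estimate near the true effect.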
Toward Data-Driven Digital Therapeutics Analytics: Literature Review and Research Directions
Uichin Lee, Gyuwon Jung, Eun-Yeol Ma, Jin San Kim, Heepyung Kim, Jumabek Alikhanov, Youngtae Noh, Heeyoung Kim
IF 19.2
IEEE/CAA Journal of Automatica Sinica
With the advent of digital therapeutics (DTx), the development of software as a medical device (SaMD) for mobile and wearable devices has gained significant attention in recent years. Existing DTx evaluations, such as randomized clinical trials, mostly focus on verifying the effectiveness of DTx products. To gain a deeper understanding of DTx engagement and behavioral adherence beyond efficacy, a large amount of contextual and interaction data from mobile and wearable devices during field deployment is required for analysis. In this work, the overall flow of data-driven DTx analytics is reviewed to help researchers and practitioners explore DTx datasets, investigate contextual patterns associated with DTx usage, and establish the (causal) relationship between DTx engagement and behavioral adherence. This review of the key components of data-driven analytics suggests novel research directions for the analysis of mobile sensor and interaction datasets, which can help iteratively improve the receptivity of existing DTx.
Simultaneous Deep Clustering and Feature Selection via K-Concrete Autoencoder
Woojin Doo, Heeyoung Kim
IF 10.4
IEEE Transactions on Knowledge and Data Engineering
Existing deep learning methods for clustering high-dimensional data perform feature selection and clustering separately, which can result in the exclusion of some important features for clustering. In this paper, we propose a method that performs deep clustering and feature selection simultaneously by inserting a concrete selector layer between the input layer and the first encoder layer of a modified autoencoder. The concrete selector layer performs feature selection, while the modified autoencoder performs clustering in the latent space by incorporating K-means loss and inter-cluster distances. The proposed method, called the K-concrete autoencoder, selects features important for clustering and uses only the selected features to learn K-means-friendly latent representations of the data. Moreover, we propose an extension of the K-concrete autoencoder to provide relative importance of each selected feature. We demonstrate the effectiveness of the proposed method using simulated and real datasets.
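The core of the concrete selector layer is a temperature-annealed Gumbel-softmax selection: each selector neuron learns a distribution over input features that hardens into a one-hot pick as the temperature anneals. The toy sketch below shows only that mechanism (the autoencoder, K-means loss, and inter-cluster-distance terms from the paper are omitted, and all sizes are invented):

```python
import numpy as np

rng = np.random.default_rng(0)

def concrete_selector(logits, temperature, rng):
    """Sample a relaxed one-hot selection vector per selector neuron.

    At high temperature the sample is a soft mixture over input features;
    as the temperature anneals toward 0 it approaches a hard one-hot
    vector, so each selector neuron picks exactly one input feature.
    """
    gumbel = -np.log(-np.log(rng.random(logits.shape)))
    z = (logits + gumbel) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

d, k = 10, 3                      # input features, features to select
logits = rng.normal(size=(k, d))  # learnable selection logits

x = rng.normal(size=(5, d))       # a mini-batch of inputs
for temp in (10.0, 1.0, 0.01):
    sel = concrete_selector(logits, temp, rng)  # (k, d) selection weights
    selected = x @ sel.T                        # (5, k): selected features
    print(temp, sel.max(axis=-1).round(3))
```

During training the selected features would feed the first encoder layer; at the final low temperature, each row of `sel` is effectively one-hot, so exactly `k` input features survive.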
Uncertainty Estimation by Flexible Evidential Deep Learning
Tae Sung Yoon, Heeyoung Kim
arXiv (Cornell University)
Uncertainty quantification (UQ) is crucial for deploying machine learning models in high-stakes applications, where overconfident predictions can lead to serious consequences. An effective UQ method must balance computational efficiency with the ability to generalize across diverse scenarios. Evidential deep learning (EDL) achieves efficiency by modeling uncertainty through the prediction of a Dirichlet distribution over class probabilities. However, the restrictive assumption of Dirichlet-distributed class probabilities limits EDL's robustness, particularly in complex or unforeseen situations. To address this, we propose flexible evidential deep learning (F-EDL), which extends EDL by predicting a flexible Dirichlet distribution (a generalization of the Dirichlet distribution) over class probabilities. This approach provides a more expressive and adaptive representation of uncertainty, significantly enhancing UQ generalization and reliability under challenging scenarios. We theoretically establish several advantages of F-EDL and empirically demonstrate its state-of-the-art UQ performance across diverse evaluation settings, including classical, long-tailed, and noisy in-distribution scenarios.
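For readers unfamiliar with the EDL baseline that F-EDL generalizes, the sketch below shows standard Dirichlet-based uncertainty (the flexible Dirichlet extension itself is not reproduced here; the ReLU evidence transform is one common choice, not necessarily the paper's):

```python
import numpy as np

def edl_uncertainty(logits):
    """Dirichlet-based uncertainty in standard EDL.

    A non-negative transform of the logits gives per-class evidence;
    alpha = evidence + 1 parameterizes a Dirichlet over class
    probabilities. Total uncertainty u = K / sum(alpha) equals 1 when
    there is no evidence and shrinks toward 0 as evidence accumulates.
    """
    evidence = np.maximum(logits, 0.0)  # ReLU evidence
    alpha = evidence + 1.0
    strength = alpha.sum()
    prob = alpha / strength             # expected class probabilities
    u = alpha.size / strength           # vacuity / total uncertainty
    return prob, u

# Confident prediction: lots of evidence for class 0.
p1, u1 = edl_uncertainty(np.array([20.0, 0.0, 0.0]))
# No evidence at all: uniform prediction, maximal uncertainty.
p2, u2 = edl_uncertainty(np.array([-5.0, -5.0, -5.0]))
print(round(u1, 3), round(u2, 3))  # 0.13 1.0
```

F-EDL's criticism targets exactly this setup: a single Dirichlet ties the uncertainty representation to one restrictive distributional family.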
An Exploratory Study of Risk Perception about Electric Kick-Scooter Using Product Categorization
Associate Professor, School of Business Administration, Gyeongsang National University; Yong Wan Park; Heeyoung Kim
Journal of Marketing Management Research
Electric kick-scooters are spreading rapidly because they are easy to use and fast, yet there is little awareness that they carry motorcycle-level risk. This study therefore sought to propose ways to make consumers perceive the risk of electric kick-scooters. Specifically, it examined whether the product categorization of the electric kick-scooter (scooter vs. bicycle) and the type of selling company (automobile company vs. mobility company) differentially affect perceptions of the product. The experimental results showed that product quality was rated higher when the electric kick-scooter was presented as a bicycle rather than a scooter, and higher when it was sold by an automobile company rather than a mobility company. Regarding the interaction effect, when the electric kick-scooter was sold by an automobile company, the product category did not affect perceived risk; however, when it was sold by a mobility company, the product was rated as more dangerous when presented as a scooter than when presented as a bicycle. Based on these results, implications for electric kick-scooters are discussed.
Deep Latent Factor Model for Spatio-Temporal Forecasting
Wonmo Koo, Eun-Yeol Ma, Heeyoung Kim
IF 2.5
Technometrics
Latent factor models can perform spatio-temporal forecasting (i.e., predicting future responses at unmeasured as well as measured locations) by modeling temporal dependence using latent factors and considering spatial dependence using a spatial prior on factor loadings. However, they may fail to capture complex spatio-temporal dependence because the latent factors are typically assumed to follow a classical linear time series model, such as a vector autoregressive model. In this article, we propose a deep latent factor model for spatio-temporal forecasting that can model complex spatio-temporal dependence more flexibly by leveraging the high expressive power of a deep neural network. Specifically, the latent factors are modeled using a recurrent neural network and the factor loadings are modeled using a distance-based Gaussian process. The proposed model allows the number of latent factors to be inferred from the data using a beta-Bernoulli process, which enables computationally more efficient implementation compared to previous methods. We derive a stochastic variational inference algorithm for scalable inference of the proposed model and validate the model using simulated and real data examples.
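The model's two ingredients, recurrently evolving latent factors and spatially smooth GP loadings, can be illustrated with a toy generative sketch. Here a simple tanh recurrence stands in for the paper's RNN, the beta-Bernoulli factor-number inference and variational training are omitted, and every size and constant is invented:

```python
import numpy as np

rng = np.random.default_rng(0)

def se_kernel(a, b, length=0.5):
    """Squared-exponential (distance-based) kernel on 1-D locations."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * length**2))

locs = np.linspace(0, 1, 8)   # measured locations
new_loc = np.array([0.55])    # unmeasured target location

# Factor loadings drawn from a distance-based GP prior (spatial smoothness).
K = se_kernel(locs, locs) + 1e-6 * np.eye(8)
n_factors = 2
W = np.linalg.cholesky(K) @ rng.normal(size=(8, n_factors))

# Latent factors evolve through a tiny recurrent update (RNN stand-in).
T = 50
A = np.array([[0.9, -0.2], [0.2, 0.9]])
z = np.zeros((T, n_factors))
z[0] = rng.normal(size=n_factors)
for t in range(1, T):
    z[t] = np.tanh(A @ z[t - 1]) + 0.05 * rng.normal(size=n_factors)

y = z @ W.T  # responses at measured locations, shape (T, 8)

# Forecast at the unmeasured location: GP-interpolate the loadings,
# then combine them with the shared latent factors.
k_star = se_kernel(new_loc, locs)        # (1, 8)
w_star = k_star @ np.linalg.solve(K, W)  # interpolated loading, (1, 2)
y_star = z @ w_star.T                    # series at the new location
print(y_star.shape)
```

Because the loadings vary smoothly over space, the interpolated series at 0.55 closely tracks the nearby measured series, which is exactly what lets the model predict at unmeasured locations.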
Few-Shot Classification of Wafer Bin Maps Using Transfer Learning and Ensemble Learning
Hyeonwoo Kim, Heegeon Yoon, Heeyoung Kim
IF 2.9
Journal of Manufacturing Science and Engineering
The high cost of collecting and annotating wafer bin maps (WBMs) necessitates few-shot WBM classification, i.e., classifying WBM defect patterns using a limited number of WBMs. Existing few-shot WBM classification algorithms mainly use meta-learning methods that leverage knowledge learned across several episodes. However, meta-learning methods require a large amount of additional real WBMs, which can be unrealistic. To train a network with only a few real WBMs while avoiding this requirement, we propose using simulated WBMs to pre-train a classification model. Specifically, we employ transfer learning by pre-training a classification network with a sufficient amount of simulated WBMs and then fine-tuning it with a few real WBMs. We further employ ensemble learning to overcome the overfitting problem in transfer learning by fine-tuning multiple sets of classification layers of the network. A series of experiments on a real dataset demonstrate that our model outperforms the meta-learning methods that are widely used in few-shot WBM classification. Additionally, we empirically verify that transfer and ensemble learning, the two most important yet simple components of our model, reduce the prediction bias and variance in few-shot scenarios without a significant increase in training time.
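The transfer-plus-ensemble recipe can be sketched in miniature. Below, a fixed random projection stands in for the CNN backbone pre-trained on simulated WBMs, synthetic clusters stand in for real wafer maps, and five independently initialized softmax heads are fine-tuned on the frozen features and averaged (all sizes, seeds, and data are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pre-trained" frozen backbone: a random projection standing in for a CNN.
d_in, d_feat, n_classes = 64, 16, 3
W_pre = rng.normal(size=(d_in, d_feat)) / np.sqrt(d_in)

def features(x):
    return np.tanh(x @ W_pre)  # frozen during fine-tuning

# Few-shot "real" data: 5 labeled examples per class.
centers = rng.normal(size=(n_classes, d_in)) * 3
x_train = np.vstack([c + 0.5 * rng.normal(size=(5, d_in)) for c in centers])
y_train = np.repeat(np.arange(n_classes), 5)

def train_head(f, y, seed, steps=300, lr=0.5):
    """Fine-tune one softmax classification head by gradient descent."""
    r = np.random.default_rng(seed)
    W = r.normal(size=(d_feat, n_classes)) * 0.01
    onehot = np.eye(n_classes)[y]
    for _ in range(steps):
        logits = f @ W
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        W -= lr * f.T @ (p - onehot) / len(y)
    return W

f_train = features(x_train)
heads = [train_head(f_train, y_train, seed=s) for s in range(5)]

# Ensemble prediction: average the heads' softmax outputs.
x_test = centers + 0.5 * rng.normal(size=(n_classes, d_in))
f_test = features(x_test)
probs = np.zeros((n_classes, n_classes))
for W in heads:
    logits = f_test @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs += p / p.sum(axis=1, keepdims=True)
probs /= len(heads)
pred = probs.argmax(axis=1)
print(pred)
```

Averaging several heads that overfit in different directions is what reduces the variance the abstract refers to; freezing the backbone keeps the number of fine-tuned parameters small enough for the few-shot regime.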
Uncertainty Estimation by Density Aware Evidential Deep Learning
Tae Sung Yoon, Heeyoung Kim
arXiv (Cornell University)
Evidential deep learning (EDL) has shown remarkable success in uncertainty estimation. However, there is still room for improvement, particularly in out-of-distribution (OOD) detection and classification tasks. The limited OOD detection performance of EDL arises from its inability to reflect the distance between the testing example and the training data when quantifying uncertainty, while its limited classification performance stems from its parameterization of the concentration parameters. To address these limitations, we propose a novel method called Density Aware Evidential Deep Learning (DAEDL). DAEDL integrates the feature-space density of the testing example with the output of EDL during the prediction stage, while using a novel parameterization that resolves the issues of the conventional parameterization. We prove that DAEDL enjoys a number of favorable theoretical properties. DAEDL demonstrates state-of-the-art performance across diverse downstream tasks related to uncertainty estimation and classification.
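DAEDL's actual parameterization is not reproduced below; the toy sketch only illustrates the general principle of letting feature-space density modulate Dirichlet evidence, with a damping rule invented purely for illustration. A kernel density estimate over training features shrinks the evidence for test points in low-density regions, so the same confident logits yield high uncertainty off-distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

def kde_log_density(x, data, bandwidth=0.5):
    """Log of a Gaussian KDE over 1-D training features."""
    diffs = (x[:, None] - data[None, :]) / bandwidth
    log_k = -0.5 * diffs**2 - np.log(bandwidth * np.sqrt(2 * np.pi))
    return np.log(np.exp(log_k).mean(axis=1))

# Training features cluster around 0; an OOD example sits far away.
train_feats = rng.normal(0.0, 1.0, 500)
x_id, x_ood = np.array([0.1]), np.array([6.0])

def uncertainty(x, logits):
    """Hypothetical density-aware EDL uncertainty (illustrative only).

    Evidence is damped toward zero when the test feature lies in a
    low-density region, inflating the Dirichlet uncertainty there.
    """
    dens = np.exp(kde_log_density(x, train_feats))
    evidence = np.exp(logits) * dens / (dens + 1e-3)  # invented damping
    alpha = evidence + 1.0
    return alpha.size / alpha.sum()

logits = np.array([3.0, 0.0, 0.0])  # same confident logits in both cases
u_id = uncertainty(x_id, logits)
u_ood = uncertainty(x_ood, logits)
print(u_id < u_ood)  # density awareness raises OOD uncertainty
```

This is exactly the failure mode the abstract names: plain EDL would report the same low uncertainty for both inputs because it never consults the distance to the training data.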