Publications | Prof. Man Sik Park's Lab | School of Mathematics, Statistics and Data Science, Sungshin Women's University

Publications

Research Output Trends

The figures shown are computed from the collected data and may differ slightly from actual values.

Featured Papers

*As of 2026, the Impact Factor is shown only for papers published within the last six years.

Article

Citations: 1

2024

Variable selection and prediction performance of penalized two-part regression with community-based crime data application

Seong‐Tae Kim, Man Sik Park

IF 0.6 (2024)

Communications for Statistical Applications and Methods

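Semi-continuous data are characterized by a mixture of a point probability mass at zero and a continuous distribution for positive values. Such data are often modeled with a two-part model, in which the first part models the probability of the dichotomous outcome (zero or positive) and the second part models the distribution of the positive values. Despite the popularity of two-part models, variable selection for them has not been fully addressed, especially for high-dimensional data. The objective of this study is to investigate the variable selection and prediction performance of penalized regression methods in two-part models. The performance of the selected techniques in two-part models is evaluated through simulation studies. The results show that LASSO and ENET tend to select more predictors into the model than SCAD and MCP. Consequently, MCP and SCAD outperform LASSO and ENET in β-specificity, whereas LASSO and ENET perform better than MCP and SCAD in terms of mean squared error. Similar results were found when the penalized regression methods were applied to predict crime counts using community-based data.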
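
As an illustration of the two-part structure described in this abstract, the following minimal sketch combines the two parts into one expected value, E[Y | x] = P(Y > 0 | x) · E[Y | Y > 0, x]. The logistic form of the first part, the log link of the second part, the function name `two_part_mean`, and the coefficient values are assumptions made for illustration; they are not taken from the paper.

```python
import math

def two_part_mean(x, gamma, beta):
    """Expected outcome under a two-part model (illustrative assumptions).

    Part 1 (assumed logistic): P(Y > 0 | x) = sigmoid(gamma[0] + gamma[1:] . x)
    Part 2 (assumed log link): E[Y | Y > 0, x] = exp(beta[0] + beta[1:] . x)
    """
    eta_zero = gamma[0] + sum(g * xi for g, xi in zip(gamma[1:], x))
    eta_pos = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    p_positive = 1.0 / (1.0 + math.exp(-eta_zero))
    mean_positive = math.exp(eta_pos)
    return p_positive * mean_positive

# Hypothetical coefficients: a zero slope mimics a predictor that a
# penalized fit has removed from that part of the model.
print(two_part_mean([1.5], gamma=[0.0, 0.0], beta=[0.0, 0.0]))  # 0.5
```

Because each part has its own coefficient vector, a penalized fit can keep a predictor in one part and drop it from the other, which is why variable selection is assessed per part.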

https://doi.org/10.29220/csam.2024.31.4.441

Lasso (statistics)

Feature selection

SCAD

Statistics

Mathematics

Regression analysis

Elastic net regularization

Regression

Model selection

Logistic regression

Article

Citations: 0

2024

Spatial Neighborhood Order Determination for Gaussian Markov Random Fields

The Korean Data Analysis Society, Jennifer A. Hoeting, Man Sik Park

The Korean Data Analysis Society

https://doi.org/10.37727/jkdas.2024.26.6.1671

Random field

Statistical physics

Markov chain

Gaussian

Gaussian random field

Order (exchange)

Mathematics

Computer science

Gaussian process

Statistics

Article

Citations: 0

2021

Clustering County-wise COVID-19 Dynamics in North Carolina, USA

Seong‐Tae Kim, Man Sik Park

The Korean Data Analysis Society

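The COVID-19 pandemic has caused an enormous number of confirmed cases and deaths in the United States, with unprecedented impact. The objective of this study is to determine whether hidden clusters exist among the counties of North Carolina using COVID-19 data. Because individual states implement their own policies to cope with the pandemic, the study is restricted to a single state, North Carolina. We integrated two clustering techniques, dynamic time warping and a deep learning autoencoder. The study used county-level COVID-19 data for North Carolina from the GitHub COVID-19 Data Set, the data repository for the COVID-19 Visual Dashboard of the Johns Hopkins University Center for Systems Science and Engineering. From this repository, daily counts of confirmed cases and deaths from March 3, 2020 to September 19, 2020 were selected. For both the infection and fatality data, these clustering techniques identified similar hierarchical clusters that separate three metropolitan areas from the remaining regions, and the clusters are significantly correlated with demographic variables such as population size and the proportion of elderly residents. The results suggest the importance of population density and connectivity in the COVID-19 epidemic.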
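
Dynamic time warping, one of the two techniques named in this abstract, can be sketched with the textbook dynamic-programming recursion below. This is a generic version with an absolute-difference local cost and no window constraint; the function name `dtw_distance` and the toy series are illustrative, and the paper's actual settings are not stated in the abstract.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences.

    Uses |a_i - b_j| as the local cost and allows unconstrained warping.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] is matched again with a later b
                cost[i][j - 1],      # b[j-1] is matched again with a later a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]

# Toy daily-count series: same shape, one plateau day longer.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

A matrix of pairwise distances between county series computed this way could then be passed to a hierarchical clustering routine.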

https://doi.org/10.37727/jkdas.2021.23.6.2535

Pandemic

Geography

Coronavirus disease 2019 (COVID-19)

Cluster analysis

Population

Demography

Cluster (spacecraft)

Cartography

Computer science

Medicine

Article

Citations: 6

2020

Analysis of the Railway Accident-Related Damages in South Korea

Man Sik Park, Jin Ki Eom, Jungsoon Choi, Tae‐Young Heo

IF 2.679 (2020)

Applied Sciences

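Railway accidents are a serious problem, characterized by a large number of injuries and fatalities per accident because of the scale of mass transit systems. This study proposes a new approach for assessing the damages caused by railway accidents using two-part models (TPMs): the zero-inflated Poisson regression model (ZIP model) and the zero-inflated negative binomial regression model (ZINB model) are applied to non-negative integer count measures, and the zero-inflated gamma regression model (ZIG model) and the zero-inflated log-normal regression model (ZILN model) are applied to semi-continuous measures. The models were used to evaluate railway accidents that occurred on Korean railways from 2008 to 2016, considering accident damages such as train delay time, the number of delayed trains, and the cost accounting for the accident-count response. The results identify human factors, the high-speed railway system or Korea Train Express (KTX), and the number of casualties as the main cost-increasing factors. The number of delayed trains and the length of the delay tend to increase both the probability of incurring a cost and the size of that cost. For better assessment, railway accident data should contain fewer repeated zeros and more accurate information.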
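
For reference, the zero-inflated Poisson distribution behind the ZIP model named in this abstract mixes a point mass at zero (probability `pi`) with a Poisson distribution (rate `lam`). The sketch below evaluates its probability mass function; the function name and parameter values are illustrative, and in a regression setting both `pi` and `lam` would depend on covariates, which is omitted here.

```python
import math

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson probability of observing count k.

    P(Y = 0) = pi + (1 - pi) * exp(-lam)
    P(Y = k) = (1 - pi) * exp(-lam) * lam**k / k!   for k >= 1
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

# Hypothetical parameters: half of the records are structural zeros.
probs = [zip_pmf(k, lam=1.0, pi=0.5) for k in range(5)]
print(probs)
```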

https://doi.org/10.3390/app10248769

Poisson regression

Negative binomial distribution

Damages

Train

Regression analysis

Accident (philosophy)

Transport engineering

Statistics

Poisson distribution

Traffic accident

Article

Citations: 0

2020

Cluster Analysis of Time Series Data Using the Correlation of Periodograms (주기도의 상관성을 이용한 시계열자료의 군집분석)

Suhyun Kwon, Man Sik Park

http://dspace.kci.go.kr/handle/kci/1444228?show=full

Computer science

All Papers

Article

Citations: 1

2024

Variable selection and prediction performance of penalized two-part regression with community-based crime data application

Seong‐Tae Kim, Man Sik Park

IF 0.6 (2024)

Communications for Statistical Applications and Methods

https://doi.org/10.29220/csam.2024.31.4.441

Lasso (statistics)

Feature selection

SCAD

Statistics

Mathematics

Regression analysis

Elastic net regularization

Regression

Model selection

Logistic regression

Article

Citations: 0

2024

Spatial Neighborhood Order Determination for Gaussian Markov Random Fields

The Korean Data Analysis Society, Jennifer A. Hoeting, Man Sik Park

The Korean Data Analysis Society

https://doi.org/10.37727/jkdas.2024.26.6.1671

Random field

Statistical physics

Markov chain

Gaussian

Gaussian random field

Order (exchange)

Mathematics

Computer science

Gaussian process

Statistics

Article

Citations: 0

2021

Clustering County-wise COVID-19 Dynamics in North Carolina, USA

Seong‐Tae Kim, Man Sik Park

The Korean Data Analysis Society

https://doi.org/10.37727/jkdas.2021.23.6.2535

Pandemic

Geography

Coronavirus disease 2019 (COVID-19)

Cluster analysis

Population

Demography

Cluster (spacecraft)

Cartography

Computer science

Medicine

Article

Citations: 6

2020

Analysis of the Railway Accident-Related Damages in South Korea

Man Sik Park, Jin Ki Eom, Jungsoon Choi, Tae‐Young Heo

IF 2.679 (2020)

Applied Sciences

https://doi.org/10.3390/app10248769

Poisson regression

Negative binomial distribution

Damages

Train

Regression analysis

Accident (philosophy)

Transport engineering

Statistics

Poisson distribution

Traffic accident

Article

Citations: 0

2020

Cluster Analysis of Time Series Data Using the Correlation of Periodograms (주기도의 상관성을 이용한 시계열자료의 군집분석)

Suhyun Kwon, Man Sik Park

http://dspace.kci.go.kr/handle/kci/1444228?show=full

Computer science

Article

Citations: 0

2021

Measurement of the Amount of Credit Contagion Risk between Industries of Korea using EDF

Seong Hyuk Hong, Man Sik Park, Jae Bum Cho

The Korean Data Analysis Society

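Interest in how to measure the amount of contagion risk transmitted between entities subject to credit risk measurement has recently grown considerably, yet measurement methods have rarely been introduced. This study first uses the spillover index, which is widely used to measure contagion effects, to analyze credit risk spillover effects among representative domestic-demand industries, export industries, and their related industries in Korea, using expected default frequency (EDF) data. The analysis shows that industries with a larger inducement effect on domestic production have a larger credit risk spillover effect on other industries. However, because the spillover index measures the spillover effect as a share rather than the amount of credit contagion risk itself, most previous studies focused on measuring the share of the spillover effect. In contrast, to propose a new way of measuring the amount of credit risk transmitted between credit entities, this study uses the difference between the PD forecast from a cross-industry VAR model and the PD forecast from an industry-specific ARMA model to measure and decompose the amount of credit contagion risk that a given industry receives from other industries. In addition, whereas most previous studies analyzed spillover effects from a macro perspective, such as between countries, this study focuses on how to measure the amount of credit contagion risk between credit entities from a micro perspective. The approach is expected to help individual financial institutions manage appropriate economic capital for their credit assets in a way that reflects credit risk spillover effects among the credit entities of interest.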

https://doi.org/10.37727/jkdas.2021.23.5.2105

Spillover effect

Index (typography)

Credit risk

Econometrics

Business

Economics

Actuarial science

Computer science

Macroeconomics

Article

Citations: 0

2020

Clustering County-Wise COVID-19 Dynamics in North Carolina

Man Sik Park, Seong‐Tae Kim

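The COVID-19 pandemic has caused an enormous number of confirmed cases and deaths in the United States, with unprecedented impact. This study aims to identify hidden clusters among the counties of North Carolina using COVID-19 data. Because each state implements its own policies in response to the pandemic, the study is limited to a single state, North Carolina. We integrated two clustering techniques, dynamic time warping and a deep learning autoencoder. These techniques identified top-level hierarchical clusters in the county-level COVID-19 data that separate three metropolitan areas from the other regions, while the lower-level clusters showed slightly different patterns. The results deepen the understanding of county-level COVID-19 dynamics and provide insight into their implications.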

https://doi.org/10.1109/csci51800.2020.00157

Cluster analysis

Coronavirus disease 2019 (COVID-19)

Pandemic

Hierarchical clustering

Geography

Computer science

Metropolitan area

Dynamic time warping

Cartography

Data mining

Article

Citations: 41

2020

Comparison between Statistical Models and Machine Learning Methods on Classification for Highly Imbalanced Multiclass Kidney Data

Bomi Jeong, Hyunjeong Cho, Jieun Kim, Soon Kil Kwon, Seungwoo Hong, ChangSik Lee, TaeYeon Kim, Man Sik Park, Seoksu Hong, Tae‐Young Heo

IF 3.706 (2020)

Diagnostics

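This study aims to compare the classification performance of statistical models on highly imbalanced kidney data. Models are built with various machine learning methods using the health examination cohort database provided by the National Health Insurance Service of Korea. The glomerular filtration rate (GFR), which is used to diagnose chronic kidney disease (CKD), is calculated with the Modification of Diet in Renal Disease method and classified into five stages (1, 2, 3A and 3B, 4, and 5). The different CKD stages based on the estimated GFR define a response variable with six categories. For classification, the study uses two representative generalized linear models, multinomial logistic regression (multinomial LR) and ordinal logistic regression (ordinal LR), and two machine learning models, random forest (RF) and autoencoder (AE). The classification performance of the four models is compared in terms of accuracy, sensitivity, specificity, precision, and F1-measure. To find the best model for correctly classifying CKD stages, the data are split into 10-fold datasets with the same proportion of each CKD stage. The results indicate that RF and AE outperformed the multinomial and ordinal LR models in accuracy when classifying the response variable. However, when a highly imbalanced dataset is modeled, accuracy can distort the actual performance, because accuracy can be high even when minority classes are classified as the majority class. To address this problem when interpreting performance, we consider not only the accuracy from the confusion matrix but also the per-class sensitivity, specificity, precision, and F1-measure. To present classification performance as a single value for each model, the macro-average and micro-weighted values are calculated for each model. In conclusion, AE is the best model for correctly classifying CKD stages on all performance measures.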
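
The point this abstract makes about accuracy on imbalanced data can be seen in a small worked example. The sketch below derives per-class sensitivity, specificity, precision, and F1 from a confusion matrix and then takes the unweighted (macro) average. The function names and the two-class toy matrix are illustrative assumptions, ratios with a zero denominator are set to 0.0 by convention here, and the paper's weighted variant is not reproduced.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of cases with true class i predicted as class j."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    rows = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        sens = tp / (tp + fn) if tp + fn else 0.0
        spec = tn / (tn + fp) if tn + fp else 0.0
        prec = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * prec * sens / (prec + sens) if prec + sens else 0.0
        rows.append({"sensitivity": sens, "specificity": spec,
                     "precision": prec, "f1": f1})
    return rows

def accuracy(cm):
    """Share of all cases that lie on the diagonal."""
    return sum(cm[c][c] for c in range(len(cm))) / sum(sum(row) for row in cm)

def macro_average(rows, key):
    """Unweighted mean of one metric across classes."""
    return sum(r[key] for r in rows) / len(rows)

# Toy case: every minority case (second row) is predicted as the majority class.
cm = [[90, 0], [10, 0]]
rows = per_class_metrics(cm)
print(accuracy(cm), macro_average(rows, "sensitivity"))  # 0.9 0.5
```

Accuracy stays at 0.9 even though the minority class is never detected, while the macro-averaged sensitivity drops to 0.5.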

https://doi.org/10.3390/diagnostics10060415

Multinomial logistic regression

Random forest

Computer science

Artificial intelligence

Logistic regression

Confusion matrix

Machine learning

Autoencoder

Statistics

Data mining

Article

Citations: 107

2020

Prediction of Chlorophyll-a Concentrations in the Nakdong River Using Machine Learning Methods

Yuna Shin, Taekgeun Kim, Seoksu Hong, Seulbi Lee, Eunji Lee, Seungwoo Hong, ChangSik Lee, TaeYeon Kim, Man Sik Park, Jungsu Park, Tae‐Young Heo

IF 3.103 (2020)

Water

Many studies have predicted chlorophyll-a concentrations using multiple regression models and validated them with the hold-out technique. In this study, new prediction models for chlorophyll-a concentrations in the Nakdong River, Korea, were built using widely used machine learning models such as support vector regression, bagging, random forest, extreme gradient boosting (XGBoost), recurrent neural network (RNN), and long short-term memory (LSTM). One-step-ahead recursive prediction was used to reflect the characteristics of time series data. To improve prediction accuracy, model building was based on forward variable selection. The fitted models were validated with cumulative learning and rolling window learning instead of the hold-out technique. The best results were obtained when the RNN model was combined with rolling window learning to predict chlorophyll-a concentrations. These results suggest that the selection of explanatory variables and one-step-ahead recursive prediction are important steps for improving prediction performance in machine learning models.
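
The two validation schemes contrasted with hold-out in this abstract differ only in which past observations are used to refit the model before each one-step-ahead prediction. The sketch below generates the index splits under a common reading of the terms: cumulative learning keeps all earlier observations, rolling window learning keeps only the most recent `window` observations. The function names and sizes are hypothetical, and the paper's exact setup may differ.

```python
def cumulative_splits(n_obs, min_train):
    """Yield (train_indices, test_index) with an expanding training set."""
    for t in range(min_train, n_obs):
        yield list(range(0, t)), t

def rolling_splits(n_obs, window):
    """Yield (train_indices, test_index) with a fixed-length training window."""
    for t in range(window, n_obs):
        yield list(range(t - window, t)), t

# Five observations, three used for the first fit.
print(list(cumulative_splits(5, 3)))  # [([0, 1, 2], 3), ([0, 1, 2, 3], 4)]
print(list(rolling_splits(5, 3)))     # [([0, 1, 2], 3), ([1, 2, 3], 4)]
```

In both schemes the test index always follows the training indices, so no future observation leaks into the fit.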

https://doi.org/10.3390/w12061822

Computer science

Random forest

Machine learning

Recurrent neural network

Artificial intelligence

Artificial neural network

Support vector machine

Regression

Gradient boosting

Regression analysis

Article

Citations: 39

2019

Exploring the catchment area of an urban railway station by using transit card data: Case study in Seoul

Jin Ki Eom, Jungsoon Choi, Man Sik Park, Tae‐Young Heo

IF 4.802 (2019)

Cities

https://doi.org/10.1016/j.cities.2019.05.033

Catchment area

Smart card

Transport engineering

Transfer station

Transfer (computing)

TRIPS architecture

Public transport

Transit (satellite)

Urban area

Land use
