Key Publications (2)
*As of 2026, Impact Factor is shown only for papers published within the last 6 years.
1
article | green (open access) · Citations: 0 · 2026
Enhancing Hands in 3D Whole-Body Pose Estimation with Conditional Hands Modulator
Gyeongsik Moon
ArXiv.org
Accurately recovering hand poses within the body context remains a major challenge in 3D whole-body pose estimation. This difficulty arises from a fundamental supervision gap: whole-body pose estimators are trained on full-body datasets with limited hand diversity, while hand-only estimators, trained on hand-centric datasets, excel at detailed finger articulation but lack global body awareness. To address this, we propose Hand4Whole++, a modular framework that leverages the strengths of both pre-trained whole-body and hand pose estimators. We introduce CHAM (Conditional Hands Modulator), a lightweight module that modulates the whole-body feature stream using hand-specific features extracted from a pre-trained hand pose estimator. This modulation enables the whole-body model to predict wrist orientations that are both accurate and coherent with the upper-body kinematic structure, without retraining the full-body model. In parallel, we directly incorporate finger articulations and hand shapes predicted by the hand pose estimator, aligning them to the full-body mesh via differentiable rigid alignment. This design allows Hand4Whole++ to combine globally consistent body reasoning with fine-grained hand detail. Extensive experiments demonstrate that Hand4Whole++ substantially improves hand accuracy and enhances overall full-body pose quality.
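The abstract describes CHAM only at a high level, as a lightweight module that modulates whole-body features using hand-specific features. Below is a minimal sketch of the FiLM-style conditioning pattern such a module typically follows, assuming PyTorch; the class name CHAMBlock, the tensor shapes, and the use of a pooled hand feature are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class CHAMBlock(nn.Module):
    """FiLM-style modulation of whole-body features by hand features (sketch)."""
    def __init__(self, body_dim: int, hand_dim: int):
        super().__init__()
        # Predict a per-channel scale (gamma) and shift (beta) from hand features.
        self.to_gamma = nn.Linear(hand_dim, body_dim)
        self.to_beta = nn.Linear(hand_dim, body_dim)

    def forward(self, body_feat: torch.Tensor, hand_feat: torch.Tensor) -> torch.Tensor:
        # body_feat: (B, N, body_dim) token features from the whole-body backbone
        # hand_feat: (B, hand_dim) pooled features from a frozen hand estimator
        gamma = self.to_gamma(hand_feat).unsqueeze(1)  # (B, 1, body_dim)
        beta = self.to_beta(hand_feat).unsqueeze(1)    # (B, 1, body_dim)
        return body_feat * (1.0 + gamma) + beta
```

Because only the modulation parameters are trained, both pre-trained estimators can stay frozen, consistent with the abstract's "without retraining the full-body model".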
http://arxiv.org/abs/2603.14726
Tags: Pose · Articulated body pose estimation · Modular design · Context · Kinematics · 3D pose estimation · Feature · Estimator
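The "differentiable rigid alignment" that attaches the predicted hands to the full-body mesh is most commonly implemented as an orthogonal Procrustes (Kabsch) fit, which is differentiable through the SVD. A generic PyTorch sketch follows; the correspondence choice (e.g., wrist-region vertices) and the function name are assumptions, not the paper's code.

```python
import torch

def rigid_align(src: torch.Tensor, dst: torch.Tensor) -> torch.Tensor:
    """Differentiable rigid (rotation + translation) alignment of src onto dst.

    src, dst: (B, N, 3) corresponding 3D points, e.g. predicted hand vertices
    and the matching wrist-region vertices on the full-body mesh.
    """
    src_c = src - src.mean(dim=1, keepdim=True)
    dst_c = dst - dst.mean(dim=1, keepdim=True)
    # Cross-covariance and SVD; torch.linalg.svd is differentiable.
    H = src_c.transpose(1, 2) @ dst_c                      # (B, 3, 3)
    U, _, Vh = torch.linalg.svd(H)
    # Reflection correction keeps R a proper rotation (det = +1).
    det = torch.det(Vh.transpose(1, 2) @ U.transpose(1, 2))
    D = torch.diag_embed(torch.stack(
        [torch.ones_like(det), torch.ones_like(det), det], dim=-1))
    R = Vh.transpose(1, 2) @ D @ U.transpose(1, 2)         # (B, 3, 3)
    t = dst.mean(dim=1) - (R @ src.mean(dim=1).unsqueeze(-1)).squeeze(-1)
    return src @ R.transpose(1, 2) + t.unsqueeze(1)
```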
2
preprint | green (open access) · Citations: 0 · 2026
Zero-Shot Reconstruction of Animatable 3D Avatars with Cloth Dynamics from a Single Image
Joohyun Kwon, Geonhee Sim, Gyeongsik Moon
arXiv (Cornell University)
Existing single-image 3D human avatar methods primarily rely on rigid joint transformations, limiting their ability to model realistic cloth dynamics. We present DynaAvatar, a zero-shot framework that reconstructs animatable 3D human avatars with motion-dependent cloth dynamics from a single image. Trained on large-scale multi-person motion datasets, DynaAvatar employs a Transformer-based feed-forward architecture that directly predicts dynamic 3D Gaussian deformations without subject-specific optimization. To overcome the scarcity of dynamic captures, we introduce a static-to-dynamic knowledge transfer strategy: a Transformer pretrained on large-scale static captures provides strong geometric and appearance priors, which are efficiently adapted to motion-dependent deformations through lightweight LoRA fine-tuning on dynamic captures. We further propose the DynaFlow loss, an optical flow-guided objective that provides reliable motion-direction geometric cues for cloth dynamics in rendered space. Finally, we reannotate the missing or noisy SMPL-X fittings in existing dynamic capture datasets, as most public dynamic capture datasets contain incomplete or unreliable fittings that are unsuitable for training high-quality 3D avatar reconstruction models. Experiments demonstrate that DynaAvatar produces visually rich and generalizable animations, outperforming prior methods.
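The static-to-dynamic transfer is described only as "lightweight LoRA fine-tuning", without implementation detail. Below is a standard LoRA wrapper of the kind usually meant, in PyTorch; the rank and scaling values are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen pretrained linear layer with a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)       # the static-capture prior stays frozen
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)    # the update starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * self.up(self.down(x))
```

Only the `down`/`up` factors are trained on the (scarce) dynamic captures, which is what makes the adaptation lightweight relative to full fine-tuning.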
https://doi.org/10.48550/arxiv.2603.14772
Tags: Avatar · Dynamics · Motion capture · Human motion · Gaussian · Joint
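The DynaFlow loss is specified only as an optical flow-guided objective providing "motion-direction geometric cues ... in rendered space". One plausible reading, sketched below in PyTorch, compares per-pixel flow directions between rendered and reference frames and down-weights near-static pixels; this is a guess at the form, not the paper's definition.

```python
import torch

def flow_direction_loss(rendered_flow: torch.Tensor,
                        reference_flow: torch.Tensor,
                        eps: float = 1e-6) -> torch.Tensor:
    """Penalize disagreement in motion *direction* between rendered and
    reference 2D optical flow fields, each of shape (B, 2, H, W).
    """
    # Normalize per-pixel flow vectors to unit length -> pure direction.
    r_dir = rendered_flow / (rendered_flow.norm(dim=1, keepdim=True) + eps)
    g_dir = reference_flow / (reference_flow.norm(dim=1, keepdim=True) + eps)
    # Down-weight nearly static pixels, where direction is unreliable.
    weight = (reference_flow.norm(dim=1, keepdim=True) > 0.5).float()
    cos = (r_dir * g_dir).sum(dim=1, keepdim=True)   # per-pixel cosine similarity
    return ((1.0 - cos) * weight).sum() / (weight.sum() + eps)
```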