Prof. Jae-Ho Nah's Lab | School of SW Convergence, Sangmyung University

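Prof. Jae-Ho Nah, School of SW Convergence, Sangmyung University

The lab focuses on computer graphics. It studies real-time ray tracing hardware for mobile and embedded environments, texture compression and CPU-GPU hybrid optimization, and techniques for improving rendering performance on VR and mobile GPUs. It develops algorithms and system architectures that deliver high-quality graphics effects efficiently within limited power, memory, and compute budgets.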

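Representative Research Areas

Hardware architecture for real-time ray tracing on mobile devices

Texture compression and CPU-GPU hybrid optimization

Real-time rendering optimization for mobile and VR environments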

Key Publications

Article | bronze | Citations: 8 | 2020

QuickETC2

Jae‐Ho Nah

ACM Transactions on Graphics (IF 9.5)

Compressed textures are indispensable in most 3D graphics applications to reduce memory traffic and increase performance. For higher-quality graphics, the number and size of textures in an application have continuously increased. Additionally, the ETC2 texture format, which is mandatory in OpenGL ES 3.0, OpenGL 4.3, and Android 4.3 (and later versions), requires more complex texture compression than the traditional ETC1 format. As a result, texture compression becomes more and more time-consuming. To accelerate ETC2 compression, we introduce two new compression techniques, named QuickETC2. The first technique is an early compression-mode decision scheme. Instead of testing all ETC1/2 modes to compress a texel block, we select proper modes for each block by exploiting the luma difference of the block to reduce unnecessary compression overhead. The second technique is a fast luma-based T- and H-mode compression method. When clustering each texel into two groups, we replace the 3D RGB space with the 1D luma space and quickly find the two groups that have the minimum luma differences. We also selectively perform the T- or H-mode and reduce its distance candidates, according to the luma differences of each group. We have implemented both techniques with AVX2 intrinsics to exploit SIMD parallelism. According to our experiments, QuickETC2 can compress more than 2000 1K×1K-sized images per second on an octa-core CPU.

https://doi.org/10.1145/3414685.3417787

Keywords: Computer science, Intrinsics, Texel, Texture compression, Data compression, SIMD, Computer graphics (images), Block (permutation group theory), OpenGL, Artificial intelligence
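The luma-based grouping described in the QuickETC2 abstract replaces a search in 3D RGB space with one along the 1D luma axis. The following is a minimal sketch of that idea only, not the paper's implementation: the integer luma weights, the function names, and the "smallest combined luma range" criterion are assumptions chosen for illustration, and the real encoder is written with AVX2 intrinsics.

```python
def luma(rgb):
    """Integer luma approximation (assumed weights that sum to 256)."""
    r, g, b = rgb
    return (77 * r + 150 * g + 29 * b) >> 8


def block_luma_difference(block):
    """Difference between the brightest and darkest texel of a block."""
    lumas = [luma(texel) for texel in block]
    return max(lumas) - min(lumas)


def split_by_luma(block):
    """Split the texels of a block into two groups along the luma axis.

    Texel indices are sorted by luma and every cut point is tried; the cut
    whose two groups have the smallest combined luma range is kept.
    """
    order = sorted(range(len(block)), key=lambda i: luma(block[i]))
    lumas = [luma(block[i]) for i in order]
    best_cut, best_cost = 1, None
    for cut in range(1, len(order)):
        # Luma range of the dark group plus luma range of the bright group.
        cost = (lumas[cut - 1] - lumas[0]) + (lumas[-1] - lumas[cut])
        if best_cost is None or cost < best_cost:
            best_cut, best_cost = cut, cost
    return order[:best_cut], order[best_cut:]


# A 4x4 block with alternating dark and bright texels.
block = [(10, 10, 10), (200, 200, 200)] * 8
dark, bright = split_by_luma(block)
```

Because the texels are sorted once and only 15 cut points exist for a 4×4 block, the grouping cost stays small and fixed per block, which is the kind of saving a 1D search offers over clustering in RGB space.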

Article | Citations: 6 | 2017

Z² traversal order: An interleaving approach for VR stereo rendering on tile-based GPUs

Jae‐Ho Nah, Yeongkyu Lim, Sunho Ki, Chulho Shin

Computational Visual Media (IF 18.3)

With increasing demands of virtual reality (VR) applications, efficient VR rendering techniques are becoming essential. Because VR stereo rendering has increased computational costs to separately render views for the left and right eyes, to reduce the rendering cost in VR applications, we present a novel traversal order for tile-based mobile GPU architectures: Z² traversal order. In tile-based mobile GPU architectures, a tile traversal order that maximizes spatial locality can increase GPU cache efficiency. For VR applications, our approach improves upon the traditional Z order curve. We render corresponding screen tiles in left and right views in turn, or simultaneously, and as a result, we can exploit spatial adjacency of the two tiles. To evaluate our approach, we conducted a trace-driven hardware simulation using Mesa and a hardware simulator. Our experimental results show that Z² traversal order can reduce external memory bandwidth requirements and increase rendering performance.

https://doi.org/10.1007/s41095-017-0093-5

Keywords: Computer graphics (images), Computer science, Tile, Rendering (computer graphics), Tree traversal, Interleaving, Parallel rendering, Software rendering, Real-time rendering, Computer graphics
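The interleaving idea in the abstract above can be illustrated by walking the tiles of one eye in Z (Morton) order and emitting the corresponding tile of the other eye immediately afterwards. This is a sketch under assumptions: the function names and the "L"/"R" labels are hypothetical, only the "in turn" variant is shown, and the paper's hardware-level scheduling is not modelled.

```python
def morton_decode(index):
    """Deinterleave a Morton index: even bits form x, odd bits form y."""
    x = y = 0
    bit = 0
    while index:
        x |= (index & 1) << bit
        index >>= 1
        y |= (index & 1) << bit
        index >>= 1
        bit += 1
    return x, y


def z2_order(tiles_x, tiles_y):
    """List (eye, x, y) tiles: Z order per eye, alternating left and right."""
    # Round the grid up to a power-of-two square so the Z curve covers it.
    side = 1
    while side < max(tiles_x, tiles_y):
        side *= 2
    order = []
    for index in range(side * side):
        x, y = morton_decode(index)
        if x < tiles_x and y < tiles_y:
            order.append(("L", x, y))  # tile in the left-eye view
            order.append(("R", x, y))  # matching tile in the right-eye view
    return order


tiles = z2_order(2, 2)
```

For a 2×2 tile grid per eye, the sketch visits (0,0), (1,0), (0,1), (1,1), each first for the left eye and then for the right, so the two tiles that are likely to touch similar geometry and textures are processed back to back.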

Article | Citations: 23 | 2014

HART: A Hybrid Architecture for Ray Tracing Animated Scenes

Jae‐Ho Nah, Jinwoo Kim, Junho Park, Won‐Jong Lee, Jeong‐Soo Park, Seokyoon Jung, Woo-Chan Park, Dinesh Manocha, Tack‐Don Han

IEEE Transactions on Visualization and Computer Graphics (IF 6.5)

We present a hybrid architecture, inspired by asynchronous BVH construction [1], for ray tracing animated scenes. Our hybrid architecture utilizes heterogeneous hardware resources: dedicated ray-tracing hardware for BVH updates and ray traversal and a CPU for BVH reconstruction. We also present a traversal scheme using a primitive's axis-aligned bounding box (PrimAABB). This scheme reduces ray-primitive intersection tests by reusing existing BVH traversal units and the primAABB data for tree updates; it enables the use of shallow trees to reduce tree build times, tree sizes, and bus bandwidth requirements. Furthermore, we present a cache scheme that exploits consecutive memory access by reusing data in an L1 cache block. We perform cycle-accurate simulations to verify our architecture, and the simulation results indicate that the proposed architecture can achieve real-time Whitted ray tracing animated scenes at 1,920 × 1,200 resolution. This result comes from our high-performance hardware architecture and minimized resource requirements for tree updates.

https://doi.org/10.1109/tvcg.2014.2371855

Keywords: Computer science, Tree traversal, Ray tracing (physics), Cache, Parallel computing, Data structure, Tree (set theory), Computer architecture, Computer hardware, Computer graphics (images)
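The PrimAABB scheme in the HART abstract tests a ray against each primitive's axis-aligned bounding box before any ray-primitive intersection test. The software sketch below shows only that culling step with a standard slab test. The function names are hypothetical, and the paper performs this step in hardware by reusing BVH traversal units, which is not modelled here.

```python
def ray_hits_aabb(origin, direction, box_min, box_max, t_max=float("inf")):
    """Slab test: does the ray hit the box within the range [0, t_max]?"""
    t_near, t_far = 0.0, t_max
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must start inside it.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def prim_aabb(triangle):
    """Axis-aligned bounding box of a triangle given as three (x, y, z) points."""
    xs, ys, zs = zip(*triangle)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def candidates_after_prim_aabb(origin, direction, triangles):
    """Indices of triangles whose PrimAABB the ray hits."""
    return [i for i, tri in enumerate(triangles)
            if ray_hits_aabb(origin, direction, *prim_aabb(tri))]


triangles = [
    ((0.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 1.0, 5.0)),
    ((10.0, 10.0, 5.0), (11.0, 10.0, 5.0), (10.0, 11.0, 5.0)),
]
hits = candidates_after_prim_aabb((0.25, 0.25, 0.0), (0.0, 0.0, 1.0), triangles)
```

Only the triangles returned by `candidates_after_prim_aabb` would go on to the more expensive ray-triangle test, which is why a leaf holding many primitives, as in a shallow tree, can remain affordable.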

Government-Funded Projects

Lead | August 2021 to February 2024 | KRW 31,380,000

Research on CPU-GPU Hybrid Texture Compression Techniques

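* The idea behind this project comes from the different characteristics of CPU programming and GPU programming.
  - A CPU does not suffer the branch divergence penalty of SIMT execution and can map operations directly onto its SIMD units through instruction sets such as SSE and AVX2, so it allows a high degree of freedom in algorithm implementation. However, its absolute compute performance in a typical desktop environment (8 cores or fewer) is lower than that of a recent GPU.
  - A GPU runs in SIMT (single instruction, multiple threads) fashion, with the same instruction executing across many threads at once, which places many constraints on efficient algorithm implementation. For high-quality texture encoding, which demands a large amount of computation, the GPU's high compute performance can nevertheless be put to full use.
* Building on this, the texture encoding work is split to suit the architectures of the CPU and the GPU.
  - Stage 1 (CPU): blocks are encoded with a fast compression algorithm.
  - Stage 2 (GPU): only the problematic blocks that show noticeable artifacts after stage 1 are sent to the GPU, where the parallel shader cores recompress them at high quality.
  - The two stages are designed to be pipelined so that the CPU and the GPU run concurrently as much as possible.
  - The memory that holds the blocks is used in a double-buffered manner. While n-1 threads perform fast compression in parallel, one thread handles communication with the GPU and assembles the final image.
* Implementing the hybrid CPU/GPU encoding algorithm in detail requires research and development on the following items, to be carried out over multiple years.
  - Analysis of state-of-the-art CPU/GPU encoding algorithms and setup of the experimental environment and framework (year 1)
  - Development of a fast CPU-based encoder that can handle problematic blocks separately (year 2)
  - Development of a fast GPU-based encoder that makes effective use of the GPU's compute capability (year 2)
  - Development of a pipelined fast CPU/GPU hybrid encoder (year 3)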

Keywords: Texture compression, Texture mapping, CPU-GPU hybrid computing, ETC2
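The two-stage split described for this project (fast CPU encoding first, then high-quality re-encoding of problematic blocks only) can be sketched as follows. Everything here is an illustrative assumption: the two encoders are trivial stand-ins operating on grayscale blocks, the squared-error metric and `error_threshold` are hypothetical, and the second stage runs on the CPU rather than on GPU shader cores.

```python
from concurrent.futures import ThreadPoolExecutor


def fast_encode(block):
    """Stand-in for a fast CPU encoder: represent the block by its mean value."""
    mean = sum(block) // len(block)
    return [mean] * len(block)


def high_quality_encode(block):
    """Stand-in for the high-quality re-encoder: quantize texels to steps of 8."""
    return [(value // 8) * 8 for value in block]


def block_error(original, decoded):
    """Sum of squared differences between original and decoded texels."""
    return sum((a - b) ** 2 for a, b in zip(original, decoded))


def hybrid_encode(blocks, error_threshold, workers=3):
    """Fast-encode all blocks, then re-encode only the problematic ones."""
    # Stage 1: worker threads run the fast encoder over every block.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        encoded = list(pool.map(fast_encode, blocks))
    # Flag blocks whose stage-1 error exceeds the (hypothetical) threshold.
    problematic = [i for i, (src, enc) in enumerate(zip(blocks, encoded))
                   if block_error(src, enc) > error_threshold]
    # Stage 2: only flagged blocks are re-encoded (on the GPU in the project).
    for i in problematic:
        encoded[i] = high_quality_encode(blocks[i])
    return encoded, problematic


flat_block = [100] * 16              # uniform block, trivially easy to encode
edge_block = [0] * 8 + [255] * 8     # hard edge, poorly served by a mean value
encoded, problematic = hybrid_encode([flat_block, edge_block], error_threshold=1000)
```

This sketch runs the two stages back to back. It does not model the pipelining or the double buffering described in the project summary, where one thread would exchange blocks with the GPU while the remaining threads keep compressing.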

Lead | August 2012 to August 2013 | KRW 33,000,000

Fast Ray Tracing Algorithms and Hardware Architecture for Graphics and Sound Rendering

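This project studies how to implement ray tracing, a core technology for making graphics look and sound seem realistic, faster and more efficiently. It is a training-based research project that aims to extend both new rendering technology and international research capability within a world-class research environment. The research goal is to advance the real-time ray tracing techniques used in graphics and sound rendering and to gain international research experience. The core research topics are improving tree traversal acceleration algorithms, developing LOD-based ray tracing techniques, and designing a hardware architecture that accelerates graphics and sound together. The expected benefits include greater realism, higher production efficiency, and better real-time simulation performance in games, film and video production, VR/AR, and high-performance computing.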

Keywords: Ray tracing, Sound rendering, 3D graphics

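Recent Patents

Status	Filing year	Title	Application number
Registered	2018	Mobile device and control method thereof	1020180094598
Withdrawn	2016	Image processing apparatus and image display apparatus including the same	1020160020830
Registered	2015	Digital device and method of processing data in the digital device	1020150171515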
