Publications | Prof. Seon Wook Kim's Lab | School of Electrical Engineering, Korea University

Publications

Research Output Trends

The figures shown are calculated from the collected data and may differ somewhat from other sources.

Selected Publications

*Impact Factors are shown only for papers published within the last six years, as of 2026.

Article

Citations: 7

2023

PISA-DMA: Processing-in-Memory Instruction Set Architecture Using DMA

Won Jun Lee, Chang Hyun Kim, Yoonah Paik, Seon Wook Kim

IF 3.4 (2023)

IEEE Access

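Processing-in-memory (PIM) has attracted attention as a way to overcome the memory-bandwidth limit, particularly for memory-intensive DNN applications. Most PIM approaches use the CPU's memory requests to deliver instructions and operands to the PIM engine; this keeps the host cores busy and causes unnecessary data transfers, resulting in significant offloading overhead. DMA can solve this problem by transferring large contiguous blocks of data without CPU involvement and without polluting the memory hierarchy, which fits the PIM concept well. However, because DRAM-based PIM devices have limited computing resources, the amount of data a single DMA transaction can carry is small and many descriptors are required, so considerable offloading overhead remains. In this paper, we propose a PIM instruction set architecture (ISA) based on DMA descriptors, called PISA-DMA, which represents a PIM opcode and its operands in a single descriptor. The proposed ISA treats one PIM instruction as the completion of one DMA transaction and expresses a sequence of PIM instructions as a DMA descriptor list, which makes PIM programming intuitive. PISA-DMA also minimizes offloading overhead while remaining compatible with commercial platforms. It eliminates the opcode offloading overhead and, on a real machine running ONNX Runtime with a sequence length of 128, achieves speedups of 1.25×, 1.31×, and 1.29× over the baseline PIM for BERT, RoBERTa, and GPT-2, respectively. We also study how the proposed PISA affects the performance of compiler optimizations and show a 1.04× speedup from fusing matrix-matrix multiplication with element-wise addition, a gain comparable to that observed with the conventional ISA.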
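The idea of carrying an opcode and its operands in one descriptor, and a PIM program as a descriptor list, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's encoding: the opcode table, the 24-bit length field, and the names `pack_control`, `Descriptor`, and `build_program` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical opcode table; the real PISA-DMA encoding is not given in the abstract.
OPCODES = {"MAC": 0x01, "ADD": 0x02, "RELU": 0x03}
LEN_BITS = 24  # assumed: low 24 bits of the control word carry the transfer length


def pack_control(opcode: str, length: int) -> int:
    """Pack a PIM opcode and a transfer length into one 32-bit control word."""
    if not 0 < length < (1 << LEN_BITS):
        raise ValueError("length does not fit in the length field")
    return (OPCODES[opcode] << LEN_BITS) | length


def unpack_control(word: int) -> Tuple[str, int]:
    """Recover (opcode name, length) from a control word."""
    code = word >> LEN_BITS
    name = next(k for k, v in OPCODES.items() if v == code)
    return name, word & ((1 << LEN_BITS) - 1)


@dataclass
class Descriptor:
    src: int                          # address the operands are read from
    dst: int                          # PIM-side destination address
    control: int                      # opcode and length in a single word
    next_index: Optional[int] = None  # next descriptor in the list; None ends it


def build_program(instrs: List[Tuple[str, int, int, int]]) -> List[Descriptor]:
    """Map each (opcode, src, dst, length) PIM instruction to one descriptor
    and chain the descriptors into a list."""
    descs = [Descriptor(src, dst, pack_control(op, n)) for op, src, dst, n in instrs]
    for i in range(len(descs) - 1):
        descs[i].next_index = i + 1
    return descs
```

For example, `build_program([("MAC", 0x1000, 0x8000, 256), ("RELU", 0x8000, 0x8000, 256)])` yields two chained descriptors, and `unpack_control` on the first control word returns `("MAC", 256)`. One descriptor per instruction mirrors the abstract's "one PIM instruction = one DMA transaction" rule.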

https://doi.org/10.1109/access.2023.3238812

Computer science

Opcode

Operand

Parallel computing

Speedup

Compiler

Coprocessor

Instruction set

Computer architecture

Overhead (engineering)

Article

Citations: 2

2023

BL-PIM: Varying the Burst Length to Realize the All-Bank Performance and Minimize the Multi-Workload Interference for in-DRAM PIM

Chang Hyun Kim, Won Jun Lee, Yoonah Paik, Seok Young Kim, Seon Wook Kim

IF 3.4 (2023)

IEEE Access

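As demand for transformer applications rises rapidly, techniques that address the memory bottleneck are drawing attention. One of them is the in-DRAM processing-in-memory (PIM) architecture, which performs computation inside DRAM. Major DRAM manufacturers have introduced PIM samples that aim for peak performance by operating all banks simultaneously to maximize internal DRAM bandwidth. Adopting this approach in commercial products is problematic, however: during PIM execution, all-bank operation cannot run concurrently with non-PIM applications, so the memory space must be partitioned. This paper proposes the BL-PIM architecture, which achieves all-bank performance by increasing the burst length (BL) of memory requests inside the banks to maximize internal bandwidth and by overlapping computation across banks. Because the BL appears unchanged from outside the banks, non-PIM and PIM applications can run concurrently on the PIM memory while data coherence is preserved across the memory hierarchy. In addition, memory-intensive PIM operations that use a larger BL greatly reduce the number of memory requests, minimizing performance interference with other applications. We carefully extended the DRAM timing diagrams and developed a cooperation mechanism between the memory controller and the PIM device. We implemented BL-PIM on an FPGA and compared it against a real machine using four transformer models and eight compute- or memory-bandwidth-bound SPEC benchmarks. On the transformer models, BL-PIM achieved up to 28.9× and 12.0× higher performance than single-threaded and multi-threaded CPU execution, respectively. With the burst length increased by the maximum factor of 16, BL-PIM was 1.2× faster than ideal all-bank PIM execution. We also ran multi-workload experiments with the SPEC benchmarks and showed that the architecture minimizes performance interference. To the best of our knowledge, this is the first published study of multi-workload execution on PIM.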
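Why a longer burst reduces the number of memory requests can be shown with simple arithmetic. This is a simplified model, not the paper's timing analysis: it assumes each request covers `bus_bytes × burst_length` bytes, and the 8-byte bus and baseline BL of 8 are illustrative values only.

```python
def requests_needed(total_bytes: int, bus_bytes: int, burst_length: int) -> int:
    """Number of column requests needed to cover total_bytes when each request
    spans bus_bytes per beat for burst_length beats (ceiling division)."""
    per_request = bus_bytes * burst_length
    return -(-total_bytes // per_request)


# Assumed, illustrative parameters: a 64-bit (8-byte) data path and a
# baseline burst length of 8.
BUS_BYTES = 8
BASE_BL = 8
```

Under these assumptions, 1 MiB of operands needs `requests_needed(1 << 20, 8, 8)` = 16384 requests at BL 8, but only 1024 when the burst is stretched 16× to BL 128. Fewer requests leave more command slots for co-running non-PIM applications, which is the interference argument the abstract makes.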

https://doi.org/10.1109/access.2023.3300893

Computer science

Memory controller

Registered memory

Interleaved memory

CAS latency

DRAM

Embedded system

Parallel computing

Computer hardware

Memory management

Article

Citations: 5

2022

Low-overhead inverted LUT design for bounded DNN activation functions on floating-point vector ALUs

Seok Young Kim, Chang Hyun Kim, Won Joon Lee, Il Memming Park, Seon Wook Kim

IF 2.6 (2022)

Microprocessors and Microsystems

https://doi.org/10.1016/j.micpro.2022.104592

Lookup table

Computer science

Floating point

Activation function

Algorithm

Overhead (engineering)

Parallel computing

Artificial neural network

Artificial intelligence

Article

Citations: 0

2022

Low-Cost Unified Pixel Converter from the MIPI DSI Packets into Arbitrary Pixel Sizes

Kiyong Kwon, Dong‐Won Kang, Geon-Woo Ko, Seok‐Young Kim, Seon Wook Kim

IF 2.9 (2022)

Electronics

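Advances in semiconductor and image-processing technology have greatly improved visual quality, especially in mobile consumer devices. These devices require low-cost, high-bandwidth interfaces to support various pixel formats on high-resolution displays, and the MIPI Alliance accordingly proposed the industry-standard MIPI DSI (Display Serial Interface). Conventional DSI Rx implementations classify incoming packets into three components (header, payload, and checksum), align the packets to the DSI PHY input width, and then convert the payload into pixels. This two-step approach makes supporting diverse pixel formats expensive to implement. This paper proposes a low-cost unified pixel converter that classifies each component and aligns the input payload to multiple pixel formats in a single step, reducing area and power overhead. The design introduces two new terms: base and remainder. The base size equals the DSI PHY input width, and the remainder is the part left over after the bases are aligned. The size of one pixel equals the sum of one or more bases and a remainder. Because the base size matches the D-PHY input exactly, the converter becomes very straightforward to implement. The approach also does not need to treat the header separately from the payload, because the header size equals the base size; the header detection unit can therefore be removed, further reducing complexity. The design was functionally verified on an FPGA and synthesized with a Samsung 65 nm standard-cell library. Synthesis results show that it reduces area by 25.7% and power consumption by 38.6% compared with the conventional design.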
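The base/remainder decomposition and single-pass conversion can be illustrated in software with a bit accumulator. This is only a behavioral sketch of the idea, not the paper's RTL: LSB-first bit packing is an assumption, and `pixel_layout` and `bases_to_pixels` are hypothetical names.

```python
from typing import List, Tuple


def pixel_layout(pixel_bits: int, base_bits: int) -> Tuple[int, int]:
    """Express a pixel size as (whole bases, remainder bits)."""
    return divmod(pixel_bits, base_bits)


def bases_to_pixels(bases: List[int], base_bits: int,
                    pixel_bits: int) -> Tuple[List[int], int]:
    """Convert a stream of base-wide words into pixels in one pass.

    Bits are assumed to be packed LSB-first. Returns the pixels and the
    number of leftover bits that did not complete a pixel.
    """
    acc = 0    # bit accumulator holding not-yet-consumed bits
    nbits = 0  # number of valid bits in acc
    mask = (1 << pixel_bits) - 1
    pixels = []
    for word in bases:
        # Append the new base above the leftover (remainder) bits.
        acc |= (word & ((1 << base_bits) - 1)) << nbits
        nbits += base_bits
        while nbits >= pixel_bits:  # emit every complete pixel
            pixels.append(acc & mask)
            acc >>= pixel_bits
            nbits -= pixel_bits
    return pixels, nbits
```

With a 16-bit base, `pixel_layout(24, 16)` gives `(1, 8)`: a 24-bit pixel is one base plus an 8-bit remainder. Feeding three 32-bit bases into `bases_to_pixels(..., 32, 24)` yields four 24-bit pixels with no leftover bits, because the format is handled directly from the base stream without a separate alignment step.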

https://doi.org/10.3390/electronics11081221

Header

Payload (computing)

Pixel

Computer science

Network packet

Checksum

Computer hardware

Computer network

Artificial intelligence

Operating system

Article

Citations: 10

2022

Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling

Yoonah Paik, Chang Hyun Kim, Won Jun Lee, Seon Wook Kim

IF 3.9 (2022)

IEEE Access

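Processing-in-memory (PIM) has been actively studied as a way to overcome the memory bottleneck by placing compute units near or inside memory, especially to efficiently process data-intensive applications with low locality. In-DRAM PIM can be classified as per-bank or all-bank according to how many banks perform PIM computation per DRAM command. Per-bank PIM operates only one bank per PIM execution and therefore delivers low performance, but it preserves the standard DRAM interface and can serve non-PIM requests during PIM execution. All-bank PIM, by contrast, operates all banks and achieves high performance, but raises design problems such as heat and power consumption. We propose memory-computation decoupled execution, which achieves ideal all-bank PIM performance while preserving the standard JEDEC DRAM interface, that is, while still performing per-bank execution, so it can be adopted easily on commercial platforms. PIM execution is split into two phases: a memory phase and a computation phase. In the memory phase, bank-private operands are read from one bank at a time and stored per bank in the PIM engine's registers. In the computation phase, the PIM engines are decoupled from the memory arrays, and bank-shared operands are broadcast with standard read/write commands so that all banks compute simultaneously, achieving the compute throughput of all-bank PIM. To extend the computation phase, that is, to maximize opportunities for all-bank execution, we introduce compiler analysis and code-generation techniques that identify bank-private and bank-shared operands. We compared the performance of Level-2/3 BLAS, a multi-batch LSTM-based Seq2Seq model, and BERT on this decoupled PIM against commercial computing platforms. For Level-3 BLAS, it achieved speedups of 75.8×, 1.2×, and 4.7× over the CPU, GPU, and per-bank PIM, respectively, and reached up to 91.4% of ideal all-bank PIM performance. Our decoupled PIM also consumed 72.0% and 78.4% less power than the GPU and per-bank PIM, respectively, but 7.4% more than ideal all-bank PIM.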
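The two-phase execution can be modeled for a matrix-vector product. This toy model assumes the rows of `W` are the bank-private operands and `x` is the bank-shared operand; the row-to-bank mapping and the function name `decoupled_gemv` are illustrative assumptions, not the paper's compiler output.

```python
from typing import List, Tuple


def decoupled_gemv(W: List[List[float]], x: List[float],
                   n_banks: int) -> Tuple[List[float], int]:
    """Toy model of memory-computation decoupled GEMV (y = W @ x).

    Rows of W are the bank-private operands; x is the bank-shared operand.
    Returns y and the number of broadcast steps in the computation phase.
    """
    if len(W) % n_banks:
        raise ValueError("rows must divide evenly among banks")
    per_bank = len(W) // n_banks

    # Memory phase: load each bank's rows into that bank's PIM registers,
    # one bank at a time (per-bank commands on the standard interface).
    regs = [W[b * per_bank:(b + 1) * per_bank] for b in range(n_banks)]

    # Computation phase: broadcast each x[j] once; every bank uses it in the
    # same step (the inner bank loop stands in for simultaneous execution).
    acc = [[0.0] * per_bank for _ in range(n_banks)]
    broadcasts = 0
    for j, xj in enumerate(x):
        broadcasts += 1
        for b in range(n_banks):
            for r in range(per_bank):
                acc[b][r] += regs[b][r][j] * xj

    return [v for bank in acc for v in bank], broadcasts
```

In this model the number of broadcast steps equals `len(x)` no matter how many banks participate, so adding banks adds parallel work without adding commands. For example, `decoupled_gemv([[1, 2], [3, 4], [5, 6], [7, 8]], [1, 1], 2)` returns `([3.0, 7.0, 11.0, 15.0], 2)`.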

https://doi.org/10.1109/access.2022.3203051

DRAM

Computer science

Decoupling (probability)

Interface (matter)

Random access memory

Non-volatile random-access memory

CAS latency

Embedded system

Registered memory

Memory management

All Publications (137)

Article

Citations: 7

2023

PISA-DMA: Processing-in-Memory Instruction Set Architecture Using DMA

Won Jun Lee, Chang Hyun Kim, Yoonah Paik, Seon Wook Kim

IF 3.4 (2023)

IEEE Access

https://doi.org/10.1109/access.2023.3238812

Computer science

Opcode

Operand

Parallel computing

Speedup

Compiler

Coprocessor

Instruction set

Computer architecture

Overhead (engineering)

Article

Citations: 2

2023

BL-PIM: Varying the Burst Length to Realize the All-Bank Performance and Minimize the Multi-Workload Interference for in-DRAM PIM

Chang Hyun Kim, Won Jun Lee, Yoonah Paik, Seok Young Kim, Seon Wook Kim

IF 3.4 (2023)

IEEE Access

https://doi.org/10.1109/access.2023.3300893

Computer science

Memory controller

Registered memory

Interleaved memory

CAS latency

DRAM

Embedded system

Parallel computing

Computer hardware

Memory management

Article

Citations: 5

2022

Low-overhead inverted LUT design for bounded DNN activation functions on floating-point vector ALUs

Seok Young Kim, Chang Hyun Kim, Won Joon Lee, Il Memming Park, Seon Wook Kim

IF 2.6 (2022)

Microprocessors and Microsystems

https://doi.org/10.1016/j.micpro.2022.104592

Lookup table

Computer science

Floating point

Activation function

Algorithm

Overhead (engineering)

Parallel computing

Artificial neural network

Artificial intelligence

Article

Citations: 0

2022

Low-Cost Unified Pixel Converter from the MIPI DSI Packets into Arbitrary Pixel Sizes

Kiyong Kwon, Dong‐Won Kang, Geon-Woo Ko, Seok‐Young Kim, Seon Wook Kim

IF 2.9 (2022)

Electronics

https://doi.org/10.3390/electronics11081221

Header

Payload (computing)

Pixel

Computer science

Network packet

Checksum

Computer hardware

Computer network

Artificial intelligence

Operating system

Article

Citations: 10

2022

Achieving the Performance of All-Bank In-DRAM PIM With Standard Memory Interface: Memory-Computation Decoupling

Yoonah Paik, Chang Hyun Kim, Won Jun Lee, Seon Wook Kim

IF 3.9 (2022)

IEEE Access

https://doi.org/10.1109/access.2022.3203051

DRAM

Computer science

Decoupling (probability)

Interface (matter)

Random access memory

Non-volatile random-access memory

CAS latency

Embedded system

Registered memory

Memory management

Article

Citations: 0

2025

SHIFT ECC: A Value Converting HBM ECC Approach for Refresh Energy Efficient Integer Quantized DNN Inference

Jae Yoon Lee, Young Seo Lee, Young‐Ho Gong, Seon Wook Kim, Sung Woo Chung

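As the parameter sizes of deep neural networks (DNNs) grow, so does the demand for memory bandwidth, and high-bandwidth memory (HBM) is being widely adopted. However, higher on-chip temperatures shorten retention time, so HBM requires more frequent refresh operations, which incur substantial refresh energy and performance overhead. In this paper, we propose SHIFT ECC, a lightweight and robust ECC scheme for INT8-quantized DNNs on HBM that reduces refresh operations while maintaining inference accuracy. SHIFT ECC improves DNN reliability by converting negative weights into positive weights, which mitigates the dominant type of retention error (mostly 1→0 bit errors). SHIFT ECC further improves DNN robustness for the same number of parity bits by protecting the upper (more significant) bits of DNN weights with stronger ECC and the lower (less significant) bits with weaker ECC. Evaluation shows that when 1→0 bit errors make up 100% and 99% of errors, SHIFT ECC reduces average refresh energy by 32.6% and 35.0%, respectively, and reduces average memory read latency by 21.7%, compared with a state-of-the-art refresh-reduction technique.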
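The motivation for converting negative weights can be seen by counting the 1 bits that a 1→0 retention error could hit. The sketch below only illustrates that exposure in standard INT8 two's complement. It does not reproduce SHIFT ECC's actual value conversion or its ECC code, neither of which the abstract specifies, and the helper names are hypothetical.

```python
def ones_in_int8(w: int) -> int:
    """Count the 1 bits in the 8-bit two's-complement encoding of w."""
    if not -128 <= w <= 127:
        raise ValueError("not an INT8 value")
    return bin(w & 0xFF).count("1")


def flip_one_to_zero(w: int, bit: int) -> int:
    """Model a 1->0 retention error at position `bit` of an INT8 value."""
    raw = w & 0xFF
    if not (raw >> bit) & 1:
        return w  # a 0 bit cannot decay further; value unchanged
    raw &= ~(1 << bit)
    return raw - 256 if raw & 0x80 else raw  # reinterpret as signed INT8
```

Small negative weights are mostly 1s (`ones_in_int8(-1)` is 8, while `ones_in_int8(1)` is 1), so they offer the most targets for 1→0 decay. A single 1→0 error in the sign bit turns -1 into 127 (`flip_one_to_zero(-1, 7)`), which also shows why the more significant bits warrant stronger protection.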

https://doi.org/10.1109/islped65674.2025.11261803

Latency (audio)

Robustness (evolution)

Inference

Offset (computer science)

Reliability (semiconductor)

Efficient energy use

Reduction (mathematics)

Deep neural networks

Article

Citations: 0

2025

Supporting Register-based Addressing Modes for in-DRAM PIM ISAs

Seok Young Kim, Byung Ho Choi, Seokwon Kang, Yongjun Park, Seon Wook Kim

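Processing-in-Memory architectures offer a promising way to relieve the data-movement bottleneck that arises when conventional processor-centric systems shuttle data between memory and compute units, particularly in DNN applications. However, these architectures introduce two distinct overheads: PIM code offloading and data transfer between the CPU and memory. To address both, we propose two register-based addressing modes, indexed and base-offset addressing, for DMA-descriptor-based in-DRAM PIM ISAs. Full-system performance evaluation shows that our approach significantly reduces these overheads, achieving up to a 1.94× speedup over the baseline PIM while requiring only 4.65% additional area and 8.61% additional power.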
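The two addressing modes can be sketched with their conventional meanings. The effective-address formulas below are the textbook definitions of base-offset and indexed addressing. The register file, its size, and the helper names are hypothetical, and the paper's exact semantics and descriptor encoding are not given in the abstract.

```python
from typing import List


class RegisterFile:
    """Hypothetical PIM-side register file."""

    def __init__(self, n_regs: int = 8) -> None:
        self.regs: List[int] = [0] * n_regs


def ea_base_offset(rf: RegisterFile, base_reg: int, offset: int) -> int:
    """Base-offset mode: effective address = R[base] + immediate offset."""
    return rf.regs[base_reg] + offset


def ea_indexed(rf: RegisterFile, base_reg: int, index_reg: int, stride: int) -> int:
    """Indexed mode: effective address = R[base] + R[index] * stride."""
    return rf.regs[base_reg] + rf.regs[index_reg] * stride


def tile_addresses(rf: RegisterFile, base_reg: int, index_reg: int,
                   stride: int, n_tiles: int) -> List[int]:
    """Reuse one descriptor template across tiles by updating only the index register."""
    addrs = []
    for i in range(n_tiles):
        rf.regs[index_reg] = i
        addrs.append(ea_indexed(rf, base_reg, index_reg, stride))
    return addrs
```

With `R1 = 0x1000`, walking three 256-byte tiles gives `[0x1000, 0x1100, 0x1200]` while only the index register changes. This illustrates how register-based modes can cut the number of distinct descriptors the host must offload.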

https://doi.org/10.1109/dac63849.2025.11132430

Bottleneck

Speedup

Architecture

Code (set theory)

Key (lock)

Power (physics)

Data access

Baseline (sea)

Article

Citations: 0

2024

Supporting Multi-Channels to DRAM-based PIM Execution for Boosting the Performance

Junil Kim, Seok Young Kim, Seon Wook Kim

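Memory bandwidth between the processor and memory limits performance, an effect that is especially pronounced as data-intensive applications emerge. To address this problem, support for in-memory processing has been actively studied. Most PIM (Processing-in-Memory) platforms prepare all input data before computation, which incurs considerable data-preparation overhead. In multi-channel memory systems this overhead is even larger because data must be duplicated. In this paper, we develop a cost-effective DMA offloading methodology to support PIM operations in multi-channel memory systems. It minimizes inter-channel data-sharing overhead and achieves up to a 1.79× performance improvement over the existing single-channel PIM architecture when running DNN applications.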
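Where the duplication comes from can be illustrated for a matrix-vector product split across channels. This is an illustrative assumption, not the paper's methodology: weights are partitioned by rows, every channel needs the full input vector, and 2-byte elements are assumed.

```python
from typing import List, Tuple


def multichannel_plan(rows: int, cols: int, channels: int,
                      elem_bytes: int = 2) -> Tuple[List[int], int, int]:
    """Split a rows x cols weight matrix across channels for y = W @ x.

    Weights are partitioned by rows (no duplication); the input vector x is
    needed by every channel, so it is copied into each one.
    Returns (rows per channel, total bytes placed, duplicated bytes).
    """
    base, extra = divmod(rows, channels)
    row_split = [base + (1 if c < extra else 0) for c in range(channels)]
    weight_bytes = rows * cols * elem_bytes
    vector_bytes = cols * elem_bytes
    total = weight_bytes + channels * vector_bytes
    duplicated = (channels - 1) * vector_bytes
    return row_split, total, duplicated
```

Under these assumptions, the duplicated bytes grow linearly with the channel count, which is the data-sharing overhead that an offloading scheme for multi-channel PIM has to keep small.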

http://dx.doi.org/10.1109/iceic61013.2024.10457142

DRAM

Boosting (machine learning)

Computer science

Embedded system

Computer architecture

Computer hardware

Artificial intelligence

Article

Citations: 0

2024

Low Overhead PIM-to-PIM Communication on PCIe-based Multi-PIM Platforms for Executing Large-Scale AI Models

M.-S. Park, Seok Young Kim, Seon Wook Kim

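Emerging transformer models suffer from a memory bottleneck because of their low data locality and large data sizes, and processing-in-memory (PIM), which performs computation inside memory, is being actively studied to overcome it. However, as model parameters grow, a single PIM device becomes insufficient because of its limited memory capacity and compute resources. In this paper, we develop a low-overhead method for communication between PIM devices on a PCIe-based multi-PIM platform. We adopt an XDMA-based PIM-to-PIM (P2P) direct communication mechanism that eliminates redundant data movement between the CPU and the PIM devices. As a result, P2P achieves speedups of 1.69×, 1.70×, 1.61×, and 1.63× when transferring 16 MB, 32 MB, 64 MB, and 128 MB of data, respectively, compared with DMA that uses system memory as a buffer.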

http://dx.doi.org/10.1109/iceic61013.2024.10457258

PCI Express

Computer science

Overhead (engineering)

Scale (ratio)

Embedded system

Computer architecture

Operating system

Field-programmable gate array

Article

Citations: 10

2022

Extending the ONNX Runtime Framework for the Processing-in-Memory Execution

Seok Young Kim, Jaewook Lee, Chang Hyun Kim, Won Jun Lee, Seon Wook Kim

2022 International Conference on Electronics, Information, and Communication (ICEIC)

Attention-based models deliver sufficiently accurate results on NLP tasks. However, as models grow, their memory usage increases exponentially. In addition, large volumes of low-locality data drive up power consumption during data movement. Processing-in-Memory (PIM), which places compute logic in or near memory, has therefore emerged as an attractive solution to the memory bottleneck in system performance. While many design explorations of PIM architectures have been carried out, efficient software frameworks for them remain rare. This paper extends the ONNX Runtime framework for PIM-based platforms. The framework provides functional abstractions for various PIM operations and makes programming easy for users. Using the framework, we ran BERT, the dominant attention-based model, on the GLUE dataset. By exploiting data- and bank-level parallelism and performing vector execution in each bank, our baseline PIM platform achieved average speedups of 1.64× and 1.71× over x86 and ARM CPUs, respectively.
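A framework that exposes PIM operations through ONNX must decide, node by node, whether an operator runs on the PIM device or on the host. This concept appears in ONNX Runtime, which assigns graph nodes to execution providers and falls back to the CPU provider. The sketch below shows only that dispatch idea in plain Python. The registry, the decorator, and the kernel names are hypothetical and are not ONNX Runtime's API or the paper's implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical registry of PIM kernels, keyed by ONNX operator type.
PIM_KERNELS: Dict[str, Callable[..., List[float]]] = {}


def pim_kernel(op_type: str):
    """Decorator that registers a function as the PIM implementation of op_type."""
    def register(fn):
        PIM_KERNELS[op_type] = fn
        return fn
    return register


@pim_kernel("Add")
def pim_add(a: List[float], b: List[float]) -> List[float]:
    # Stand-in for an element-wise add offloaded to the PIM banks.
    return [x + y for x, y in zip(a, b)]


def cpu_fallback(op_type: str, *args: List[float]) -> List[float]:
    # Host-side implementations for operators the PIM device does not support.
    if op_type == "Relu":
        return [max(0, x) for x in args[0]]
    raise NotImplementedError(op_type)


def run_node(op_type: str, *args: List[float]) -> Tuple[str, List[float]]:
    """Run one graph node on PIM if a kernel is registered, else on the CPU."""
    fn = PIM_KERNELS.get(op_type)
    if fn is not None:
        return "pim", fn(*args)
    return "cpu", cpu_fallback(op_type, *args)
```

Here `run_node("Add", [1, 2], [3, 4])` is dispatched to the PIM stand-in, while `run_node("Relu", [-1, 2])` falls back to the host. A real extension would make this decision when the graph is partitioned, not per call.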

https://doi.org/10.1109/iceic54506.2022.9748444

Computer science

Bottleneck

Locality

Speedup

Parallel computing

Workload

x86

Software

Computer architecture

Embedded system
