The rapid advancement of deep learning (DL) models has created a pressing need for efficient on-device DL solutions, particularly on edge devices with limited resources. Processing-in-memory (PIM) is a promising approach to the worsening memory wall problem, as it integrates processing capabilities directly into memory modules. This letter evaluates the potential of Samsung PIM technology to accelerate on-device language model inference. We assess the impact of PIM on the inference stage of three transformer models (Gemma, Qwen2, and TinyBERT), demonstrating an average 1.92x end-to-end latency speed-up over a CPU baseline by offloading all linear layers to PIM. Notably, Qwen2, whose characteristics are favorable to PIM, achieves a 1.25x end-to-end latency speed-up even over a GPU baseline. Our findings emphasize the importance of understanding model characteristics for effective PIM deployment. The results demonstrate the efficiency of the PIM solution in enabling on-device language models and its potential for edge deployment.
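
As a rough illustration of the offloading strategy described above, the following PyTorch sketch replaces every nn.Linear in a model with a PIM-backed version. The PIM call itself (pim_linear) is hypothetical and falls back to a CPU matmul so the sketch runs anywhere; a real deployment would dispatch the operation to the PIM runtime, whose API is not part of this letter.

    import torch
    import torch.nn as nn

    def pim_linear(x: torch.Tensor, weight: torch.Tensor,
                   bias: torch.Tensor = None) -> torch.Tensor:
        # Hypothetical stand-in for a PIM-accelerated matrix multiply.
        # A real deployment would issue this GEMV/GEMM to the PIM-enabled
        # DRAM banks; here it simply runs on the CPU.
        out = x @ weight.T
        if bias is not None:
            out = out + bias
        return out

    class PIMLinear(nn.Module):
        """Drop-in replacement for nn.Linear routed to the PIM backend."""
        def __init__(self, linear: nn.Linear):
            super().__init__()
            self.weight = linear.weight
            self.bias = linear.bias

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return pim_linear(x, self.weight, self.bias)

    def offload_linear_layers(model: nn.Module) -> nn.Module:
        # Recursively swap every nn.Linear for the PIM-backed version,
        # mirroring the "offload all linear layers" strategy evaluated here.
        for name, child in model.named_children():
            if isinstance(child, nn.Linear):
                setattr(model, name, PIMLinear(child))
            else:
                offload_linear_layers(child)
        return model

For example, calling offload_linear_layers on a loaded transformer would leave attention score computation and nonlinearities on the host while routing all linear projections through the PIM path.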