Empowering Edge Devices With Processing-in-Memory for On-Device Language Inference
Jimin Lee, Soonhoi Ha
IEEE Embedded Systems Letters (IF 2)
Abstract

The rapid advancement of deep learning (DL) models has led to a pressing need for efficient on-device DL solutions, particularly for edge devices with limited resources. Processing-in-memory (PIM) is considered a promising technology for addressing the worsening memory wall problem, as it integrates processing capabilities directly into memory modules. This letter evaluates the potential of Samsung PIM technology to enhance the performance of on-device language inference. We assess the impact of PIM on the inference stage of three transformer models, Gemma, Qwen2, and TinyBERT, demonstrating an average 1.92x speed-up in end-to-end latency over a CPU by offloading all linear layers to PIM. Notably, Qwen2, whose characteristics favor PIM, achieves a 1.25x speed-up in end-to-end latency over a GPU. Our findings emphasize the importance of understanding model characteristics for effective PIM deployment and demonstrate the PIM solution's efficiency in enabling on-device language models and its potential for edge deployment.
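The offloading scheme the abstract describes routes only the linear (matrix-multiply) layers to PIM while everything else stays on the host. The sketch below illustrates that dispatch pattern; it is a minimal illustration, not the letter's implementation. The names `pim_matmul`, `Linear`, and `use_pim` are hypothetical, and the PIM call simply falls back to NumPy so the example runs anywhere.

```python
import numpy as np

def pim_matmul(x, w):
    """Hypothetical PIM offload: a real deployment would enqueue this
    GEMV/GEMM on the PIM-enabled memory module. Here it falls back to
    NumPy so the sketch stays runnable."""
    return x @ w

class Linear:
    """A linear layer whose matmul can be routed to PIM or the host CPU."""
    def __init__(self, in_dim, out_dim, use_pim=True):
        rng = np.random.default_rng(0)
        self.w = rng.standard_normal((in_dim, out_dim)).astype(np.float32)
        self.use_pim = use_pim

    def __call__(self, x):
        # Offload the bandwidth-bound matmul to PIM; bias, activations,
        # and all other non-linear ops remain on the host.
        return pim_matmul(x, self.w) if self.use_pim else x @ self.w

# Toy feed-forward block from one transformer layer: both linears offloaded.
d_model, d_ff = 256, 1024
up, down = Linear(d_model, d_ff), Linear(d_ff, d_model)
x = np.ones((1, d_model), dtype=np.float32)   # single decode-step token
y = down(np.maximum(up(x), 0.0))              # ReLU stays on the CPU
print(y.shape)  # (1, 256)
```

During token-by-token decoding each linear layer reduces to a memory-bandwidth-bound GEMV over the weight matrix, which is why the linear layers are the natural offload target for a PIM device.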

Keywords
Computer science · Inference · Enhanced Data Rates for GSM Evolution · Embedded system · Computer hardware · Programming language · Computer architecture · Human–computer interaction · Artificial intelligence
Type
article
IF / Citations
2 / 2
Publication Year
2025