Enabling Computation and Communication Overlap in PIMs for on-device LLM Inference
Siu Jeong, Suhwan Kim, Changyong Eom, Jaehyuk Huh
IEEE Computer Architecture Letters (IF 1.4)
Abstract

DRAM-based processing-in-memory (PIM) exploits all-bank execution to accelerate memory-bound workloads, but limited PIM capacity makes runtime data movement unavoidable and often performance-critical. Existing PIM architectures tightly couple computation and memory access across all banks, serializing computation and data transfer and limiting opportunities to hide communication latency. This paper proposes a bank-level computation–communication overlap mechanism that selectively assigns a subset of bank resources to communication, enabling concurrent data transfer without disrupting ongoing computation while preserving the unified control model of all-bank PIM. Evaluations on mixture-of-experts (MoE) and multi-small-language-model (multi-SLM) inference workloads using a simulated LPDDR5-based PIM system show execution time reductions of up to 19% and 16%, respectively, along with consistently improved channel bandwidth utilization compared to all-bank and channel-level overlap baselines.
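The core trade-off the abstract describes can be illustrated with a back-of-the-envelope timing model: reserving a few banks for communication stretches the compute phase (fewer banks share the work) but lets the transfer run concurrently instead of afterwards. The sketch below is purely illustrative, not the paper's actual evaluation model; all parameters (bank count, compute and transfer durations) are hypothetical.

```python
# Hypothetical first-order timing model of bank-level computation-communication
# overlap in an all-bank PIM, for illustration only (not the paper's simulator).

def serialized_time(compute_all_banks: float, transfer: float) -> float:
    # Baseline all-bank PIM: all banks compute, then the data transfer
    # runs afterwards, so the two phases add up.
    return compute_all_banks + transfer

def overlapped_time(num_banks: int, compute_all_banks: float,
                    transfer: float, comm_banks: int) -> float:
    # Reserve `comm_banks` banks for communication; the remaining banks
    # keep computing. Compute time stretches in proportion to the banks
    # lost, while the transfer proceeds concurrently on the reserved banks.
    compute_banks = num_banks - comm_banks
    stretched_compute = compute_all_banks * num_banks / compute_banks
    return max(stretched_compute, transfer)

if __name__ == "__main__":
    # Illustrative numbers: 16 banks, 100 time units of all-bank compute,
    # 40 units of runtime data movement, 2 banks reserved for communication.
    base = serialized_time(100.0, 40.0)
    over = overlapped_time(16, 100.0, 40.0, 2)
    print(f"serialized: {base:.1f}, overlapped: {over:.1f}")
```

Under these made-up parameters the overlapped schedule wins whenever the stretched compute time stays below the serialized sum, i.e. when the transfer latency hidden exceeds the compute slowdown from donating banks to communication.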

Keywords
Inference, Computation, Key (lock), Algorithm design
Type
article
IF / Citations
1.4 / 0
Publication year
2026