Recently, multiple ASICs [1]–[6] have been proposed to accelerate large language models (LLMs). However, the enormous number of LLM parameters leads to significant energy consumption from external memory access (EMA). Normalizing the system energy required to process 1024 input tokens by the parameter count, previous ASICs [1]–[5] consumed 79-222pJ/param on small models with 336-682M parameters, as shown in Fig. 23.9.1. Even an ASIC [6] designed to reduce EMA still consumed 26pJ/param on a 708M-parameter model. Because of this large EMA, prior works [1]–[6] were limited to small models such as GPT-2 [7] and could not handle highly accurate models with billions of parameters, such as Llama [8]. To overcome this, we adopt binary or ternary quantization together with new hardware optimizations, enabling the proposed ASIC to consume 9pJ/param and less than 5mW of power when running a billion-parameter Llama model.
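This excerpt does not detail the exact quantization scheme, so the following is a minimal sketch assuming an absmean ternary quantizer of the kind popularized by BitNet b1.58: each weight collapses to {-1, 0, +1} plus one per-tensor scale, shrinking weight storage from 16b to roughly 1.58b per parameter (1b in the binary case) and cutting EMA traffic proportionally. The function names below are illustrative, not from the paper.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, eps: float = 1e-6):
    """Absmean ternary quantization (BitNet-b1.58 style; illustrative,
    not necessarily the scheme used by this ASIC). Maps each weight to
    {-1, 0, +1} with a single floating-point scale per tensor."""
    scale = float(np.mean(np.abs(w))) + eps    # per-tensor absmean scale
    w_t = np.clip(np.round(w / scale), -1, 1)  # ternary codes {-1, 0, +1}
    return w_t.astype(np.int8), scale

def ternary_dequantize(w_t: np.ndarray, scale: float) -> np.ndarray:
    return w_t.astype(np.float32) * scale

# Toy check: a matmul against ternary weights needs only sign-controlled
# adds/subtracts per weight; the one multiply is the final rescale.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
x = rng.normal(size=(8,)).astype(np.float32)
w_t, s = ternary_quantize(w)
y_ref = w @ x                                  # full-precision reference
y_q = (w_t.astype(np.float32) @ x) * s         # ternary approximation
print(np.abs(y_ref - y_q).max())               # quantization error
```

Beyond the storage savings, this is why binary/ternary weights suit a low-power ASIC: the inner products reduce to additions and subtractions, eliminating per-weight multipliers. As a sanity check on the reported metric, 9pJ/param over a 1B-parameter model corresponds to roughly 9mJ of system energy for one 1024-token pass.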