Publications | Prof. Sungjoo Yoo's Lab | Department of Computer Science and Engineering, Seoul National University

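Publications

Research Output Trends

The figures shown are computed from collected data and may differ slightly from actual records.

Papers published per year (last 5 years)

33 total

Citations per year (last 5 years)

389 total

Featured Papers

*As of 2026, Impact Factor is shown only for papers published within the last 6 years.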

article

Citations: 5

2025

Dense-SfM: Structure from Motion with Dense Consistent Matching

Jongmin Lee, Sungjoo Yoo

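We present Dense-SfM, a novel Structure from Motion (SfM) framework for dense and accurate 3D reconstruction from multi-view images. The sparse keypoint matching that traditional SfM methods often rely on limits both accuracy and point density, especially in texture-less regions. Dense-SfM overcomes this limitation by integrating dense matching with Gaussian Splatting (GS) based track extension, which yields more consistent and longer feature tracks. To further improve reconstruction accuracy, Dense-SfM includes a multi-view kernelized matching module that leverages transformer and Gaussian Process architectures, enabling robust track refinement across multiple views. Evaluations on the ETH3D and Texture-Poor SfM datasets show that Dense-SfM offers significant improvements in accuracy and density over state-of-the-art methods. Project page: https://icetea-cv.github.io/densesfm/.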

https://doi.org/10.1109/cvpr52734.2025.00600

Structure from motion

Computer science

Matching (statistics)

Motion (physics)

Artificial intelligence

Computer vision

Mathematics

Statistics

article

Citations: 0

2025

FuriosaAI RNGD: A Tensor Contraction Processor for Sustainable AI Computing

Younggeun Choi, Junyoung Park, Sang Min Lee, Jeseung Yeon, Minho Kim, Chang-Jae Park, Byeongwook Bae, Hyunmin Jeong, Hanjoon Kim, June Paik, Nuno P. Lopes, Sungjoo Yoo

IF 2.9 (2025)

IEEE Micro

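Modern artificial intelligence (AI) workloads demand architectures that can efficiently handle diverse tensor contraction patterns. Traditional approaches based on fixed-size matrix multiplication often fall short in scalability and flexibility. RNGD (pronounced “Renegade”), a second-generation tensor contraction processor, presents an architecture designed to exploit the parallelism and data locality inherent in tensor operations. Its coarse-grained processing elements (PEs) can operate either as a single large unit or as multiple independent units, providing flexibility across diverse tensor shapes. Key innovations such as a circuit switch-based fetch network, input broadcasting, and buffer-based reuse mechanisms further improve computational efficiency. RNGD delivers optimized performance and energy efficiency for sustainable computing of next-generation AI workloads, and represents a significant advance in processor architecture.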

https://doi.org/10.1109/mm.2025.3551880

Computer science

Contraction (grammar)

Tensor (intrinsic definition)

Parallel computing

Computer architecture

Tensor contraction

Computational science

Tensor product

article

Citations: 1

2025

Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer

Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo

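We propose Rotate, Clip, and Partition (RCP), a quantization-aware training (QAT) approach. RCP is the first to realize extreme compression of LLMs with the W2A4KV4 configuration (2-bit weights, 4-bit activations, 4-bit KV cache). RCP integrates recent rotation techniques and proposes a novel non-uniform weight quantizer design, based on a theoretical and empirical analysis of how rotation affects the non-uniformity of the weight distribution. Our weight quantizer, Learnable Direct Partitioning (LDP), introduces learnable parameters to directly learn non-uniform intervals jointly with the LLM weights. We also present, as a proof of concept, a GPU kernel that supports GEMV for non-uniform W2A4. Experiments show that RCP compresses LLaMA-2-7B to W2A4KV4 with a WikiText2 perplexity (PPL) loss of only 2.84 and a 5.29× reduction in memory footprint. Furthermore, RCP can quantize the challenging mobile-targeted LLaMA-3.2 models as well as the domain-specific WizardCoder-7B and MetaMath-7B without critical problems such as convergence failure or repetition. Code is available at https://github.com/songsm921/RCP.

We use the form WlAmKVn, as in W2A4KV4, to denote l-bit weights, m-bit activations, and n-bit KV cache. Because no available hardware supports both LUT inference for non-uniform quantization and dedicated acceleration for 4-bit activations, we design an accelerated GEMV kernel in CUDA as a proof of concept. Our kernel can achieve lower latency than the FP16 PyTorch (Paszke et al., 2019) and INT4 QuaRot implementations while reducing memory usage by up to 5.29×. Our contributions are summarized as follows. We analyze, empirically and theoretically, how rotation interacts with the weight distribution and what difficulties it causes in extreme W2A4KV4 quantization. To address these, we introduce RCP, a quantization algorithm that combines the benefits of rotation with QAT through LDP, a fully learnable non-uniform quantizer. We demonstrate through extensive experiments that RCP is the first to achieve W2A4KV4 and W3A4KV4 quantization, reaching state-of-the-art performance.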
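To make the idea of a non-uniform 2-bit weight quantizer with lookup-table (LUT) dequantization more concrete, the following is a minimal sketch in plain Python. It is not the authors' LDP implementation: the function names, the split points, and the representative levels are all hypothetical, and in a learnable quantizer such values would be trained together with the model weights rather than fixed by hand.

```python
from bisect import bisect_right


def quantize_nonuniform(weights, boundaries):
    """Map each weight to the index of the interval it falls into.

    `boundaries` must be sorted; 3 boundaries give 4 intervals,
    so every index fits in a 2-bit code.
    """
    return [bisect_right(boundaries, w) for w in weights]


def dequantize(codes, levels):
    """Look up the representative value (LUT entry) for each code."""
    return [levels[c] for c in codes]


# Hypothetical values, chosen only for illustration.
boundaries = [-0.5, 0.0, 0.5]        # 3 split points -> 4 unequal-width intervals
levels = [-0.9, -0.2, 0.2, 0.9]      # one representative value per interval

weights = [-1.2, -0.3, 0.05, 0.7, 0.5]
codes = quantize_nonuniform(weights, boundaries)   # 2-bit codes
restored = dequantize(codes, levels)               # LUT-based reconstruction
```

The point of the sketch is only that the intervals need not be equally spaced, which is what distinguishes a non-uniform quantizer from a uniform one; how the intervals are parameterized and learned is specific to the paper.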

https://doi.org/10.18653/v1/2025.findings-emnlp.400

Quantization (signal processing)

Rotation (mathematics)

Signal processing

Context (archaeology)

article

Citations: 1

2024

NeRF-PIM: PIM Hardware-Software Co-Design of Neural Rendering Networks

J.K. Heo, Sungjoo Yoo

IF 2.9 (2024)

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

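Neural Radiance Field (NeRF) has emerged as a state-of-the-art technique offering unprecedented realism in rendering. However, adoption of NeRF is constrained by slow rendering caused by its high computational cost. Voxel-based optimization of NeRF mitigates this problem by reducing computation, but incurs substantial memory overhead. In this work, we propose NeRF-PIM, a hardware-software co-design approach, to address this problem. To handle memory accesses to a large model (the voxel grid) with poor locality, together with low compute density, we propose to jointly exploit processing-in-memory (PIM) and PIM-aware software optimizations in terms of data layout, redundancy removal, and computation reuse. Our PIM hardware aims to accelerate trilinear interpolation and dot product operations. In particular, to address the low internal bandwidth utilization caused by random accesses to voxels, we propose a data layout that carefully exploits the characteristics of interpolation operations on the voxel grid. This not only helps eliminate bank conflicts in voxel accesses but also improves the efficiency of PIM command issuing by exploiting the all-bank mode of existing PIM devices. As PIM-aware software optimizations, we propose occupancy-grid-aware pruning and a one-voxel two-sampling (1V2S) technique, which improve computational efficiency by avoiding redundant computation in empty space and reduce memory traffic by reusing per-voxel dot product results. Experiments were conducted on a real baseline HBM-PIM device. NeRF-PIM achieved speedups of 7.4× and 5.0× over the baseline on the Synthetic-NeRF and Tanks and Temples datasets, respectively.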
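As a reference for the trilinear interpolation that the NeRF-PIM abstract names as one of the accelerated operations, here is a minimal software sketch in plain Python. It only illustrates the arithmetic on a scalar voxel grid; the function name and the nested-list grid layout are assumptions made for this example and say nothing about the paper's data layout or PIM hardware.

```python
def trilinear(grid, x, y, z):
    """Interpolate a scalar voxel grid at a continuous point (x, y, z).

    `grid[i][j][k]` holds the value at integer corner (i, j, k). The point
    must lie inside the grid so that all 8 surrounding corners exist.
    """
    i, j, k = int(x), int(y), int(z)
    # Clamp so the upper corner index stays in range on the far boundary.
    i = min(i, len(grid) - 2)
    j = min(j, len(grid[0]) - 2)
    k = min(k, len(grid[0][0]) - 2)
    fx, fy, fz = x - i, y - j, z - k
    result = 0.0
    # Each sample point reads 8 neighbouring voxels and blends them.
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                weight = ((fx if di else 1 - fx) *
                          (fy if dj else 1 - fy) *
                          (fz if dk else 1 - fz))
                result += weight * grid[i + di][j + dj][k + dk]
    return result


# A 2x2x2 grid whose corner value equals i + 2*j + 4*k.
grid = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)] for i in range(2)]
center = trilinear(grid, 0.5, 0.5, 0.5)
corner = trilinear(grid, 1.0, 1.0, 1.0)
```

Every sample touches eight voxel corners, which gives a sense of why scattered sample positions translate into many scattered memory accesses on a large grid.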

http://dx.doi.org/10.1109/tcad.2024.3443712

Computer science

Rendering (computer graphics)

Co-design

Software

Computer hardware

Computer architecture

Embedded system

Computer graphics (images)

Operating system

article

Citations: 1

2022

Introduction to the Special Section on Energy-Efficient AI Chips

Vikas Chandra, Yiran Chen, Sungjoo Yoo

IF 1.4 (2022)

ACM Transactions on Design Automation of Electronic Systems

Abstract not available.

https://doi.org/10.1145/3538502

Computer science

Special section

Section (typography)

Energy (signal processing)

Parallel computing

Computer architecture

Operating system

Engineering physics

Physics

All Papers

242

preprint

Citations: 0

2025

Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer

Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo

ArXiv.org

We propose Rotate, Clip, and Partition (RCP), a quantization-aware training (QAT) approach that is the first to realize extreme compression of LLMs with the W2A4KV4 configuration (2-bit weights, 4-bit activations, 4-bit KV cache). RCP integrates recent rotation techniques with a novel non-uniform weight quantizer design by quantitatively analyzing the impact of random rotation on 2-bit weight quantization. Our weight quantizer features Learnable Direct Partitioning (LDP), which introduces learnable parameters to directly learn non-uniform intervals jointly with the LLM weights. We also present a specialized GPU kernel that supports GEMV for non-uniform W2A4. Experiments show that RCP compresses LLaMA-2-7B to W2A4KV4 with a WikiText2 perplexity loss of only 2.84 and a 5.29× reduction in memory footprint. Furthermore, RCP can quantize the mobile-targeted LLaMA-3.2 models and the domain-specific WizardCoder-7B and MetaMath-7B without critical problems such as convergence failure or repetition. Code is available at https://github.com/songsm921/RCP.

http://arxiv.org/abs/2502.15779

Quantization (signal processing)

Rotation (mathematics)

Kernel (algebra)

Convergence (economics)

Partition (number theory)

Data compression

Compression (physics)

Code (set theory)

preprint

Citations: 0

2025

Gaussian Weight Sampling for Scalable, Efficient and Stable Pseudo-Quantization Training

Myeonghwan Ahn, Sungjoo Yoo

ArXiv.org

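The ever-growing scale of large language models (LLMs) calls for efficiency improvements, making fully quantized training (FQT) preferable to BF16. FQT accelerates training but faces consistency challenges and, to ensure stability, requires searching over an exponential number of cases, each needing more than 200B tokens. Pseudo-quantization training (PQT) addresses the issues of FQT, but it has not been sufficiently studied. In this work, we explore the practical implications of PQT in detail and propose a floating-point (FP) friendly noise distribution R with ideal properties, including stochastic precision annealing. As a result, the proposed method provides an effective theoretical foundation for low-precision FP parameters through PQT, using efficient fake quantization via an addition followed by FP casting. We show that Gaussian weight sampling is (1) scalable: it enables low-precision FP parameters down to FP6 while supporting high-precision noise of up to 9 bits with BF16 operators. The proposed method is (2) efficient: it incurs computational overhead as low as 1.40% on an A100 GPU in terms of Llama2 training tokens per second, and requires 2 bytes per parameter in GPU memory. We also demonstrate that PQT with Gaussian weight sampling is (3) stable: it closely follows or even surpasses the performance of the BF16 baseline when pre-training GPT2 and Llama2 models with up to 1B parameters and 300B tokens.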
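The phrase "an addition followed by FP casting" in the abstract above can be illustrated with a rough sketch of noise-based fake quantization in plain Python. This is a sketch under stated assumptions, not the paper's method: the noise scale (proportional to each weight's magnitude), the `noise_bits` parameter, and both function names are hypothetical, and the paper's actual noise distribution R is not reproduced here.

```python
import random
import struct


def to_bf16(x):
    """Round a Python float to the nearest bfloat16-representable value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 low bits that bfloat16 drops.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def pseudo_quantize(weights, noise_bits, rng):
    """Add Gaussian noise to each weight, then cast the sum to bfloat16.

    Assumption for illustration: the noise standard deviation is the weight
    magnitude times 2**-noise_bits, so fewer bits means coarser parameters.
    """
    out = []
    for w in weights:
        sigma = abs(w) * 2.0 ** (-noise_bits)
        out.append(to_bf16(w + rng.gauss(0.0, sigma)))
    return out


rng = random.Random(0)                      # fixed seed for repeatability
noisy = pseudo_quantize([0.5, -1.25, 0.0], noise_bits=4, rng=rng)
```

The design point being illustrated is that the forward pass sees perturbed weights produced by one addition and one cast, instead of an explicit rounding step to a low-precision grid.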

http://arxiv.org/abs/2505.11170

Gaussian

Noise (video)

Quantization (signal processing)

Sampling (signal processing)

Gaussian noise

Consistency (knowledge bases)

Overhead (engineering)

Training (meteorology)

preprint

Citations: 0

2025

NSNQuant: A Double Normalization Approach for Calibration-Free Low-Bit Vector Quantization of KV Cache

Donghyun Son, Euntae Choi, Sungjoo Yoo

ArXiv.org

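Large language model (LLM) inference is typically memory-intensive, especially when processing large batch sizes and long sequences, because of the large size of the key-value (KV) cache. Vector quantization (VQ) has recently been introduced to alleviate this problem, but we find that existing approaches are vulnerable to distribution shift because they rely on calibration datasets. To address this limitation, we propose NSNQuant, a calibration-free VQ technique for low-bit compression of the KV cache. By applying a three-step transformation, namely 1) a token-wise normalization (Normalize), 2) a channel-wise centering (Shift), and 3) a second token-wise normalization (Normalize) that includes a Hadamard transform, NSNQuant effectively aligns the token distribution with the standard normal distribution. This alignment enables robust, calibration-free vector quantization with a single reusable codebook. Extensive experiments show that NSNQuant consistently outperforms existing methods in both 1-bit and 2-bit settings, offering strong generalization and up to 3× higher throughput than full-precision baselines.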
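The three-step Normalize-Shift-Normalize transformation described above can be sketched in plain Python as follows. The details are assumptions made for illustration: "token-wise normalization" is taken to mean dividing each token vector by its root mean square, the Hadamard transform is applied just before the second normalization, and all function names are hypothetical. The codebook lookup that would follow is not shown.

```python
import math


def rms_normalize_rows(x):
    """Scale each token (row) so its root mean square across channels is 1."""
    out = []
    for row in x:
        rms = math.sqrt(sum(v * v for v in row) / len(row))
        out.append([v / rms for v in row] if rms > 0 else list(row))
    return out


def center_channels(x):
    """Subtract the per-channel mean taken over all tokens."""
    n = len(x)
    means = [sum(row[c] for row in x) / n for c in range(len(x[0]))]
    return [[v - m for v, m in zip(row, means)] for row in x]


def hadamard_rows(x):
    """Orthonormal fast Walsh-Hadamard transform of each row.

    The row length must be a power of two.
    """
    out = []
    for row in x:
        v = list(row)
        h = 1
        while h < len(v):
            for start in range(0, len(v), 2 * h):
                for i in range(start, start + h):
                    a, b = v[i], v[i + h]
                    v[i], v[i + h] = a + b, a - b
            h *= 2
        scale = 1.0 / math.sqrt(len(v))
        out.append([t * scale for t in v])
    return out


def nsn_transform(x):
    """Normalize, Shift, then Hadamard plus a second Normalize (assumed order)."""
    x = rms_normalize_rows(x)
    x = center_channels(x)
    x = hadamard_rows(x)
    return rms_normalize_rows(x)


# Three toy tokens with one dominant (outlier-like) channel.
tokens = [
    [4.0, 0.1, -0.2, 0.3],
    [8.0, -0.4, 0.2, 0.1],
    [2.0, 0.3, 0.1, -0.5],
]
transformed = nsn_transform(tokens)
```

After the transform every token has the same scale regardless of its original magnitude, which is the property that would let a single fixed codebook be reused across inputs.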

http://arxiv.org/abs/2505.18231

Normalization (sociology)

Vector quantization

Quantization (signal processing)

Security token

Linde–Buzo–Gray algorithm

Learning vector quantization

Data compression

preprint

Citations: 0

2025

Grouped Sequency-arranged Rotation: Optimizing Rotation Transformation for Quantization for Free

Euntae Choi, Sumin Song, Woosang Lim, Sungjoo Yoo

ArXiv.org

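Large language models (LLMs) face deployment challenges due to their high computational cost. Post-Training Quantization (PTQ) offers one solution, but existing rotation-based methods struggle at very low bit-widths such as 2 bits. We propose a novel, training-free approach to constructing an improved rotation matrix that addresses the limitations of existing methods. The key contribution is the use of the Walsh-Hadamard transform with sequency ordering, which clusters similar frequency components, reducing quantization error compared with standard Hadamard matrices and significantly improving performance. In addition, we propose Grouped Sequency-arranged Rotation (GSR), which uses block-diagonal matrices with smaller Walsh blocks to effectively isolate the impact of outliers, achieving performance comparable to training-based optimization methods while requiring no training at all. Our method shows robust performance on reasoning tasks and in perplexity (PPL) on WikiText-2. It also improves results when applied on top of existing learned rotation techniques.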
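The two ingredients named in the abstract above, a sequency-ordered Walsh matrix and a block-diagonal arrangement of smaller Walsh blocks, can be sketched in plain Python as follows. This is an illustrative construction only: the function names are hypothetical, the matrices are left unnormalized with entries of ±1 (dividing by the square root of the block size would make them orthonormal), and the block size used in the paper is not assumed here.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sequency(row):
    """Number of sign changes along a row."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def walsh(n):
    """Hadamard rows sorted by sequency (Walsh ordering)."""
    return sorted(hadamard(n), key=sequency)


def block_diagonal(block, count):
    """Place `count` copies of a square block along the diagonal."""
    size = len(block)
    dim = size * count
    out = [[0] * dim for _ in range(dim)]
    for b in range(count):
        for i in range(size):
            for j in range(size):
                out[b * size + i][b * size + j] = block[i][j]
    return out


w4 = walsh(4)                       # rows ordered by 0, 1, 2, 3 sign changes
rotation = block_diagonal(w4, 2)    # 8x8 matrix built from two 4x4 Walsh blocks
```

Because each block only mixes the channels inside its own group, a large value in one group cannot spread into the others, which is one way to read the abstract's remark about isolating the impact of outliers.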

http://arxiv.org/abs/2505.03810

Quantization (signal processing)

Hadamard transform

Rotation (mathematics)

Outlier

Perplexity

Transformation (genetics)

Lossless compression

Multiplier (economics)

preprint

Citations: 0

2025

Dense-SfM: Structure from Motion with Dense Consistent Matching

Jongmin Lee, Sungjoo Yoo

ArXiv.org

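We propose Dense-SfM, a novel Structure from Motion (SfM) framework designed for dense and accurate 3D reconstruction from multi-view images. Sparse keypoint matching, which traditional SfM methods often rely on, limits both accuracy and point density, and this limitation is especially pronounced in texture-less regions. Dense-SfM addresses this limitation by integrating dense matching with Gaussian Splatting (GS) based track extension, providing more consistent and longer feature tracks. To further improve reconstruction accuracy, Dense-SfM is equipped with a multi-view kernelized matching module that leverages transformer and Gaussian Process architectures to perform robust track refinement across multiple views. Evaluations on the ETH3D and Texture-Poor SfM datasets show that Dense-SfM offers significant improvements in accuracy and density over state-of-the-art methods. Project page: https://icetea-cv.github.io/densesfm/.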

http://arxiv.org/abs/2501.14277

Structure from motion

Matching (statistics)

Motion (physics)

Geology

Geodesy

Computer science

Geometry

Artificial intelligence

Mathematics

Statistics
