Publications

All Papers

48

1

MOST: Memory Oversubscription-aware Scheduling for Tensor Migration on GPU Unified Storage
IEEE CAL

2

Hierarchical Traversal Stack Design Using Shared Memory for GPU Ray Tracing
ISPASS 2025

3

SSFFT: Energy-Efficient Selective Scaling for Fast Fourier Transform in Embedded GPUs
LCTES 2025

4

Kubism: Disassembling and Reassembling K-Means Clustering for Mobile Heterogeneous Platforms
LCTES 2025

5

Avant-Garde: Empowering GPUs with Scaled Numeric Formats
ISCA 2025

6

Beyond VABlock: Improving Transformer Workloads through Aggressive Prefetching
Journal of Systems Architecture

7

TM-Training: An Energy-Efficient Tiered Memory System for Deep Learning Training in NPUs
ACM TOS

8

Effective Interplay between Sparsity and Quantization: From Theory to Practice
ICLR 2025

9

Marching Page Walks: Batching and Concurrent Page Table Walks for Enhancing GPU Throughput
HPCA 2025

10

HyMM: A Hybrid Sparse-Dense Matrix Multiplication Accelerator for GCNs
DATE 2025