The Thirty-Ninth Annual Conference on Neural Information Processing Systems
Q-Palette: Fractional-Bit Quantizers Toward Optimal Bit Allocation for Efficient LLM Deployment
2025
The Thirty-Ninth Annual Conference on Neural Information Processing Systems
Learning to Better Search with Language Models via Guided Reinforced Self-Training
2025
The Thirty-Ninth Annual Conference on Neural Information Processing Systems
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
2025
The Forty-Second International Conference on Machine Learning
GuidedQuant: Large Language Model Quantization via Exploiting End Loss Guidance
2025
The Forty-First International Conference on Machine Learning
LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging
2024