Occupancy prediction is a critical task in autonomous driving, enabling better understanding of 3D environments for downstream tasks. Previous methods often rely on dense back-projection to extract 3D features from 2D images by distributing information across all voxels. While effective, these approaches are computationally expensive because of the dense nature of 3D voxel representations. Inspired by recent works, we address this challenge with instance-level attention, which uses representative queries for groups of voxels, reducing computational cost while maintaining competitive performance. By applying attention mechanisms to this instance-level representation, we achieve an mIoU of 35.26 with a latency of 0.04 s on the Occ3D dataset using an RTX 4090 GPU. These results demonstrate that focusing on instance-level representations provides an efficient and practical solution for real-time occupancy prediction.