While camera–radar fusion has led to notable progress in autonomous driving, many existing approaches overlook the risk of sensor failures, which can critically compromise system safety. To address this limitation, we propose RoCaRS, a robust camera–radar fusion model designed for bird’s-eye view (BEV) segmentation under sensor failure scenarios. RoCaRS incorporates two key components—Radar-aware Backbone (RB) and Feature Spreading (FS)—to enhance BEV feature representation, along with a Dynamic Input Dropout Strategy (DIDS) and Bidirectional Feature Refinement (BFR) to address missing sensor inputs. Experiments on the nuScenes benchmark show that RoCaRS not only outperforms state-of-the-art fusion models under normal conditions but also maintains high performance under various sensor failure settings. Notably, in the complete absence of camera input, RoCaRS exceeds the baseline by +23.2 mIoU for map and +30.0 IoU for vehicle. Furthermore, it retains 99% of the radar-only model’s performance and achieves 103% of the camera-only model’s performance when either all cameras or all radars are disabled—without any retraining. These results highlight the potential of intermediate fusion to match the robustness of late fusion, while more effectively leveraging complementary modalities.