In this paper, we propose a tactile-visual sensor fusion method to improve robotic object grasping in cluttered and occluded scenes. By combining visual data with real-time tactile feedback, our system infers object properties more accurately than vision alone and adaptively adjusts the grasp pose. A deep neural network processes the multimodal inputs, enabling robust grasping under challenging conditions. Experimental evaluation across multiple clutter scenarios demonstrates that our approach significantly outperforms visual-only baselines in both grasp precision and stability.
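To make the multimodal processing concrete, the sketch below shows one plausible late-fusion design in PyTorch: a small CNN encodes the visual input, an MLP encodes a flattened tactile reading, and a fusion head maps the concatenated features to a grasp-pose adjustment. The layer sizes, the tactile dimensionality, the 6-DoF output, and the class name `VisualTactileFusionNet` are illustrative assumptions for exposition, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn


class VisualTactileFusionNet(nn.Module):
    """Illustrative two-stream fusion network (a sketch, not the paper's exact model)."""

    def __init__(self, tactile_dim=64, pose_dim=6):
        super().__init__()
        # Visual stream: small CNN over a 3x128x128 RGB crop of the scene.
        self.visual_encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),  # -> 32 * 4 * 4 = 512 features
        )
        # Tactile stream: MLP over a flattened tactile-array reading.
        self.tactile_encoder = nn.Sequential(
            nn.Linear(tactile_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
        )
        # Fusion head: concatenated features -> grasp-pose correction (assumed 6-DoF).
        self.fusion_head = nn.Sequential(
            nn.Linear(512 + 128, 256), nn.ReLU(),
            nn.Linear(256, pose_dim),
        )

    def forward(self, rgb, tactile):
        v = self.visual_encoder(rgb)       # (B, 512)
        t = self.tactile_encoder(tactile)  # (B, 128)
        return self.fusion_head(torch.cat([v, t], dim=1))  # (B, pose_dim)


if __name__ == "__main__":
    net = VisualTactileFusionNet()
    rgb = torch.randn(2, 3, 128, 128)   # dummy image batch
    tactile = torch.randn(2, 64)        # dummy tactile readings
    print(net(rgb, tactile).shape)      # torch.Size([2, 6])
```

In such a design the predicted pose correction would be applied to an initial vision-derived grasp candidate as tactile contact is established; whether fusion happens at this feature level or earlier is a design choice the abstract leaves open.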