This paper presents a novel self-supervised representation learning framework that leverages multi-stage haptic feedback for robotic interaction. By extracting structured features from raw tactile signals without manual labels, our method builds compact, generalizable representations for downstream tasks such as grasping and texture classification. The framework includes a stage-wise encoder that progressively refines tactile information, improving the robustness of the learned representations. We validate our approach on a diverse set of object-interaction tasks, demonstrating that it outperforms supervised and contrastive baselines in both simulation and real-world settings.
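To make the stage-wise idea concrete, the sketch below shows one plausible realization of a progressive tactile encoder in PyTorch. It is a minimal illustration only: the abstract does not specify the stage design, so the module names (`TactileStage`, `StageWiseTactileEncoder`), channel widths, input resolution, and embedding dimension are all assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn


class TactileStage(nn.Module):
    """One refinement stage: a strided convolution over a tactile taxel
    grid, followed by normalization and a nonlinearity. (Hypothetical
    design; the paper's actual stage is not given in the abstract.)"""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


class StageWiseTactileEncoder(nn.Module):
    """Progressively refines raw tactile frames into a compact embedding
    by applying a sequence of stages. Widths and embedding size are
    illustrative assumptions."""

    def __init__(self, in_ch: int = 1, widths=(16, 32, 64), embed_dim: int = 128):
        super().__init__()
        chans = (in_ch,) + tuple(widths)
        self.stages = nn.ModuleList(
            TactileStage(chans[i], chans[i + 1]) for i in range(len(widths))
        )
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.head = nn.Linear(widths[-1], embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for stage in self.stages:  # stage-wise refinement of tactile features
            x = stage(x)
        return self.head(self.pool(x).flatten(1))


# Example: encode a batch of eight 16x16 single-channel tactile frames.
encoder = StageWiseTactileEncoder()
frames = torch.randn(8, 1, 16, 16)  # stand-in for raw tactile signals
z = encoder(frames)                 # -> (8, 128) embeddings
print(z.shape)
```

The resulting embeddings would feed a self-supervised objective (e.g., a contrastive or reconstruction loss) and, afterward, downstream heads for grasping or texture classification; the specific pretext task is likewise left unspecified by the abstract.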