This work presents a self-supervised learning approach for developing adaptive robot behavior without relying on labeled data. By encoding environmental interactions into latent representations, robots can learn behavior policies that generalize across tasks and scenarios. Our framework uses sensor-rich interaction logs to train encoders that capture task-relevant semantics from raw multimodal data. Experimental results demonstrate the framework’s effectiveness in low-data and cross-domain settings, offering a scalable path toward lifelong robot learning with minimal human intervention.
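To make the idea of self-supervised encoder training concrete, the following is a minimal sketch of one common choice of objective: a contrastive (InfoNCE-style) loss that pulls together two augmented views of the same interaction sample. The linear `encode` function, the toy "interaction log" data, and the augmentation by Gaussian noise are all illustrative assumptions, not the framework's actual architecture or training pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy linear 'encoder': projects raw sensor vectors into a latent space."""
    z = x @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-normalize rows

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Contrastive loss: matching rows of z_a and z_b are positive pairs,
    all other rows in the batch serve as negatives."""
    logits = z_a @ z_b.T / temperature            # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))           # maximize positive-pair agreement

# Hypothetical batch of raw multimodal "interaction log" vectors,
# with two noisy augmented views of each sample (no labels needed).
x = rng.normal(size=(8, 16))
view_a = x + 0.05 * rng.normal(size=x.shape)
view_b = x + 0.05 * rng.normal(size=x.shape)

W = rng.normal(size=(16, 4))                      # untrained encoder weights
loss = info_nce_loss(encode(view_a, W), encode(view_b, W))
print(f"contrastive loss: {loss:.3f}")
```

In practice the linear map would be replaced by a deep multimodal encoder and the loss minimized by gradient descent; the point here is only that the training signal comes entirely from the structure of the interaction data itself.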