We propose a real-time perception module for recognizing social signals in emotion-aware human-robot interaction (HRI). The system integrates facial expression analysis, gaze tracking, and speech prosody recognition to estimate the user's emotional state. A lightweight deep network architecture supports low-latency inference on embedded hardware, enabling real-time feedback and on-the-fly adjustment of robot behavior. User studies indicate that robots equipped with this module align more effectively with human emotional expectations, enhancing trust and acceptance in long-term interactions.
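As a rough illustration of the kind of lightweight multimodal architecture described, the sketch below implements a tiny late-fusion classifier in NumPy: one small encoder per modality (face, gaze, prosody), with the concatenated embeddings passed to a softmax head over emotion classes. All feature dimensions, the hidden size, and the emotion categories are illustrative assumptions, not values from the paper, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-modality feature dimensions (illustrative, not from the paper).
DIMS = {"face": 128, "gaze": 16, "prosody": 64}
N_EMOTIONS = 5  # e.g. neutral, happy, sad, angry, surprised (assumed label set)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(x):
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


class LateFusionNet:
    """Toy late-fusion classifier: a linear+ReLU encoder per modality,
    concatenated embeddings fed to a linear softmax head."""

    def __init__(self, dims, hidden=32, n_classes=N_EMOTIONS):
        # Random, untrained weights -- this only demonstrates the data flow.
        self.enc = {
            m: (rng.standard_normal((d, hidden)) * 0.05, np.zeros(hidden))
            for m, d in dims.items()
        }
        self.head = (
            rng.standard_normal((hidden * len(dims), n_classes)) * 0.05,
            np.zeros(n_classes),
        )

    def __call__(self, feats):
        # Encode each modality independently, then fuse by concatenation.
        embs = [relu(feats[m] @ W + b) for m, (W, b) in self.enc.items()]
        W, b = self.head
        return softmax(np.concatenate(embs, axis=-1) @ W + b)


net = LateFusionNet(DIMS)
# A batch of 4 synthetic frames: one feature vector per modality per frame.
batch = {m: rng.standard_normal((4, d)) for m, d in DIMS.items()}
probs = net(batch)  # shape (4, N_EMOTIONS); rows sum to 1
```

Late fusion keeps each modality's encoder small and independently replaceable, which is one common way to keep inference cheap on embedded hardware; the actual fusion strategy used by the module may differ.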