This paper proposes an explainable multi-modal deep learning framework for early prediction of signal integrity (SI) margins, specifically eye height (EH) and eye width (EW). The model fuses heterogeneous measurement data across hierarchical stages (module, package, and system) to anticipate final eye margin outcomes before full system integration. Shapley additive explanations (SHAP) and integrated gradients are applied to make the model's predictions explainable and to identify the leading contributors to EH and EW margins. Evaluated on 1a-nm 16Gb DDR5 DRAM, the framework achieves approximately 89% accuracy in predicting EH and 66.5% in predicting EW. These results indicate that the framework can serve as an effective screening tool for detecting potential SI margin issues at early stages, reducing the burden of exhaustive system-level validation.
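To make the integrated gradients attribution named above concrete, the following is a minimal sketch on a toy model, not the framework evaluated in this paper. The logistic predictor, the three-feature input, the weights, and the all-zero baseline are hypothetical choices made only for illustration. The sketch approximates the path integral with a midpoint Riemann sum and checks the completeness property, under which the attributions sum to the difference between the model output at the input and at the baseline.

```python
import numpy as np

def model(x, w, b):
    """Toy differentiable predictor (hypothetical): logistic score over features."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def model_grad(x, w, b):
    """Analytic gradient of the logistic score with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight-line path from the baseline
    to the input, then scaled by (x - baseline) per feature.
    """
    alphas = (np.arange(steps) + 0.5) / steps            # midpoints in (0, 1)
    path = baseline + alphas[:, None] * (x - baseline)   # shape (steps, n_features)
    grads = np.stack([model_grad(p, w, b) for p in path])
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and a single three-feature sample (illustrative values only).
w = np.array([0.8, -0.5, 0.3])
b = -0.1
x = np.array([1.2, 0.4, -0.7])
baseline = np.zeros_like(x)

attributions = integrated_gradients(x, baseline, w, b)
output_gap = model(x, w, b) - model(baseline, w, b)

print("attributions:", attributions)
print("sum of attributions:", attributions.sum())
print("f(x) - f(baseline):", output_gap)
```

In this toy setting, the sign of each attribution shows whether a feature pushes the score up or down relative to the baseline, which is the kind of per-feature contribution ranking that an attribution analysis of EH and EW margins relies on.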