This article presents design techniques for an energy-efficient multi-lane receiver (RX) with baud-rate clock and data recovery (CDR), which is essential for high-throughput, low-latency communication in high-performance computing systems. The proposed low-power global clock distribution not only significantly reduces power consumption across the RX lanes but also compensates for the frequency offset without any phase interpolators (PIs). To this end, a fractional divider (FDIV) controlled by the CDR is placed close to the global phase-locked loop (PLL). Moreover, to address the suboptimal lock point of conventional baud-rate phase detectors, the proposed CDR employs a background eye-climbing algorithm (ECA), which optimizes the sampling phase and maximizes the vertical eye margin (VEM). Fabricated in a 28 nm CMOS process, the proposed multi-lane RX shows a low integrated fractional spur of −40.4 dBc at a 2500 ppm frequency offset. Furthermore, it improves bit-error-rate (BER) performance by increasing the VEM by 26 mV. The entire RX achieves an energy efficiency of 1.8 pJ/bit at an aggregate data rate of 128 Gb/s.
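To make the headline figures concrete, the following is a minimal Python sketch and not the implementation described in this article. The first two helpers only restate arithmetic on the reported numbers: the total power implied by 1.8 pJ/bit at 128 Gb/s, and the clock-frequency ratio corresponding to a 2500 ppm offset. The third function, `climb_eye`, is a hypothetical greedy hill climb over integer sampling-phase codes that illustrates the general idea of moving the sampling phase toward a larger VEM; the function names, the phase-code range, the step policy, and the toy eye model `toy_vem` are all assumptions introduced for illustration.

```python
from typing import Callable


def power_mw(energy_pj_per_bit: float, data_rate_gbps: float) -> float:
    """Total power in mW, since (pJ/bit) * (Gb/s) = 1e-12 J * 1e9 /s = 1e-3 W."""
    return energy_pj_per_bit * data_rate_gbps


def offset_ratio(ppm: float) -> float:
    """Frequency ratio implied by an offset given in parts per million."""
    return 1.0 + ppm * 1e-6


def climb_eye(measure_vem: Callable[[int], float],
              start: int, lo: int, hi: int, max_steps: int = 64) -> int:
    """Hypothetical greedy hill climb (assumption, not the article's ECA).

    Moves the phase code one step at a time in the current direction while
    the measured VEM improves; reverses direction on a failed step and stops
    once both neighbours have failed to improve on the current phase.
    """
    phase = start
    best = measure_vem(phase)
    direction = 1
    failed = 0  # consecutive failed steps (one per direction)
    for _ in range(max_steps):
        cand = phase + direction
        if lo <= cand <= hi:
            vem = measure_vem(cand)
            if vem > best:
                phase, best = cand, vem
                failed = 0
                continue
        direction = -direction
        failed += 1
        if failed == 2:
            break
    return phase


def toy_vem(phase: int) -> float:
    """Toy eye model (assumption): VEM peaks at phase code 10."""
    return 100.0 - (phase - 10) ** 2


if __name__ == "__main__":
    print(f"Implied total power: {power_mw(1.8, 128.0):.1f} mW")
    print(f"Frequency ratio at 2500 ppm: {offset_ratio(2500.0):.4f}")
    print(f"Phase code found from start=3: {climb_eye(toy_vem, 3, 0, 31)}")
```

The hill climb is deliberately simple: it assumes a single-peaked VEM over the phase range and noise-free measurements, neither of which is guaranteed in a real link, where averaging and a dwell time per step would typically be needed.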