Recent advances in diffusion models (DMs)—such as few-step denoising and multi-modal conditioning—have significantly improved computational efficiency and functional flexibility, but they also introduce new hardware challenges. In particular, the elimination of inter-timestep redundancy, increased encoder/decoder workload, and heightened sensitivity to quantization demand a new class of accelerator. We present EdgeDiff, the first processor to support end-to-end, few-step, and multi-modal DM inference. EdgeDiff introduces a unified solution named condition-aware reordered group mixed precision (CRMP) with several novel microarchitectures: compress-and-add (CAA) processing elements (PEs) with bit-shuffle trees (BSTs) for efficient low-bit multiply-accumulate (MAC), a tiered accumulation unit (TAU) to reduce floating-point (FP) accumulation energy, and a grid-based quantization unit (GQU) to eliminate expensive FP division. Fabricated in 28-nm CMOS, EdgeDiff achieves up to 34.4-TOPS/W energy efficiency and reduces generation energy to 418.4 mJ/image for one-step text-to-image (T2I) generation—lower than prior state of the art. Despite aggressive quantization, EdgeDiff maintains output quality comparable to FP inference across Fréchet Inception Distance (FID), contrastive language–image pretraining (CLIP), and peak signal-to-noise ratio (PSNR) metrics, establishing it as a compelling solution for energy-efficient, real-time generative artificial intelligence (AI) on edge platforms.