In cellular networks, edge intelligence is often enabled by Multi-access Edge Computing (MEC), which aims to bring AI services closer to end users. Although MEC reduces latency by placing computation near the network edge, it remains external to the Radio Access Network (RAN) and its native execution environment, and therefore introduces additional transport and buffering delays. The emerging AI-on-RAN paradigm aims to overcome these limitations, but it remains largely conceptual and lacks practical implementations and feasibility validation. In this paper, we present AoRA, the first framework that realizes the AI-on-RAN vision by dynamically exploiting the available computational headroom of GPU- and NPU-accelerated RAN platforms to deliver AI services directly from within base stations. AoRA runs containerized AI workloads inside the 5G RAN stack, enabling inline, opportunistic AI service provisioning without degrading core telecom operations. The framework is fully compliant with O-RAN interfaces and can operate seamlessly alongside existing edge computing infrastructure. Evaluations show that AoRA reduces transport latency by over 30% relative to MEC and by 70% relative to cloud-based setups.