3GPP Release 5: HSDPA and the IP Multimedia Subsystem
Release 5 made two enormous and lasting contributions: HSDPA turned 3G into a genuine mobile broadband platform by moving scheduling intelligence into the base station, and IMS created an entirely new SIP-based session layer that every carrier VoLTE deployment in the world still relies on today.
Overview: Two Contributions, One Release
When Rel-5 was frozen in 2002, UMTS was commercially live in several countries but struggling to compete with fixed broadband on data rates. Basic (Rel-99) UMTS topped out at 384 kbps per user in practice: usable for email and simple browsing, but not enough for video or serious data applications. At the same time, operators needed a future-proof architecture for delivering voice, video, and messaging over a single IP network.
Rel-5 addressed both problems. HSDPA (High Speed Downlink Packet Access) rewired the scheduling model of the UMTS downlink, raising peak throughput by roughly an order of magnitude without replacing the radio access network. IMS (IP Multimedia Subsystem) introduced a SIP-based session control layer that sits between applications and the packet core: a reusable, standardised framework for all real-time multimedia services.
HSDPA: What Changed on the Downlink
In basic UMTS (Rel-99), the RNC (Radio Network Controller) made all scheduling decisions for the downlink. The Node B was a relatively passive transmitter: it received data and instructions from the RNC and sent them to the UE. As a result, each scheduling decision (UE reports channel quality, RNC decides what to send, Node B transmits) could take many tens of milliseconds, largely because of the Iub transport delay between Node B and RNC.
HSDPA moved scheduling intelligence into the Node B itself. The Node B can now observe channel conditions and make a scheduling decision every 2 ms TTI (Transmission Time Interval), compared with the 10 ms or longer TTIs (10, 20, 40, or 80 ms) used by Rel-99 dedicated channels. This is the fundamental reason HSDPA is faster: the loop between measuring the channel and adapting to it is far tighter, both because the TTI is five to ten times shorter and because the Iub round trip to the RNC is no longer part of the loop.
The new transport channel introduced for HSDPA is the HS-DSCH (High Speed Downlink Shared Channel), carried on one or more HS-PDSCH physical channels. Unlike dedicated channels, which belonged exclusively to one user for the duration of a connection, the HS-DSCH is shared dynamically among all HSDPA users in a cell. In each 2 ms TTI, the Node B scheduler selects which user (or users) to serve, how much power and how many channelisation codes to allocate, and which modulation and coding scheme to use, all based on the latest channel quality reports from the UEs.
HS-DSCH: a single downlink channel shared among all HSDPA users in the cell and time-multiplexed in 2 ms TTIs. The Node B scheduler (whose algorithm 3GPP leaves to the equipment vendor) favours users whose channel is currently good, which raises overall cell throughput. Dedicated channels that reserved capacity per user are replaced by this dynamic, opportunistic approach.
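The per-TTI scheduling decision can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a 3GPP-specified algorithm: the standard leaves the Node B scheduler to the vendor, and the `UE` class, the proportional-fair metric, and the averaging constant `alpha` are hypothetical choices. It contrasts a pure max C/I rule, which always serves the best-placed user, with proportional fair, which weighs each user's instantaneous rate against what that user has recently received.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UE:
    name: str
    avg_rate_kbps: float  # exponentially averaged served rate; must start > 0


def pick_user(ues: List[UE], inst_rates: Dict[str, float], max_ci: bool = False) -> UE:
    """Choose which UE gets the HS-DSCH in this 2 ms TTI (hypothetical model).

    inst_rates maps each UE name to the rate (kbps) its latest CQI suggests
    it could receive right now.
    """
    if max_ci:
        # Max C/I: always serve the UE with the best channel this TTI.
        return max(ues, key=lambda u: inst_rates[u.name])
    # Proportional fair: instantaneous rate relative to the UE's recent average.
    return max(ues, key=lambda u: inst_rates[u.name] / u.avg_rate_kbps)


def update_averages(ues: List[UE], served: UE, inst_rates: Dict[str, float],
                    alpha: float = 0.05) -> None:
    # Only the served UE received data this TTI; everyone else's average decays.
    for u in ues:
        got = inst_rates[u.name] if u is served else 0.0
        u.avg_rate_kbps = (1 - alpha) * u.avg_rate_kbps + alpha * got


ues = [UE("near", 1000.0), UE("edge", 100.0)]
rates = {"near": 3000.0, "edge": 600.0}
print(pick_user(ues, rates, max_ci=True).name)  # near
print(pick_user(ues, rates).name)               # edge: 600/100 beats 3000/1000
```

Max C/I maximises cell throughput but can starve cell-edge users; proportional fair gives up a little throughput so that each user is served when its own channel is good relative to its history.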
Before each HS-DSCH transmission, the Node B sends an HS-SCCH (High Speed Shared Control Channel) burst identifying which UE is about to receive data and which channelisation codes and modulation to expect. The HS-SCCH subframe starts two slots (about 1.33 ms) before the data; its first slot carries the code and modulation information, giving the UE just enough time to configure its receiver before the HS-PDSCH transmission begins.
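The HSDPA timing figures follow directly from the WCDMA frame structure. The short calculation below assumes the FDD chip rate of 3.84 Mcps, 2560 chips per slot, three slots per 2 ms TTI, and an HS-SCCH subframe that starts two slots before its HS-PDSCH subframe; the constant names are illustrative.

```python
from fractions import Fraction

CHIP_RATE = 3_840_000       # chips per second (WCDMA FDD)
CHIPS_PER_SLOT = 2_560      # one radio slot
SLOTS_PER_TTI = 3           # an HSDPA 2 ms subframe spans three slots
HS_SCCH_LEAD_SLOTS = 2      # HS-SCCH subframe starts two slots before HS-PDSCH

slot_ms = Fraction(CHIPS_PER_SLOT * 1000, CHIP_RATE)  # exactly 2/3 ms
tti_ms = SLOTS_PER_TTI * slot_ms                       # 2 ms
lead_ms = HS_SCCH_LEAD_SLOTS * slot_ms                 # 4/3 ms

print(f"slot = {float(slot_ms):.3f} ms, TTI = {float(tti_ms):.0f} ms, "
      f"HS-SCCH lead = {float(lead_ms):.3f} ms")
```

Because the UE needs only the HS-SCCH's first slot to learn the codes and modulation, it has roughly one slot (about 0.67 ms) left to prepare its despreaders before the data arrives.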
AMC and HARQ: The Two Techniques Behind the Speed
HSDPA's performance comes from combining two adaptive techniques that work together on every 2 ms transmission:
Adaptive Modulation and Coding (AMC)
In basic UMTS, the spreading factor assigned to a dedicated channel determined the user's data rate, and the RNC could change it only slowly. With AMC, the Node B selects a modulation scheme and coding rate for each 2 ms TTI based on the CQI (Channel Quality Indicator) that the UE reports on its uplink control channel (HS-DPCCH), at an interval configured by the network (as often as every TTI):
- QPSK (2 bits/symbol): used when signal quality is poor; a lower coding rate (more redundancy) protects against errors at the cost of throughput
- 16-QAM (4 bits/symbol): used when signal quality is good; it packs twice as many bits into each symbol, delivering higher throughput when the channel can support it
This CQI-driven selection happens in the Node B, without any involvement from the RNC, in every 2 ms TTI. A user moving from a strong-signal area to a weaker one has their modulation and coding downgraded within a few milliseconds.
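The AMC decision can be sketched as a lookup from the reported CQI to a modulation. This is a simplification: the standard's CQI tables also fix the transport block size and number of codes per UE category, and the threshold `QAM16_MIN_CQI` below is a hypothetical value chosen for illustration, not taken from the specification.

```python
from typing import Optional

# Hypothetical switch-over point. Real CQI tables (3GPP TS 25.214) map each
# CQI value to a transport block size, code count and modulation per UE category.
QAM16_MIN_CQI = 16
BITS_PER_SYMBOL = {"QPSK": 2, "16-QAM": 4}


def select_modulation(cqi: int) -> Optional[str]:
    """Pick a modulation for the next 2 ms TTI from the UE's latest CQI (0 to 30)."""
    if not 0 <= cqi <= 30:
        raise ValueError("HSDPA CQI values run from 0 to 30")
    if cqi == 0:
        return None  # CQI 0 means 'out of range': do not schedule this UE
    return "16-QAM" if cqi >= QAM16_MIN_CQI else "QPSK"


# A UE walking away from the cell centre reports falling CQI values.
print([select_modulation(c) for c in (22, 18, 14, 9)])
# ['16-QAM', '16-QAM', 'QPSK', 'QPSK']
```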
HARQ: Hybrid Automatic Repeat Request
In basic UMTS, a downlink packet received with errors was recovered by RLC retransmission: the UE reported the loss to the RNC, which resent the data over the Iub, a slow and wasteful process. HSDPA adds HARQ between the UE and the Node B:
- The UE keeps the soft (undecided) values of every received transmission in its soft buffer, including transmissions that failed to decode
- When the Node B retransmits, the UE combines the new attempt with the stored copy at the physical layer (Chase combining or incremental redundancy), so each attempt adds useful signal energy or redundancy that the receiver can exploit
- This means that even a transmission with too many errors to decode on its own contributes positively when combined with later attempts
- The ACK/NACK and retransmission cycle runs between the UE and the Node B with a round trip of roughly 12 ms in a typical configuration, far faster than an RLC retransmission from the RNC
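Soft combining is easiest to see with log-likelihood ratios (LLRs). The sketch below illustrates Chase combining, where each retransmission repeats the same coded bits and the receiver simply adds the LLRs; it assumes a toy example with hand-picked LLR values and skips the turbo decoding and CRC check a real HSDPA UE performs. Incremental redundancy, the other HARQ option, sends different redundancy bits in each attempt instead.

```python
from typing import List


def chase_combine(*attempts: List[float]) -> List[float]:
    """Soft-combine repeated transmissions of the same coded bits by adding
    their log-likelihood ratios (LLRs) position by position."""
    return [sum(values) for values in zip(*attempts)]


def hard_decision(llrs: List[float]) -> List[int]:
    # Convention assumed here: positive LLR favours bit 0, negative favours bit 1.
    return [0 if llr >= 0 else 1 for llr in llrs]


sent = [0, 1, 0, 1]
first = [2.0, -1.5, -0.5, -2.0]   # bit 2 flipped by noise
second = [1.0, 0.4, 1.5, -1.0]    # bit 1 flipped by noise

print(hard_decision(first) == sent)                         # False
print(hard_decision(second) == sent)                        # False
print(hard_decision(chase_combine(first, second)) == sent)  # True
```

Neither attempt decodes correctly alone, yet their sum does. Because each HARQ process must wait for its ACK/NACK, the UE and Node B run several processes in parallel (six is a common choice, so that six 2 ms TTIs cover a 12 ms round trip) to keep the shared channel busy.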
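Together, AMC and HARQ deliver a headline peak of 14.4 Mbps on the downlink (15 HS-PDSCH codes with 16-QAM, the highest Rel-5 capability, UE Category 10), compared with the 384 kbps practical ceiling of a Rel-99 dedicated channel. Early commercial HSDPA devices supported 1.8 Mbps (Category 12) or 3.6 Mbps (Category 6).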
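The 14.4 Mbps headline figure is simple arithmetic on the physical channel. Assuming the 3.84 Mcps chip rate and the fixed spreading factor of 16 used by HS-PDSCH codes, each code carries 240,000 symbols per second; the sketch below multiplies that out. The result is a raw channel bit rate before turbo coding and overheads, so the maximum throughput actually defined for each UE category is somewhat lower.

```python
CHIP_RATE = 3_840_000                    # chips per second
SF = 16                                  # HS-PDSCH spreading factor is fixed at 16
symbols_per_code = CHIP_RATE // SF       # 240,000 symbols per second per code


def raw_rate_bps(codes: int, bits_per_symbol: int) -> int:
    """Physical-channel bit rate before channel coding and overheads."""
    return codes * symbols_per_code * bits_per_symbol


print(raw_rate_bps(15, 4) / 1e6)  # 14.4  (15 codes, 16-QAM)
print(raw_rate_bps(5, 2) / 1e6)   # 2.4   (5 codes, QPSK)
```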
IMS Architecture: SIP as the Universal Session Layer
The IP Multimedia Subsystem (IMS) is a SIP-based session control framework that sits between the user device and any application server: voice, video, presence, instant messaging, or anything else that involves a real-time session. It does not carry media itself; it sets up, manages, and tears down sessions, leaving the actual RTP media streams to flow directly between endpoints or through dedicated media resources.
IMS is built around three types of Call Session Control Function (CSCF), each with a distinct role:
The P-CSCF (Proxy-CSCF) is the UE's first point of contact in the IMS network, and all SIP signalling from the UE flows through it. It ties signalling to QoS: when a SIP INVITE sets up a voice call, the P-CSCF uses the policy control function (the PDF in Rel-5, the PCRF in later releases) to authorise a bearer with the correct QoS in the packet core for the resulting media stream. It also compresses SIP messages with SigComp to reduce signalling overhead on the radio link.
The I-CSCF (Interrogating-CSCF) is the public-facing entry point into a home network's IMS core. When a call arrives from another network, or a REGISTER arrives from a P-CSCF, the I-CSCF queries the HSS to find out which S-CSCF should serve the subscriber, then forwards the SIP message to that S-CSCF. The I-CSCF can also hide the internal topology of the IMS network from external networks.
The S-CSCF (Serving-CSCF) is where the real work happens. It registers users, maintains registration state, routes SIP sessions, and applies service logic by invoking Application Servers according to filter criteria downloaded from the HSS. Every SIP request and response in a registered subscriber's sessions flows through their assigned S-CSCF. One S-CSCF may serve tens of thousands of subscribers simultaneously.
The HSS (Home Subscriber Server) is the master database of the IMS core, an evolution of the HLR and AuC from the CS world. It stores subscriber profiles (what services they have), service trigger rules (which Application Servers to invoke for which events), authentication credentials, and the current S-CSCF assignment. The HSS communicates with the I-CSCF and S-CSCF using the Diameter protocol over the Cx interface.
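The division of labour between the CSCFs and the HSS during registration can be modelled as a chain of function calls. This is a toy sketch, not a SIP or Diameter implementation: the classes and functions are hypothetical, the comments name the Cx exchanges (UAR, SAR) they stand in for, authentication is omitted, and the I-CSCF's S-CSCF selection policy is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class HSS:
    # Public user identity -> name of the S-CSCF currently serving it.
    assignments: Dict[str, str] = field(default_factory=dict)

    def user_authorization(self, impu: str) -> Optional[str]:
        # Stand-in for the Cx UAR/UAA exchange: return the stored S-CSCF, if any.
        return self.assignments.get(impu)


@dataclass
class SCSCF:
    name: str
    registered: Set[str] = field(default_factory=set)

    def register(self, impu: str, hss: HSS) -> str:
        # Authentication (IMS AKA and its 401 challenge) is omitted here.
        self.registered.add(impu)
        hss.assignments[impu] = self.name  # stand-in for Cx SAR/SAA
        return "200 OK"


def icscf_route_register(impu: str, hss: HSS, pool: Dict[str, SCSCF]) -> str:
    """I-CSCF: ask the HSS which S-CSCF serves the user; pick one if none."""
    name = hss.user_authorization(impu)
    if name is None:
        name = min(pool)  # hypothetical selection policy: first name in sort order
    return pool[name].register(impu, hss)


def pcscf_forward_register(impu: str, hss: HSS, pool: Dict[str, SCSCF]) -> str:
    # The P-CSCF forwards the UE's REGISTER to the home network's I-CSCF.
    return icscf_route_register(impu, hss, pool)


hss = HSS()
pool = {"scscf-a": SCSCF("scscf-a"), "scscf-b": SCSCF("scscf-b")}
print(pcscf_forward_register("sip:alice@example.net", hss, pool))
print(hss.assignments["sip:alice@example.net"])  # scscf-a
```

Once the HSS holds the assignment, any I-CSCF can route a returning subscriber to the S-CSCF that already holds their registration state.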
What IMS Enables
Because IMS uses standard SIP, any real-time communication service (voice, video, presence, instant messaging, Push-to-Talk, conferencing) is just another SIP session, and the same core infrastructure handles all of them. Adding a new service means deploying a new Application Server and adding service triggers (initial filter criteria) to subscriber profiles in the HSS; in principle, no changes are needed to the CSCFs or the transport network.
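How service triggers keep new services out of the CSCFs can be sketched as a filter over a subscriber's initial filter criteria. The model below is deliberately simplified and hypothetical: real criteria (defined for the Cx interface) carry richer trigger points, such as headers, request URIs, and SDP lines, but the sketch keeps the ideas that a lower priority value is evaluated first and that each match names an Application Server. The server addresses are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FilterCriterion:
    priority: int      # lower value is evaluated first
    method: str        # SIP method this trigger matches, e.g. "INVITE"
    session_case: str  # "originating" or "terminating"
    app_server: str    # hypothetical Application Server address


def matching_app_servers(criteria: List[FilterCriterion], method: str,
                         session_case: str) -> List[str]:
    """Return the Application Servers the S-CSCF would invoke, in priority order."""
    hits = [c for c in criteria
            if c.method == method and c.session_case == session_case]
    return [c.app_server for c in sorted(hits, key=lambda c: c.priority)]


profile = [
    FilterCriterion(20, "INVITE", "originating", "sip:mmtel-as.example.net"),
    FilterCriterion(10, "INVITE", "originating", "sip:prepaid-as.example.net"),
    FilterCriterion(5, "MESSAGE", "originating", "sip:sms-as.example.net"),
]
print(matching_app_servers(profile, "INVITE", "originating"))
```

Adding a service is just another entry in the subscriber's list; `matching_app_servers` itself does not change.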
This is why IMS became the architecture for VoLTE (Voice over LTE):
- The UE registers with the IMS core via the LTE packet network, using the P-CSCF as its SIP proxy
- When a voice call is made, the P-CSCF asks the PCRF (over the Rx interface) to set up a dedicated QoS bearer in the LTE core: an EPS bearer with QCI 1, the QoS Class Identifier for conversational voice
- The S-CSCF routes the SIP signalling, while the media flows as RTP over that dedicated bearer without passing through the CSCFs
- Calls typically sound better than legacy 3G calls because VoLTE uses AMR-WB (wideband AMR) at up to 23.85 kbps, which carries roughly twice the audio bandwidth (about 50 Hz to 7 kHz) of the narrowband AMR used on most 3G circuit-switched calls
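The VoLTE flow above involves two QoS levels and a codec negotiation. The sketch below builds a minimal SDP audio offer for AMR-WB using the RFC 4867 payload format name and 16 kHz clock rate, and records the standardised QCI values for IMS signalling (QCI 5) and voice media (QCI 1). The dynamic payload type 97, the example address, and the function name are assumptions; a real VoLTE offer carries more attributes (for example `fmtp` mode settings and QoS preconditions).

```python
# Standardised QCI values for the two flows in an IMS voice call (3GPP TS 23.203).
QCI_FOR_FLOW = {
    "ims-signalling": 5,  # SIP to the P-CSCF, non-GBR default bearer
    "voice-media": 1,     # conversational voice, GBR dedicated bearer
}


def amr_wb_sdp_offer(ip: str, port: int, payload_type: int = 97) -> str:
    """Build a minimal SDP audio offer for AMR-WB (hypothetical helper)."""
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        f"m=audio {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} AMR-WB/16000/1",  # 16 kHz RTP clock, mono
        "a=ptime:20",                                # one 20 ms speech frame per packet
    ]
    return "\r\n".join(lines) + "\r\n"


print(amr_wb_sdp_offer("192.0.2.10", 49152))
```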
Every mobile operator that offers VoLTE, VoNR (Voice over NR), or Wi-Fi Calling runs an IMS core built on the same CSCF and HSS architecture that Rel-5 defined in 2002, extended by later releases.
Why Rel-5 Mattered
- HSDPA turned UMTS into real mobile broadband: a 14.4 Mbps peak with AMC and HARQ made video streaming and rapid file downloads practical on a cellular connection for the first time
- Node B scheduling was a paradigm shift: moving scheduler intelligence to the base station is a design principle that LTE (eNodeB scheduler) and 5G NR (gNB scheduler) both inherited directly; HSDPA proved it worked
- IMS is still the architecture underpinning every VoLTE and VoNR deployment worldwide: more than 20 years after Rel-5, the P-CSCF/I-CSCF/S-CSCF/HSS architecture is in active production across every major mobile network
- SIP as the common session language: IMS's use of standard SIP meant that application developers could build multimedia services without understanding the underlying mobile network infrastructure
- The combination defined what 3G smartphones needed: fast data plus a standardised rich multimedia session layer created the technical foundation for the mobile internet era that followed
