In the good old fixed line SIP world, the originator of a speech call told the other side which speech codecs it supported. The other side then picked a suitable codec and informed the originator about the choice. That was it and things were ready to go. In VoLTE, you might have guessed, it’s not quite as simple. Today, codecs are rate adaptive and bandwidth for the data stream can be limited by the mobile network to a value that is lower than the highest data rate of a codec family.
Codecs – Wide and Narrow
Let’s have a look at codecs first. In VoLTE the two most used codecs are narrowband Adaptive Multi-Rate (NB-AMR) and wideband AMR (WB-AMR). Between VoLTE capable devices, WB-AMR is typically chosen as it offers a much better sound quality than the traditional narrowband codec. Many circuit switched 3G and even some 2G networks and mobile devices also support WB-AMR today, so it can be used for calls to ‘legacy’ networks and devices as well.
Some network operators also support a wideband speech codec in their fixed line IP based networks, so WB-AMR is the codec of choice for calls to those networks, too. In this particular scenario, however, a media gateway is required at the border of the network as the wideband codec used in fixed line networks (G.722) is different and more bandwidth intensive (64 kbit/s) than WB-AMR (G.722.2), which typically runs at 12.65 kbit/s.
If WB-AMR is not supported by one party, NB-AMR is chosen. From a bandwidth point of view there’s almost no difference as NB-AMR is typically used with a data rate of 12.2 kbit/s.
Adaptive Codecs
Unlike in the old days, codecs are rate adaptive today. WB-AMR, for example, can encode speech at a rate between 6.6 and 23.85 kbit/s. At the lower end, sound quality is rather limited, while encoding a speech signal sampled at 16,000 Hz at 23.85 kbit/s gives a superb result. In practice, most network operators choose to limit WB-AMR to 12.65 kbit/s as it seems there is little gained in terms of speech quality beyond that data rate. Another reason for using 12.65 kbit/s is that 2G and 3G circuit switched networks also limit WB-AMR to this data rate as it fits into the original NB-AMR channels. Having said that, there is nothing that keeps network operators from allowing the WB-AMR coder on the devices to use the full 23.85 kbit/s.
The idea behind data rates lower than 12.65 kbit/s is that in case network coverage gets weak, more speech packets might make it to the other side in time. Which data rate is used in the end is decided by the speech coder in the mobile device. The codec rate can be changed every 20 ms, and each speech frame encapsulated in an IP packet contains a header that informs the receiver of the data rate at which this speech packet was encoded. The following screenshot shows how the data rate is signaled in an IP/UDP/RTP speech packet.
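To make the per-packet rate signaling a bit more tangible, here is a minimal Python sketch of how a receiver could read the rate information from the start of a WB-AMR RTP payload. It assumes the bandwidth-efficient payload format of RFC 4867 (the default when the SDP does not ask for octet alignment) and a packet that carries a single speech frame. The function name and the returned dictionary are my own inventions for illustration, not part of any library.

```python
# AMR-WB frame type index -> codec rate in kbit/s (speech modes 0..8)
AMR_WB_RATES_KBPS = {
    0: 6.60, 1: 8.85, 2: 12.65, 3: 14.25, 4: 15.85,
    5: 18.25, 6: 19.85, 7: 23.05, 8: 23.85,
}


def parse_amr_wb_header(rtp_payload: bytes) -> dict:
    """Read CMR and the first table-of-contents entry of an AMR-WB payload.

    Assumption: bandwidth-efficient mode, so the fields are packed
    bit by bit: CMR (4 bits), F (1 bit), FT (4 bits), Q (1 bit).
    """
    if len(rtp_payload) < 2:
        raise ValueError("payload too short for an AMR-WB header")

    bits = int.from_bytes(rtp_payload[:2], "big")
    cmr = (bits >> 12) & 0xF         # rate the sender asks the peer to use (15 = no request)
    more_frames = (bits >> 11) & 0x1  # F bit: 1 if another frame follows in this packet
    frame_type = (bits >> 7) & 0xF    # FT: rate this frame was encoded with
    quality_ok = (bits >> 6) & 0x1    # Q bit: 0 marks a damaged frame

    return {
        "cmr": cmr,
        "more_frames": bool(more_frames),
        "frame_type": frame_type,
        # None for non-speech frame types such as comfort noise (SID)
        "rate_kbps": AMR_WB_RATES_KBPS.get(frame_type),
        "quality_ok": bool(quality_ok),
    }


if __name__ == "__main__":
    # CMR=15 (no request), F=0, FT=2 (12.65 kbit/s), Q=1
    print(parse_amr_wb_header(bytes([0xF1, 0x40])))
```

The frame type field is what tells the receiver which rate was used for this particular 20 ms frame. At 12.65 kbit/s, each such frame carries 253 bits of speech data.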
How Codecs Are Selected
While each device in a call has the freedom to change the codec’s data rate whenever it wants during a call, the type of codec is negotiated only once during call setup. This is done in two steps. In the first step the originator of the call includes information about supported codecs in the SIP INVITE message. At the end of this message all supported codecs are listed in the Session Description Protocol (SDP) part. For details see RFC 4566. Here’s an abbreviated example:
m=audio 42888 RTP/AVP 116 118
a=rtpmap:116 AMR-WB/16000/1
a=fmtp:116 mode-change-capability=2;max-red=0
a=rtpmap:118 AMR/8000/1
a=fmtp:118 mode-change-capability=2;max-red=0
The first line (m=) says that the device offers an audio stream with two payload formats, which are described in the lines that follow. For easier identification the device has given them the IDs 116 and 118, which are the RTP payload type numbers used later on in the speech packets. In practice a device can support more than just two codecs, and I’ve removed the telephony events from the list as well because that’s a separate topic. One other noteworthy piece of information given in the first line is the local UDP port number (42888) to which the incoming audio stream should be sent later on. The ‘a’ lines that follow then describe the codecs behind IDs 116 and 118, which are AMR-WB and AMR-NB in this example. The other side of the connection then selects one of the two codecs and informs the originator which one it has chosen in a 183 Session Progress message.
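For those who like to see things in code, here is a rough Python sketch of what the answering side does with such an offer: pull the port, the IDs and the codec names out of the SDP and pick the first offered codec it supports itself. The function names and the simple "first match wins" rule are assumptions of mine for illustration; a real SIP stack also looks at the fmtp parameters and at operator policy.

```python
def parse_audio_offer(sdp: str) -> dict:
    """Extract port, payload type IDs and codec names from an SDP audio offer."""
    port = None
    payload_types = []
    codecs = {}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m=audio "):
            # m=audio <port> <transport> <payload type> ...
            fields = line[2:].split()
            port = int(fields[1])
            payload_types = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <codec name>/<clock rate>[/<channels>]
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            codecs[int(pt)] = encoding.split("/")[0]
    return {"port": port, "payload_types": payload_types, "codecs": codecs}


def select_codec(offer: dict, supported: set):
    """Return (payload type, codec name) of the first offered codec we support.

    Assumption: the offer lists codecs in order of preference and the
    answerer simply takes the first one it knows.
    """
    for pt in offer["payload_types"]:
        name = offer["codecs"].get(pt)
        if name in supported:
            return pt, name
    return None


OFFER = """\
m=audio 42888 RTP/AVP 116 118
a=rtpmap:116 AMR-WB/16000/1
a=fmtp:116 mode-change-capability=2;max-red=0
a=rtpmap:118 AMR/8000/1
a=fmtp:118 mode-change-capability=2;max-red=0
"""

if __name__ == "__main__":
    offer = parse_audio_offer(OFFER)
    print(offer["port"])                           # UDP port for the incoming audio
    print(select_codec(offer, {"AMR-WB", "AMR"}))  # wideband capable device
    print(select_codec(offer, {"AMR"}))            # narrowband only device
```

A device that knows both codecs ends up with ID 116 (AMR-WB), while a narrowband-only device falls back to ID 118 (AMR).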
So far so good. The next aspect to have a look at is how devices and/or the network can limit the maximum bandwidth, e.g. to 12.65 kbit/s despite the AMR-WB codec supporting data rates of up to 23.85 kbit/s. I’ll describe that in the next VoLTE post, so stay tuned.