Deep Inside the Network, Episode 2: AMR-WB – Skype-like Audio Quality for Mobile Networks

Ever since Skype entered the VoIP sphere, I have preferred it over other VoIP
services and conventional phone networks for a single reason: for
Skype-to-Skype calls, voice quality is much better than what other systems
offer today. The secret is a newer voice codec that is far superior to the
standard G.711 PCM (Pulse Code Modulation) codec, which was invented several
decades ago. For mobile networks, a new codec called Adaptive Multi-Rate
Wideband (AMR-WB) will do a similar thing in the not too distant future.

Higher Sampling Rate

AMR-WB differs from G.711, which is used in mobile core networks and fixed
line phone networks around the world, and from the Enhanced Full Rate (EFR)
and conventional AMR coders used in the access part of mobile networks today,
by its much higher sampling rate. While G.711, EFR and AMR use a sampling rate
of 8 kHz to digitize an audio signal in the range of 200 to 3400 Hz, AMR-WB
uses an overall sampling rate of 16 kHz to include audible frequencies between
50 Hz and 7000 Hz. In practice, this roughly doubles the frequency range that
is digitized and also includes lower frequencies than before, which are very
important for natural voice reproduction at the receiver side.
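
To make the numbers above concrete, here is a small sketch that compares the two sampling rates. It relies on the Nyquist rule (a sampling rate can represent frequencies up to half its value) and on the band edges quoted in the text; the dictionary and helper names are made up for this illustration.

```python
# Figures are the ones quoted in the text; names are illustrative only.
CODECS = {
    "G.711 / EFR / AMR": {"sampling_rate_hz": 8000, "band_hz": (200, 3400)},
    "AMR-WB": {"sampling_rate_hz": 16000, "band_hz": (50, 7000)},
}


def nyquist_limit_hz(sampling_rate_hz):
    """Highest frequency a given sampling rate can represent (fs / 2)."""
    return sampling_rate_hz / 2


def band_width_hz(band_hz):
    """Width of an audio band given as (low edge, high edge) in Hz."""
    low, high = band_hz
    return high - low


def fits_under_nyquist(codec):
    """True if the upper band edge stays below the Nyquist limit."""
    return codec["band_hz"][1] <= nyquist_limit_hz(codec["sampling_rate_hz"])


for name, codec in CODECS.items():
    limit = nyquist_limit_hz(codec["sampling_rate_hz"])
    width = band_width_hz(codec["band_hz"])
    print(f"{name}: Nyquist limit {limit:.0f} Hz, band {width} Hz wide, "
          f"fits: {fits_under_nyquist(codec)}")

narrow = band_width_hz(CODECS["G.711 / EFR / AMR"]["band_hz"])
wide = band_width_hz(CODECS["AMR-WB"]["band_hz"])
print(f"Wideband / narrowband ratio: {wide / narrow:.2f}")
```

With these figures the wideband audio band is a little over twice as wide as the narrowband one (6950 Hz versus 3200 Hz), and both bands stay below their respective Nyquist limits of 8 kHz and 4 kHz.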

Standards and Interoperability

Initially, AMR-WB was standardized by 3GPP; an overview with references to the
detailed standards documents can be found in TS 26.171 [1]. Later on, the
ITU-T also adopted the codec as its G.722.2 specification [2]. This could one
day lead to the adoption of this codec for fixed line networks as well. In
addition, 3GPP2 specified a new wideband speech codec for the CDMA world which
shares some of the AMR-WB modes. Therefore, if both originator and terminator
of a voice call as well as the originating and terminating networks support
AMR-WB, it is possible to establish wideband speech calls across network
boundaries.

Variable Bit Rates and Audio Quality

The AMR-WB standard specifies nine different codec rates: 6.6, 8.85, 12.65,
14.25, 15.85, 18.25, 19.85, 23.05 and 23.85 kbit/s. For circuit switched GSM
and UMTS connections, only the first three rates are generally used; in UMTS
networks, the 15.85 kbit/s rate can be used in addition. According to [3] and
[4], AMR-WB offers superior audio quality to AMR starting with the 12.65
kbit/s rate, while the 6.6 and 8.85 kbit/s rates should only be used during
bad radio conditions. EFR and AMR use similar bit rates in the radio network.
Consequently, AMR-WB offers better audio quality with the same bandwidth
requirements. This is important for backwards compatibility as will be shown
below.
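
As a rough illustration of what these rates mean on the wire, the sketch below converts each rate into the number of speech bits per frame. It assumes the 20 ms frame duration mentioned in the codec section further down; the constant and function names are invented for this example.

```python
# The nine AMR-WB rates from the text, written in bit/s to keep the math exact.
AMR_WB_RATES_BPS = [6600, 8850, 12650, 14250, 15850, 18250, 19850, 23050, 23850]

FRAME_DURATION_MS = 20  # assumption: one speech frame covers 20 ms


def bits_per_frame(rate_bps, frame_ms=FRAME_DURATION_MS):
    """Speech bits carried by one frame at the given codec rate."""
    bits, remainder = divmod(rate_bps * frame_ms, 1000)
    if remainder:
        raise ValueError("rate does not give a whole number of bits per frame")
    return bits


for rate in AMR_WB_RATES_BPS:
    print(f"{rate / 1000:5.2f} kbit/s -> {bits_per_frame(rate)} bits per frame")
```

At 12.65 kbit/s, for example, each 20 ms frame carries 253 bits of coded speech, compared with 132 bits at 6.6 kbit/s and 477 bits at 23.85 kbit/s.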

Codec Introduction

AMR-WB uses the ACELP (Algebraic Code Excited Linear Prediction) coding
technique, which is also used by EFR and AMR. The frequency band of 50 to 7000
Hz is split into two parts in order to achieve the best possible compression.
The main frequency band covers 50 to 6400 Hz. As it is narrower, an internal
sampling rate of only 12.8 kHz is required. The band between 6400 and 7000 Hz
is treated separately. Also, fewer bits are assigned to the higher band as the
lower band is more important. For lower bit rates, only the main frequency
band is transmitted and the receiver synthesizes the higher band from the
information in the main band. At a frame duration of 20 ms, the sampling
frequency of 12.8 kHz produces 256 samples (12800 Hz * 0.02 s = 256). As 256
is a power of two, this is convenient for efficient software and hardware
implementations of bit level operations. For silence periods during the
conversation, AMR-WB also supports Voice Activity Detection (VAD) and
Discontinuous Transmission (DTX), which are also used by EFR and AMR. During
times of no voice activity, Silence Descriptor (SID) frames are sent only once
every 160 milliseconds, with which the receiver can recreate the sender’s
background noise in order to avoid a “dead channel”. This results in an
average bandwidth requirement of only 1.75 kbit/s. A detailed codec
description can be found in [4].
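
The frame arithmetic above can be checked with a few lines of code. This is only a sketch built from the figures in the text (20 ms frames, 12.8 kHz internal rate, 16 kHz overall rate, one SID frame every 160 ms); the function names are invented, and reading "convenient" as "a power of two" is my interpretation.

```python
FRAME_DURATION_MS = 20   # frame duration from the text
SID_INTERVAL_MS = 160    # SID update interval during silence, from the text


def samples_per_frame(sampling_rate_hz, frame_ms=FRAME_DURATION_MS):
    """Number of samples that fall into one frame."""
    return sampling_rate_hz * frame_ms // 1000


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (exactly one bit set in n)."""
    return n > 0 and (n & (n - 1)) == 0


for label, rate_hz in [("internal, main band", 12800), ("overall", 16000)]:
    n = samples_per_frame(rate_hz)
    print(f"{rate_hz} Hz ({label}): {n} samples per frame, "
          f"power of two: {is_power_of_two(n)}")

# During silence, only one frame slot in this many carries a SID frame.
frame_slots_per_sid = SID_INTERVAL_MS // FRAME_DURATION_MS
print(f"One SID frame every {frame_slots_per_sid} frame slots")
```

The internal rate of 12.8 kHz gives 256 samples per frame, whereas the overall 16 kHz rate would give 320, which is not a power of two. During silence, a SID frame occupies only one out of every eight 20 ms frame slots.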

Network Issues: Getting Rid of Tandems and Transcoders

The main issue with the introduction of AMR-WB in operational networks is the
transcoding units. These are used to convert the EFR or AMR speech codecs used
in the radio network to the standard 64 kbit/s PCM codec, which uses a
sampling rate of 8 kHz. A wideband signal cannot survive this conversion, as
an 8 kHz sampling rate cannot represent frequencies above 4 kHz. Therefore, it
is necessary to enhance networks as well so that they detect that both sender
and receiver are AMR-WB capable. In such a case, transcoders in the network
have to be deactivated to create a transparent connection. This is called
Transcoder Free Operation (TrFO) in UMTS and Tandem Free Operation (TFO) in
GSM. For details on TrFO and TFO take a look at this blog entry.
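
The decision described above can be sketched in a few lines. This is not the actual TrFO or TFO signalling, just a toy model of the rule "bypass the transcoders only if both handsets and both networks support AMR-WB"; the class, the function and the codec labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallLeg:
    """One side of a call: what the handset and its network can handle."""
    handset_codecs: frozenset
    network_codecs: frozenset


def common_codecs(originator, terminator):
    """Codecs supported by both handsets and both networks."""
    return (originator.handset_codecs & originator.network_codecs
            & terminator.handset_codecs & terminator.network_codecs)


def select_path(originator, terminator):
    """Return (codec, transcoding_needed) for this simplified model."""
    if "AMR-WB" in common_codecs(originator, terminator):
        # End-to-end wideband: transcoders are switched off.
        return ("AMR-WB", False)
    # Otherwise fall back to 64 kbit/s PCM in the core, with transcoding.
    return ("G.711", True)


wideband_leg = CallLeg(frozenset({"AMR", "AMR-WB"}), frozenset({"AMR", "AMR-WB"}))
legacy_leg = CallLeg(frozenset({"AMR"}), frozenset({"AMR", "AMR-WB"}))

print(select_path(wideband_leg, wideband_leg))
print(select_path(wideband_leg, legacy_leg))
```

In this model a single element without AMR-WB support, here the legacy handset, is enough to force the call back onto the transcoded narrowband path.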

Availability

Even though AMR-WB was standardized a couple of years ago, it’s not out there
yet. There are some indications though that things are moving forward, for
example a recent AMR-WB test by T-Mobile and Ericsson in Germany. Take a look
at the press report here.

References

[1] 3GPP TS 26.171, Adaptive Multi-Rate – Wideband (AMR-WB) speech codec;
General description

[2] ITU-T G.722.2

[3] Pasi Ojala, et al., “The Adaptive Multirate Wideband Speech Codec: System
Characteristics, Quality Advances, and Deployment Strategies”, IEEE
Communications Magazine, May 2006

[4] B. Bessette, et al., “The AMR-WB codec”, IEEE Transactions on Speech and
Audio Processing, vol. 10, no. 8, November 2002