EVS Speech Codec Experience In The Wild – Forget AMR-WB…

When I heard about the Enhanced Voice Service (EVS) codec for the first time a couple of years ago I was a bit skeptical. After all, the AMR-Wideband Codec just went live in a number of networks back then and its sound quality compared to the traditional AMR-Narrowband was just stunning. So I thought that EVS could not raise the bar by much. In the meantime a number of network operators have launched EVS in their VoLTE networks and when I recently made my first EVS end-to-end call I was stunned at how much better it was even compared to a good AMR-Wideband call.

If you are interested in more details have a look at these whitepapers from Nokia, Qualcomm and Ericsson. In this post I will focus on my real-life experience and the takeaways after reading these documents.

From 4 kHz In the Past (And Still Today…)

When I first started working in the telecommunication sector, voice was the main application. As far as voice codecs were concerned, the PCM (Pulse Code Modulation) codec with a sampling rate of 8 kHz and a speech bandwidth of 4 kHz ruled the world in fixed line networks. Unfortunately, I would argue that it still does today, as only a few people seem to go out of their way to upgrade their fixed line phone equipment.

… to 8 kHz Wideband – State of the Art

In the real world things started to change around 2010 when the first mobile network operators went live with the AMR-Wideband codec and started selling compatible devices. Here’s one of my blog entries from back then. AMR-WB samples an analog audio signal at 16 kHz to digitize an audible bandwidth of 8 kHz. The difference in voice quality is remarkable, while the data rate required to transmit an AMR-WB coded stream has remained almost the same: 12.65 kbit/s compared to 12.2 kbit/s for AMR-NB. Some VoLTE systems use the AMR-WB data rate of 23.85 kbit/s but I have to admit that I can’t hear the difference.

Wideband in Fixed Line Networks

The most stunning experience I’ve had so far with a wideband speech codec was between two fixed line phones supporting the G.722 wideband codec. Note that this is not AMR-WB, as fixed line networks reuse the 64 kbit/s data rate used for PCM. I am not sure whether it was the higher data rate, the hardware and software quality of the phones, or a combination of both that made the difference compared to AMR-WB on high-end mobile phones.

16 kHz Super-Wideband and 20 kHz Fullband

EVS now takes the next step and increases the sampling rate to 32 kHz for an audio bandwidth of 16 kHz. In this configuration and at a data rate of 13.2 kbit/s this is referred to as a super-wideband transmission. Note that while the data rate has slightly increased it is still very close to the original 12.2 kbit/s used by the AMR-NB codec. EVS also has an optional ‘fullband’ mode that samples at 48 kHz to capture an audio bandwidth of 20 kHz but requires a higher data transmission rate. For reference, according to this Wikipedia article, 20 kHz is the upper limit of human hearing.
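The relationship between sampling rate and audio bandwidth used throughout this post is the Nyquist rule: a signal sampled at a given rate can represent frequencies up to half that rate. Here is a minimal Python sketch of that arithmetic. The function and table names are my own and purely illustrative. Note that for fullband the rule gives 24 kHz, while the codec only makes use of 20 kHz of it.

```python
def nyquist_bandwidth_khz(sampling_rate_khz: float) -> float:
    """Highest audio frequency (kHz) representable at the given sampling rate."""
    return sampling_rate_khz / 2.0


# (label, sampling rate in kHz) for the bandwidth classes mentioned in this post
MODES = [
    ("Narrowband (PCM, AMR-NB)", 8),
    ("Wideband (AMR-WB, G.722)", 16),
    ("Super-wideband (EVS)", 32),
    ("Fullband (EVS)", 48),
]


def bandwidth_table():
    """Return (label, sampling rate, theoretical audio bandwidth) per mode."""
    return [(label, fs, nyquist_bandwidth_khz(fs)) for label, fs in MODES]


if __name__ == "__main__":
    for label, fs, bw in bandwidth_table():
        print(f"{label}: {fs} kHz sampling -> up to {bw:g} kHz audio")
```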

EVS Super-Wideband Codec Properties

The PCM codec still used in fixed line networks is a pretty simple affair by today’s standards. A sample is taken every 1/8000 s and translated to an 8 bit digital value according to a translation table, resulting in a data rate of 64 kbit/s. Fast forward to EVS and the world has changed entirely. Complex algorithms are now used to analyze and encode different parts of the spectrum. EVS even uses different codecs for voice and music and can switch between them every 20 milliseconds. Frame Error Concealment (FEC) algorithms have also been improved to cope with up to 10% of faulty 20 ms frames.
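To illustrate how simple PCM is, here is a rough Python sketch of the 64 kbit/s arithmetic and of the kind of non-linear mapping the translation table implements. It assumes the continuous mu-law curve with mu = 255. Real fixed line networks use the table-based A-law or mu-law variants of G.711, so this is an approximation and not a bit-exact implementation, and the function names are made up for this example.

```python
import math

SAMPLE_RATE_HZ = 8000   # one sample every 1/8000 s
BITS_PER_SAMPLE = 8     # each sample becomes an 8 bit value
MU = 255                # mu-law parameter (assumption: continuous curve, not the G.711 tables)


def pcm_bitrate_kbps(sample_rate_hz: int = SAMPLE_RATE_HZ,
                     bits: int = BITS_PER_SAMPLE) -> float:
    """Data rate of an uncompressed PCM stream in kbit/s."""
    return sample_rate_hz * bits / 1000


def mu_law_encode(x: float) -> int:
    """Map a sample in [-1, 1] to an 8 bit code (0..255) on a logarithmic scale."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round((y + 1) / 2 * 255))


def mu_law_decode(code: int) -> float:
    """Approximate inverse of mu_law_encode."""
    y = code / 255 * 2 - 1
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


if __name__ == "__main__":
    print(pcm_bitrate_kbps(), "kbit/s")
    code = mu_law_encode(0.5)
    print(code, mu_law_decode(code))
```

The logarithmic curve spends more of the 256 available values on quiet samples than on loud ones, which is why 8 bits per sample are enough for intelligible speech.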

Better Quality or More Capacity?

All of this can be used for two purposes. I suppose most network operators will use it to further improve sound quality and use it as a competitive advantage over other network operators that have not yet introduced EVS. Another option is to reduce the data transmission rate of a voice call while maintaining AMR-WB audio quality to increase the number of simultaneous calls per cell. This might be interesting in special situations like in stadiums and during large scale events.
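As a back-of-the-envelope illustration of the capacity option, the following Python sketch compares how many calls fit into the same speech payload budget when the codec rate is reduced. The 9.6 kbit/s figure is only a hypothetical example of a lower rate. Whether it really matches AMR-WB quality is an assumption on my part, and the calculation ignores IP/UDP/RTP headers and radio scheduling overhead, so the real-world gain would be smaller.

```python
def relative_call_capacity(reference_kbps: float, reduced_kbps: float) -> float:
    """Factor by which the number of calls grows for the same payload budget."""
    if reference_kbps <= 0 or reduced_kbps <= 0:
        raise ValueError("bit rates must be positive")
    return reference_kbps / reduced_kbps


if __name__ == "__main__":
    amr_wb_kbps = 12.65       # AMR-WB rate mentioned earlier in this post
    hypothetical_kbps = 9.6   # assumed lower rate, for illustration only
    gain = relative_call_capacity(amr_wb_kbps, hypothetical_kbps)
    print(f"About {gain:.2f} times as many calls (payload only)")
```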

The OPUS Competition

EVS is heavily patented so it is unlikely to find a large supporting base beyond 3GPP mobile network operators. This is because there is OPUS, a royalty-free alternative that works as well as or even better than EVS at bit rates beyond 32 kbit/s, as even some of the papers linked to above admit. The catch is that it performs significantly worse than EVS and even AMR-WB at the 12 to 13 kbit/s mobile network operators are comfortable with. Yes, it wouldn’t be a problem to use OPUS in VoLTE at higher bit rates, but overall capacity in a cell would decrease significantly.

EVS will be VoLTE-only

As mentioned in the Ericsson whitepaper, it’s likely that EVS will only be introduced in VoLTE networks and, unlike AMR-WB, not in legacy 2G or 3G networks. That means that when falling back from LTE to 2G or 3G at the coverage edge, voice quality will deteriorate. But at least in many networks these days it will drop to AMR-WB rather than all the way to AMR-NB.

A Device Quality Question

Don’t expect EVS to sound great in ultra cheap phones. Even today, there is a big difference in how AMR-WB sounds on different devices, as the speaker and microphone hardware as well as the software used in the audio path make a big difference in practice. For EVS, good and thus more expensive hardware will be even more crucial.

I guess not everybody will want to pay for better audio quality but I, for my part, will certainly do so!