Which Voice Codec? OTT Voice Observations – Part 3

In parts 1 and 2 on this topic, I looked at how over-the-top voice services such as the Signal messenger or the Conversations messenger packetize and send data over the cellular network, and the resulting graphs look pretty neat. But which codec is actually used for the voice channel? I had a look at the source code of both messengers, and both use WebRTC for the voice channel. But which voice codec is actually negotiated? The Internet ‘knows’ that Opus is used, but is that really so? I decided to have a closer look.

In the case of Conversations, I operate the XMPP server and the TURN server myself, so I have access to the signaling messages on the XMPP side. While message and speech content is encrypted, the voice channel establishment messages are not, so it is possible to trace and decode them on the server. More about how I did that in a follow-up post. The screenshot above (right click + open in new tab for full resolution) shows the voice channel establishment message. And there we go, out in the open: Opus is the preferred codec!
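
For readers who want to check a trace of their own, here is a minimal Python sketch of how the codec list could be pulled out of such a message. It assumes the establishment message is a Jingle stanza with an RTP description as defined in XEP-0167, where payload types are listed in order of preference. The stanza below is a hand-written, hypothetical example and not the trace from the screenshot; the function name `offered_codecs` and all values in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written Jingle fragment for illustration only.
# It is NOT the message from the screenshot; ids and names are assumptions.
SAMPLE_STANZA = """
<jingle xmlns='urn:xmpp:jingle:1' action='session-initiate' sid='example-sid'>
  <content creator='initiator' name='audio'>
    <description xmlns='urn:xmpp:jingle:apps:rtp:1' media='audio'>
      <payload-type id='111' name='opus' clockrate='48000' channels='2'/>
      <payload-type id='9' name='G722' clockrate='8000'/>
      <payload-type id='0' name='PCMU' clockrate='8000'/>
    </description>
  </content>
</jingle>
"""

# Namespace of the Jingle RTP description (XEP-0167).
NS = {"rtp": "urn:xmpp:jingle:apps:rtp:1"}


def offered_codecs(stanza_xml):
    """Return (payload id, codec name, clock rate) tuples in the offered order."""
    root = ET.fromstring(stanza_xml.strip())
    codecs = []
    for description in root.iterfind(".//rtp:description", NS):
        if description.get("media") != "audio":
            continue  # only the voice channel is of interest here
        for payload in description.iterfind("rtp:payload-type", NS):
            codecs.append(
                (
                    int(payload.get("id")),
                    payload.get("name"),
                    int(payload.get("clockrate", "0")),
                )
            )
    return codecs


codecs = offered_codecs(SAMPLE_STANZA)
for payload_id, name, clockrate in codecs:
    print(payload_id, name, clockrate)
```

The first entry in the returned list is the sender's most preferred codec, which is what I mean by "preferred" above.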

Other interesting things in the message are the information on how to contact the STUN/TURN server to find out if one is behind a NAT, and the keying material required for the voice channel. The fingerprint keying material is end-to-end OMEMO encrypted, so a man-in-the-middle attack on the speech channel encryption is not possible.
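
Why does protecting the fingerprint matter so much? A rough sketch of the idea, assuming the usual WebRTC setup in which the fingerprint is a SHA-256 hash over the peer's DTLS certificate, written as colon-separated hex bytes: the receiver compares the fingerprint it got via signaling with the hash of the certificate actually presented during the handshake. The certificate bytes below are placeholders and the function names are hypothetical; this is not code from Conversations.

```python
import hashlib
import hmac


def dtls_fingerprint(cert_der: bytes) -> str:
    """SHA-256 over the certificate bytes, formatted as 'AB:CD:...'."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def peer_is_authentic(signaled_fingerprint: str, presented_cert_der: bytes) -> bool:
    """Compare the signaled fingerprint with the certificate seen in the handshake."""
    return hmac.compare_digest(
        signaled_fingerprint.upper(), dtls_fingerprint(presented_cert_der)
    )


# Placeholder byte strings standing in for real DER-encoded certificates.
genuine_cert = b"placeholder for the real peer's certificate"
attacker_cert = b"placeholder for a man-in-the-middle certificate"

# The fingerprint that travels end-to-end encrypted in the signaling message.
signaled = dtls_fingerprint(genuine_cert)

print(peer_is_authentic(signaled, genuine_cert))
print(peer_is_authentic(signaled, attacker_cert))
```

An attacker in the middle would have to present a certificate of their own, and its hash would not match the signaled fingerprint. Since the fingerprint itself cannot be swapped out on the server when it is OMEMO encrypted, the mismatch cannot be hidden.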