Visualizing Voice Packetization – OTT Voice Observations – Part 2

Inter-packet time in ms of WebRTC voice packets of the Conversations messenger app

In part 1 on this topic, I did a first high-level comparison of how the Signal messenger app and the Conversations messenger app use WebRTC for real-time voice calling. In that post, I showed that Signal packetizes the voice channel in 60 ms chunks while Conversations uses a packetization interval of 20 ms. The two graphs in this post show how this can be seen with the help of a Wireshark trace that I took on my Wi-Fi router at home.

Inter-packet time in ms of WebRTC voice packets of the Signal messenger app

To get the graphs, I filtered the Wireshark trace on the source IP address of the incoming voice data stream to only see the packets of one direction of the voice channel. I then cut off the beginning and the end of the call to get a clean voice channel trace without channel setup and teardown messaging, and saved the remaining packets to a new file. When reopening this file in Wireshark, one can see the delta time between the voice packets by adding a “delta time” column to the packet list. Already at this point, one can see that most packets arrive 60 ms apart in the Signal trace and 20 ms apart in the Conversations trace.

To better visualize the distribution of the inter-packet times, I wrote a short Python script that reads a Wireshark or tcpdump pcap trace and plots the delta time between packets in a graph. The graphs in this post are the direct output of that script. If you want to give it a go yourself, you can find the source code here.
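For readers who just want to see the idea, here is a minimal sketch of the delta time extraction. It is not the script linked above. It uses only the Python standard library, it assumes the trace was saved in the classic pcap format (Wireshark saves pcapng by default, which this sketch does not handle), and the function names `delta_times_ms` and `histogram` are my own choices for illustration.

```python
import struct
from collections import Counter

# Magic bytes of the classic pcap format mapped to
# (struct byte-order prefix, timestamp ticks per second).
_PCAP_MAGIC = {
    b"\xd4\xc3\xb2\xa1": ("<", 1_000_000),      # little-endian, microseconds
    b"\xa1\xb2\xc3\xd4": (">", 1_000_000),      # big-endian, microseconds
    b"\x4d\x3c\xb2\xa1": ("<", 1_000_000_000),  # little-endian, nanoseconds
    b"\xa1\xb2\x3c\x4d": (">", 1_000_000_000),  # big-endian, nanoseconds
}


def delta_times_ms(data):
    """Return the time in ms between consecutive packets of a classic pcap file.

    `data` is the raw content of the file as bytes.
    """
    magic = data[:4]
    if len(data) < 24 or magic not in _PCAP_MAGIC:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    order, ticks_per_second = _PCAP_MAGIC[magic]

    ticks = []
    offset = 24  # the global file header is 24 bytes long
    while offset + 16 <= len(data):
        # Per-packet record header: seconds, sub-second part,
        # captured length, original length.
        ts_sec, ts_frac, incl_len, _orig_len = struct.unpack_from(
            order + "IIII", data, offset
        )
        ticks.append(ts_sec * ticks_per_second + ts_frac)
        offset += 16 + incl_len  # skip the record header and the packet bytes

    # Integer tick differences keep the arithmetic exact until the final division.
    return [(b - a) * 1000 / ticks_per_second for a, b in zip(ticks, ticks[1:])]


def histogram(deltas_ms):
    """Count how many deltas fall on each value, rounded to the nearest ms."""
    return Counter(round(d) for d in deltas_ms)
```

With a trace that has already been reduced to one direction of the voice channel, `delta_times_ms(open("voice.pcap", "rb").read())` returns the list of values to plot (the file name is a placeholder). The list can then be handed to any plotting library.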

Back to the two graphs, because they show two interesting things. First, they clearly show that Signal uses a voice channel packetization interval of 60 ms while Conversations uses 20 ms. Second, if you know what you are looking for, they show that Signal uses only 1/3 of the number of packets that Conversations uses for the voice channel in the same time interval: 1000 vs. 3000 packets for a one-minute call in one direction. While this is more efficient, it comes at the expense of 40 ms of extra latency, since 60 ms of audio has to be collected before a packet can be sent, instead of 20 ms.
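The packet counts and the extra latency follow directly from the packetization interval. A small sketch of the arithmetic, with function and variable names of my own choosing:

```python
def packets_per_call(packetization_ms, call_duration_s=60):
    """Number of voice packets sent in one direction during the call."""
    return call_duration_s * 1000 // packetization_ms


SIGNAL_MS = 60         # packetization interval observed for Signal
CONVERSATIONS_MS = 20  # packetization interval observed for Conversations

signal_packets = packets_per_call(SIGNAL_MS)
conversations_packets = packets_per_call(CONVERSATIONS_MS)

# Audio has to be collected for a full interval before a packet can be sent,
# so the difference between the intervals is the extra delay.
extra_latency_ms = SIGNAL_MS - CONVERSATIONS_MS

print(signal_packets, conversations_packets, extra_latency_ms)
```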

So far so good, though perhaps not too exciting yet. But let’s up the game a bit in the next post: what happens to the voice channel during a handover in a cellular network?
