Now that Conversations supports end-to-end encrypted voice calls, I have come to use it on a daily basis. While I don’t mind a delay of a few seconds for text and image messages, it is crucial that a voice call is set up as quickly as possible. It turns out that in practice quite a number of things can delay the message exchange, and several of them can be optimized. In a previous post I went into the details of configuring the TCP timeout of the XMPP server to counter carrier NAT gateways that have a very low NAT timeout and cut connections before the default TCP or application keep-alive mechanism sends an IP packet to keep the connection open. Once that was fixed, call establishment worked fine while the device in question was using the cellular network. However, I still experience delays or even failed call setups when devices are connected to a particular Wifi network, so I had a look at that scenario as well.
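For readers who have not seen the previous post, the keep-alive idea can be sketched at the socket level. The snippet below is a minimal illustration, not the configuration from that post: it enables TCP keep-alive on a socket and shortens the idle time before the first probe, so that some packet crosses the NAT gateway before its timeout expires. The option names are the Linux ones (TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT), guarded with hasattr because not every platform exposes them; the values 60 s, 10 s and 3 probes are hypothetical placeholders. On Linux the same timers also exist system-wide as the sysctls net.ipv4.tcp_keepalive_time, tcp_keepalive_intvl and tcp_keepalive_probes.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int = 60,
                     interval_s: int = 10, probes: int = 3) -> None:
    """Turn on TCP keep-alive and shorten its timers (placeholder values)."""
    # Ask the kernel to send keep-alive probes on this connection at all.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of silence before the first probe; should stay below the NAT timeout.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Number of unanswered probes before the kernel drops the connection.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle_s=60)
    # Read the option back to confirm keep-alive is switched on.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Whatever idle value is chosen, the point is only that it has to be shorter than the NAT gateway's timeout; an application-level ping in XMPP serves the same purpose one layer higher.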
While the XMPP server and the Wifi access point are in different networks that are connected over the Internet, both of them are mine, so I could trace both sides to see where the packets get stuck. To my great surprise, the occasional delays in voice call setup seem to be caused by the Wifi link.
The screenshot at the top of this post shows what is going on when a voice call is not connected straight away. The left side shows the XMPP server sending a message to the client device and receiving no immediate answer. The TCP stack then sends a number of TCP retransmissions, which are only answered 10 seconds later!
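Why a short hiccup can turn into a delay of many seconds becomes clearer with TCP's exponential backoff: after every unanswered retransmission the retransmission timeout (RTO) doubles. The sketch below computes when each retransmission goes out, assuming a starting RTO of 200 ms (Linux's minimum RTO; the real value depends on the measured round-trip time) and a cap of 120 s. It illustrates the mechanism and does not reconstruct the exact timing in the trace.

```python
from __future__ import annotations


def retransmission_schedule(initial_rto_ms: int, retries: int,
                            max_rto_ms: int = 120_000) -> list[int]:
    """Return the time (ms after the original send) of each retransmission.

    Assumes classic exponential backoff: the timeout doubles after every
    unanswered retransmission and is capped at max_rto_ms.
    """
    times = []
    elapsed = 0
    rto = initial_rto_ms
    for _ in range(retries):
        elapsed += rto                   # wait one RTO, then retransmit
        times.append(elapsed)
        rto = min(rto * 2, max_rto_ms)   # back off before the next attempt
    return times


if __name__ == "__main__":
    for n, t in enumerate(retransmission_schedule(200, 7), start=1):
        print(f"retransmission {n}: {t / 1000:.1f} s")
```

With these assumptions the fifth retransmission leaves about 6.2 s after the original packet and the sixth about 12.6 s after it. So even when the underlying problem clears up in between, the data only gets through with the next scheduled retransmission, which is why the delays seen on the server side are measured in seconds rather than milliseconds.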
The right side shows the same communication in the target network. The packet comes in, and the router sends an ARP request to find the MAC address of the mobile device to which it would then forward the TCP packet. The problem is that there is no answer to the ARP request. In other words, either the Wifi access point does not forward this broadcast packet to the client correctly, or the client device fails to pick it up from the broadcast queue. In most cases the device answers ARP requests just fine, but every now and then it takes many attempts before it returns an answer. At this point I suspect that something is wrong with the Wifi power save implementation. As it happens with more than one device, this might be an issue of the Wifi access point rather than of the devices. Other Wifi access points also don’t seem to have this problem, but I am not sure yet.
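To make the ARP step more concrete, here is a sketch of the frame the router broadcasts when it asks "who has this IP address?". It follows the ARP packet layout from RFC 826 on top of an Ethernet header with EtherType 0x0806. The MAC and IP addresses in the usage example are hypothetical. The code only builds the bytes; actually sending a raw frame would need an AF_PACKET socket and root privileges on Linux, which is left out here.

```python
import ipaddress
import struct

BROADCAST_MAC = b"\xff" * 6  # destination of an ARP request: every station


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP 'who-has' request (RFC 826)."""
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP).
    eth_header = struct.pack("!6s6sH", BROADCAST_MAC, mac_bytes(sender_mac), 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IPv4
        6,        # hardware address length
        4,        # protocol address length
        1,        # operation: 1 = request, 2 = reply
        mac_bytes(sender_mac),
        ipaddress.IPv4Address(sender_ip).packed,
        b"\x00" * 6,  # target MAC is unknown; that is what the request asks for
        ipaddress.IPv4Address(target_ip).packed,
    )
    return eth_header + arp_payload


if __name__ == "__main__":
    frame = build_arp_request("02:00:00:00:00:01", "192.168.1.1", "192.168.1.23")
    print(len(frame), frame.hex())
```

The request is 42 bytes long (14 bytes of Ethernet header plus 28 bytes of ARP payload) and goes to the broadcast address, whereas the device's reply is sent back as unicast. That asymmetry is why the suspicion falls on how broadcast frames are delivered over the Wifi link rather than on the TCP connection itself.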
A story to be continued.