Recently I wanted to have a closer look at how authentication works for the XCAP protocol that is used in VoLTE by mobile devices to control things such as call forwarding settings. At first I thought the topic was so far off the beaten path that I would need to go to the specifications right away. But I gave Google a chance and was quite positively surprised that there’s a ton of information out there that is much easier to digest than the specs.
I was a bit confused when I saw XCAP authentication for the first time because it looks overly complicated (like pretty much everything in VoLTE for that matter, but that’s a rant for another day). The reason for this is that XCAP doesn’t use a custom-tailored authentication mechanism but hooks itself into the Generic Bootstrapping Architecture (GBA) authentication method which is described on Wikipedia here.
On a very high level, this works as follows in practice (an interesting article by Karel describes it in more detail): When a smartphone wants to check or set the current VoLTE call forwarding settings, it assembles an XCAP HTTP request and sends it to the XCAP server. The XCAP server notices that the HTTP header contains no authentication information, or only outdated information, and rejects the request with an HTTP 401 ‘unauthorized’ response. This is a standard HTTP mechanism so there’s nothing special about this part.
The special part is that the XCAP server includes information in the 401 response that tells the device to first perform an authentication procedure with the GBA authentication server. Both the XCAP server and the GBA server have standardized URLs, so the device performs a DNS lookup for the IP address of the authentication server and then contacts it with another HTTP request. This HTTP request contains the user’s IMSI and is again rejected with a 401 response, this time with a number of parameters containing authentication information. This data is then forwarded to the SIM card, which generates a response that is sent in another HTTP request to the GBA server. If the response is correct, an HTTP 200 OK is returned.
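To make the challenge-response exchange with the GBA server a bit more tangible, here is a minimal in-memory sketch in Python. It is not the real protocol: the class names FakeSim and FakeGbaServer, the "challenge" and "token" fields and the use of HMAC-SHA256 are all assumptions made for illustration, and plain method calls stand in for the HTTP requests. The real SIM-based algorithm and message format are defined in the specs.

```python
import hashlib
import hmac
import secrets


class FakeSim:
    """Hypothetical SIM stand-in that holds a secret shared with the network."""

    def __init__(self, imsi: str, secret_key: bytes):
        self.imsi = imsi
        self._key = secret_key

    def answer(self, challenge: bytes) -> str:
        # Assumption: HMAC-SHA256 replaces the real SIM algorithm here.
        return hmac.new(self._key, challenge, hashlib.sha256).hexdigest()


class FakeGbaServer:
    """Hypothetical GBA server stand-in; method calls replace HTTP requests."""

    def __init__(self, subscribers: dict):
        self._subscribers = subscribers  # IMSI -> shared secret
        self._pending = {}               # IMSI -> outstanding challenge

    def handle(self, imsi: str, response: str = None):
        key = self._subscribers.get(imsi)
        if key is None:
            return 403, {}
        challenge = self._pending.pop(imsi, None)
        if response is not None and challenge is not None:
            expected = hmac.new(key, challenge, hashlib.sha256).hexdigest()
            if hmac.compare_digest(expected, response):
                # Correct answer: the '200 OK' case with a made-up token.
                return 200, {"token": secrets.token_hex(8)}
        # No answer yet or a wrong one: the '401' case with a fresh challenge.
        new_challenge = secrets.token_bytes(16)
        self._pending[imsi] = new_challenge
        return 401, {"challenge": new_challenge}


def bootstrap(sim: FakeSim, server: FakeGbaServer):
    """Run the two-request exchange: get a challenge, then answer it."""
    status, payload = server.handle(sim.imsi)
    if status != 401:
        return status, payload
    sim_answer = sim.answer(payload["challenge"])
    return server.handle(sim.imsi, sim_answer)
```

Calling bootstrap() with a FakeSim whose key matches the one stored in the FakeGbaServer returns status 200 and a token. With a mismatching key the second request is rejected with another 401, just like the first one.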
Once the device is authenticated, it can contact the XCAP server again, this time with the required authentication information. The XCAP server then talks to the GBA authentication server to validate the authentication information. If everything checks out, the request is processed and an XCAP response is generated. Often, a device will make several requests in a row, e.g. a call forwarding settings status inquiry followed a bit later by an update of the settings after the user has modified them on the user interface. In this scenario the GBA server is not contacted again, as the authentication information received earlier can be used for several interactions. I haven’t looked at the details, but I assume there is some mechanism that prevents intercepted authentication details from being reused. I’ll leave you to the specs to find out the details.
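The overall flow, including the re-use of the authentication information for several XCAP requests, can be sketched roughly like this. Again, this is only a toy model under my own assumptions: FakeGbaServer, FakeXcapServer, Device, the opaque token and the dict that represents the call forwarding settings are all invented for illustration. The challenge-response part is left out, and plain method calls replace the HTTP requests and the XML documents that real XCAP uses.

```python
import secrets


class FakeGbaServer:
    """Hypothetical GBA server stand-in that hands out and checks tokens."""

    def __init__(self):
        self._valid_tokens = set()
        self.bootstrap_count = 0  # how often a device had to authenticate

    def bootstrap(self) -> str:
        # Assumption: the SIM-based challenge-response is skipped here.
        self.bootstrap_count += 1
        token = secrets.token_hex(8)
        self._valid_tokens.add(token)
        return token

    def is_valid(self, token: str) -> bool:
        return token in self._valid_tokens


class FakeXcapServer:
    """Hypothetical XCAP server stand-in holding call forwarding settings."""

    def __init__(self, gba: FakeGbaServer):
        self._gba = gba
        self._forwarding = {"enabled": False, "target": None}

    def handle(self, method: str, token: str = None, body: dict = None):
        # Missing or unknown authentication information: reject with 401.
        if token is None or not self._gba.is_valid(token):
            return 401, {"hint": "authenticate with the GBA server first"}
        if method == "PUT" and body is not None:
            self._forwarding = dict(body)
        return 200, dict(self._forwarding)


class Device:
    """Hypothetical device that bootstraps only when it gets a 401."""

    def __init__(self, xcap: FakeXcapServer, gba: FakeGbaServer):
        self._xcap = xcap
        self._gba = gba
        self._token = None

    def request(self, method: str, body: dict = None):
        status, payload = self._xcap.handle(method, self._token, body)
        if status == 401:
            # Authenticate once, then repeat the original request.
            self._token = self._gba.bootstrap()
            status, payload = self._xcap.handle(method, self._token, body)
        return status, payload
```

A Device that first calls request("GET") and later request("PUT", {...}) triggers only one bootstrap run in this model, because the token from the first run is sent along with the second request.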
It all sounds a bit complicated, and after looking at the links I’ve provided above the reason for this is quite clear: GBA is a generic authentication mechanism that was not specifically designed for XCAP. As such it acts as a kind of ‘plugin’. Obviously this was never designed to be efficient, as many round trips and TCP session establishments are required before an XCAP request is fully processed.