LTE Congestion Behavior – Part 2

Recently, I attended a huge fair with 10,000 people crammed into a 16,000 m² hall that had indoor LTE coverage from 3 network operators and little Wi-Fi for public use. From a mobile network point of view it doesn't get much 'worse' than this, so I spent some time looking at how one of the networks was coping with such a huge number of people and their devices.

Data Throughput Numbers

As one would expect, the network in the hall was fully loaded. When running speed tests I could only get a sustained data rate of around 1-3 Mbit/s whenever I tried, and this despite two 20 MHz carriers being used in the hall. So I traversed the hall to see how many cells were actually on air and found four 2×2 MIMO antennas spaced evenly apart, with each pair serving one cell with one band 3 and one band 7 carrier. In other words, one of the operators had 2 cells on air with 40 MHz of spectrum. There was GSM and UMTS coverage as well, but I disregard that for the remainder of the post as pretty much all devices were using the LTE network, and I suppose most of them used VoLTE for telephony rather than falling back to the 2G or 3G network for circuit-switched voice service.

So for on-device browsing, emailing, texting, etc. this was still good enough, and the delay to get a web page displayed on the smartphone was still acceptable, although noticeably longer than outside. Larger data transfers, however, would have been a pain.

Number of Simultaneous Users and RRC Connections

The 1-3 Mbit/s is obviously very little compared to the 120 Mbit/s I could achieve in the hall earlier in the day, just after I arrived, when only few people were present. Let's assume that despite the high signaling overhead, which I will come to in a minute, the cell was still able to use 80% of this value for data traffic, i.e. around 100 Mbit/s. As I got a sustained 2 Mbit/s out of the channel, 98 Mbit/s was used by other devices. But just how many? This is difficult to answer from the outside, but I'll try to get to a ballpark number from two different directions.

First approach: Let's say every other device that was transferring data at the time got the same amount of resources. That's probably a gross simplification for many reasons, including that some devices have better radio conditions than others, but I'm only looking at ballpark figures here. So in this simplistic model, 98 Mbit/s divided by 2 Mbit/s results in 49 other 'quasi'-simultaneous data streams. As two cells were in the hall, that would be 100 devices receiving data in the downlink direction at 2 Mbit/s simultaneously.
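The first-approach arithmetic can be sketched in a few lines; the ~100 Mbit/s usable cell capacity (80% of the observed 120 Mbit/s peak, rounded) and the equal-share assumption are the post's own ballpark figures, not measurements.

```python
# First approach: how many devices share the downlink if everyone
# gets the same 2 Mbit/s slice? All inputs are ballpark figures.

usable = 100   # Mbit/s usable per cell: ~80% of the 120 Mbit/s peak, rounded
my_rate = 2    # Mbit/s, my sustained speed-test result under congestion
cells = 2      # cells this operator had on air in the hall

others = (usable - my_rate) // my_rate   # other simultaneous 2 Mbit/s streams
total = cells * (others + 1)             # include my own device, both cells

print(others, total)  # 49 other streams per cell, 100 devices in total
```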

Second approach: Most people today have tons of apps on their smartphones, and these keep chatting and keeping their TCP connections alive constantly. When I look in my Wi-Fi network at home at how often a normal smartphone sends and receives IP packets, there is hardly ever a 60-second interval in between, even if the screen is off. For an LTE network that is configured to send a device to idle 15 seconds after the last packet was exchanged, this means that in every 1-minute interval, a device is in RRC Connected state for 15 seconds. That's 25% of the time. And a lot of people in the room were actually using their devices, in which case the RRC connection hardly goes to idle at all. So let's say that's another 20%. Assuming 40% of the 10,000 people in the hall each have one device on this network, 4,000 devices are managed by the 2 cells, and 45% of them are RRC Connected simultaneously, i.e. there are 1,800 simultaneous RRC connections, or 900 per cell!
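The second approach can also be written down as a quick calculation; all fractions here are the rough assumptions from the text above, not measured values.

```python
# Second approach: estimate simultaneous RRC connections from duty cycles.
# Every fraction below is a rough assumption from the text, not a measurement.

people_in_hall = 10_000
device_fraction = 0.40   # assumed share of people with a device on this network
background_duty = 0.25   # ~15 s RRC Connected per 60 s for background traffic
active_share = 0.20      # additional devices in active use, hardly ever idle
cells = 2

devices = people_in_hall * device_fraction               # 4,000 devices
connected = devices * (background_duty + active_share)   # 45% connected
per_cell = connected / cells

print(int(devices), int(connected), int(per_cell))  # 4000 1800 900
```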

The first approach gives 100 simultaneous devices across both cells, the second approach results in 1,800, more than an order of magnitude difference. The difference stems from the fact that the first approach assumed that all devices transfer data all the time while they are RRC Connected, while the second approach assumes that most devices just have a small burst of data to keep TCP connections open and perhaps a bit of additional data to download a text message, a web page or other 'small' things. Being RRC Connected, however, does not mean a continuous data transfer, so the number of devices that have data in the eNodeB buffer waiting to be sent in the downlink direction must be somewhere in between the two numbers.

Scheduling Behavior

At the high point, when the hall was really packed, I noticed that the behavior of the eNodeB changed significantly. Earlier in the day with fewer people present, the network always configured Carrier Aggregation of bands 3 and 7 for me whenever I looked. Later, when throughput was down to 1-3 Mbit/s, I never saw CA again. My device was either on the band 3 layer or on the band 7 layer without CA configured. That makes me wonder why!? One possible explanation I could come up with is that in such high-load scenarios, CA makes little sense anyway, as it would not increase the data throughput of a single device. Also, by not using CA, the eNodeB has two schedulers available instead of only one and can thus load-balance the RRC connections between two schedulers and queues instead of having only one queue for 900 simultaneous connections. That would make sense to me.

Furthermore, I could observe that the eNodeB seems to send data more quickly in the first few seconds of a data transfer but slows down if there is a sustained data flow to a device. I have already observed this on another occasion but haven't had the time to analyze it more closely.

A Few Words About the Uplink

So far I have only written about the downlink behavior, but what about the uplink? Whenever I tried, I could get sustained uplink data rates of 20-30 Mbit/s, even when I could only receive at 1-3 Mbit/s. In other words, even at very high load, very little is being sent from devices to servers in the network. I would not have expected that.

Total Capacity and Outlook

So if each cell could supply a bandwidth of 100 Mbit/s, one network delivered 200 Mbit/s in the exhibition hall. If the other two networks that shared the antenna system were equally capable, that would be a combined capacity of 600 Mbit/s in the hall. Again, I disregard 2G and 3G as their additional capacity is negligible. 600 Mbit/s for 10,000 people, and it was saturated. That is quite something. And one of the lessons that should be learned is that this capacity will not be enough anymore in the near future, as the amount of data people exchange over cellular networks keeps rising. 5G can help because it could add up to 100 MHz of additional spectrum per network operator in the hall, thus more than tripling the available bandwidth. That's not an easy thing, however, as band n78 (3.5 GHz) is unlikely to be supported by today's coaxial mixing and distribution systems that combine the signals of all network operators so the same antennas can be used in a hall. In addition, the 5G air interface works best with active antenna systems, which can't be shared among network operators, so you are looking at completely redesigning and rebuilding indoor coverage for such exceptional usage scenarios. Another option would be to distribute more cells with lower power in the hall. That would also significantly increase overall capacity but, of course, also requires a redesign.
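To put the combined number in perspective, here is the capacity estimate together with the per-person fair share it implies; the 100 Mbit/s per-cell figure and the assumption that all three operators were equally equipped come from the post.

```python
# Combined hall capacity and the per-person fair share it implies.
# Assumes all three operators had the same 2-cell, 100 Mbit/s setup.

people = 10_000
operators = 3
cells_per_operator = 2
cell_capacity_mbit = 100   # usable Mbit/s per cell (post's ballpark estimate)

per_operator = cells_per_operator * cell_capacity_mbit   # 200 Mbit/s
total = per_operator * operators                         # 600 Mbit/s combined
per_person_kbit = total / people * 1000                  # equal-share slice

print(total, per_person_kbit)  # 600 Mbit/s total, 60 kbit/s per person
```

At roughly 60 kbit/s per person if everyone transferred data at once, it is plausible that browsing still worked while large downloads did not.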

Interesting times ahead!

2 thoughts on “LTE Congestion Behavior – Part 2”

  1. I often have the opportunity to be in large gatherings (say 1000+) of users who are probably all on the same network. It still surprises me how well the network copes with the traffic (admittedly using carriers over 4 bands).

    Or to put it another way, in reality, peak traffic speeds as headlined by the various benchmarks don’t really mean much to the average user in normal usage, as “minimal” throughput still delivers a usable experience.

  2. Your interesting analysis clearly shows the importance of priority service differentiation for blue light emergency response. I had considered the implications of poor downlink performance during a post-attack emergency, but your findings suggest even day-to-day blue light data usage may be difficult as citizens naturally generate more traffic thanks to auto-start videos.

    As more wireless broadband capabilities become part of delivering EMS, law enforcement, and fire service incident support, the networks will need to prioritize the traffic. But except for EE’s support for the UK Emergency Services Network program, MNOs in Europe do not offer support for mission-critical QCIs and access class barring. It will be interesting to see how this picture changes considering the lurking uncertainty presented by net neutrality restrictions that may discourage MNOs from better serving the needs of blue light services.
