Raspi On Battery

With my 7th Raspberry Pi I finally ventured into the mobile space, so to speak. While all my other applications so far were of a stationary nature and power sockets were always close by, this time I had to use a mobile power source, i.e. a battery, to keep the Pi running while it was in my backpack. Powering a Raspi from a battery that I normally use for recharging phones on the move is straightforward, as Raspberries use the same USB connector for power as most smartphones. No extras required: I just connected the 3000 mAh battery to the Raspi and it kept running for around 4 hours. I've put a picture of the setup on the left of this post so you can get an idea of the battery size required to keep the Pi running for a couple of hours.

SPDY in the Wild

So far I assumed that the SPDY protocol, a more efficient version of HTTP, was still in some sort of experimental state and not widely used. Therefore I was very surprised when I recently saw it being used in a Google search request. For those of you wondering how I found out, take a look at the screenshot on the left. During TLS authentication, the server sends the optional 'Next Protocol Negotiation' information element in the TLS 'Server Hello' message. As Firefox also supports SPDY, the communication then continues using this protocol. I couldn't observe this directly as everything happens inside the TLS encrypted traffic flow. However, there's only a single TCP connection to the server, which is a pretty good indication that SPDY is used. Also, the Wikipedia entry on SPDY notes that quite a number of popular services in addition to Google have also activated support for the protocol. How interesting!

90 deg = +60 Mbit/s

This is perhaps a bit of an odd title for a blog post but it pretty much describes an interesting phenomenon I recently discovered. Believe it or not, I've been using an old WRT54 802.11g based Wi-Fi router at home simply because it was stable compared to my DSL router's built-in Wi-Fi and, from a performance point of view, not much slower. In practice I got around 20 Mbit/s out of that setup, which was still o.k. for most uses but slower than my VDSL line at 25 Mbit/s in downlink and 5 Mbit/s in uplink. I therefore decided recently that it was time to get a well performing 802.11n router to at last be able to transfer things at line speed.

So I said bye bye to my Linksys WRT54 and hello to a Netgear WRT-3700 that I still had as a backup for the connectivity solution with bandwidth shaping for larger scale meetings. I was also thinking about buying an 802.11ac access point but my notebook only has an 802.11n Wi-Fi card so it wouldn't be worthwhile for the moment. My first attempts to get to the full 25 Mbit/s my VDSL line offers over Wi-Fi were quite disappointing. Even though the Wi-Fi access point has an 802.11n interface, I still couldn't get far beyond the 20 Mbit/s I also got with the old 802.11g equipment. So I started experimenting with the orientation of the router and used the optional stand to turn the router 90 degrees as shown in the image on the left. And suddenly I could reach a sustained throughput of 80 Mbit/s. Just by turning the router by 90 degrees. So much for directional antenna output…

How Much Does It Cost To Deliver A Podcast To 100,000 Listeners?

One of the podcasts I listen to every week recently crossed the 100,000 regular listener mark. Quite a number, and it made me think about how much it costs to deliver that podcast four times a month to 100,000 recipients.

Let's do the maths: say the podcast audio file has a size of 50 MB. Four times a month makes that 0.2 GB x 100,000 listeners, so the total volume to deliver is 20,000 GB or 20 TB. Now how much does it cost to deliver 20 TB of data? Large cloud providers have special services for content delivery and Amazon's CloudFront service, to take a practical example, charges around 10 cents ($0.10) per GB on average for that kind of data volume. In other words, delivering 20,000 GB per month costs $2000 a month plus the cost of the virtual or physical web server behind the CloudFront service. Looked at from a listener's point of view, it costs half a cent to deliver a single episode, or two cents a month per listener.
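For those who like to check such numbers on the command line, here's a tiny sketch of the calculation above. The file size, episode count, listener number and the $0.10 per GB rate are simply the example figures assumed in this post:

awk 'BEGIN {
    gb  = 50 * 4 * 100000 / 1000   # total monthly volume: 20,000 GB
    usd = gb * 0.10                # delivery cost at ~$0.10 per GB: ~$2,000
    printf "volume: %d GB/month, cost: %d USD/month, %.1f cents per listener\n", gb, usd, usd / 100000 * 100
}'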

Interesting numbers! One thing I didn't factor in is that the podcast is also available as a videocast in HD, as I have no idea what the ratio is between those who only listen and those who watch the video stream.

Update: A reader sent me an email today with a link to an alternative hosting provider with quite different prices. They are asking for €2 per TB, i.e. €40 = $53 instead of the $2000 calculated above when using Amazon's CloudFront service. I'm a bit baffled, that's quite a difference.

An NFV (Network Function Virtualization) and SDN Primer

I'm getting quite a number of questions lately on what Network Function Virtualization (NFV) and Software-Defined Networking (SDN) are all about. Whenever I try to give an elevator pitch style answer, however, I can often see that lots of question marks remain. I guess the problem is that people who haven't seen virtualization in practice have difficulties imagining the concept. So I've decided to explain NFV and SDN from a different angle, starting with virtualization on the desktop or notebook PC, i.e. at home, which is something most people can try out themselves, imagine or at least relate to. Once the concept of virtualization at home becomes clear, the next step is to look at why and how virtualization is used in data centers in the cloud today. From there it's only a tiny step to NFV. And finally, I'll explain how SDN fits into the picture. This is probably one of the longest posts I've ever written, so allow yourself some time. As part of the exercise, I'll cover things like Software-Defined Networks (SDNs), Network Function Virtualization (NFV), OpenStack, OpenFlow, the Open Networking Foundation, etc. (Note: Should you have noticed that the order of the terms in the last sentence does not make sense at all, you probably already know everything that I'm going to talk about in this post.)

Virtualization At Home

So how do we get from a desktop PC and what can be done with it today to NFV? Desktop and notebook PC hardware has become incredibly powerful over the years, memory capacity exceeds what most people need for most tasks and hard drive capacity is just as abundant. So most of the time, the processor and memory are not heavily utilized at all. The same has been happening on the server side, with lots of CPU cycles and memory wasted if a physical server is only used for a single purpose, e.g. as a file server.

To make better use of the hardware and to enable the user to do quite a number of new and amazing things with his PC, the industry has come up with virtualization, which means creating a virtual environment on a PC that looks like a real PC to any kind of software that runs in that virtual environment. This simulation is so complete that you can run a full operating system in it and it will never notice the difference between real hardware and simulated (virtual) hardware. The program that does this is called a hypervisor. Hypervisor sounds a bit intimidating but the basic idea behind it is very simple: just think of it as a piece of software that simulates all components of a PC and that denies any program running in the virtual machine direct access to real physical hardware. In practice this works by the hypervisor granting direct unrestricted access to only a subset of CPU machine instructions. Whenever the program running in the virtual environment on the real CPU wants to exchange data with a physical piece of hardware via a corresponding CPU input/output machine instruction, the CPU interrupts the program and calls the hypervisor to handle the situation. Let's take a practical but somewhat abstracted example: if the machine instruction is about transferring a block of data out of memory to a physical device such as a hard drive, the hypervisor takes that block of data and, instead of writing it to the physical hard drive, writes it into a hard drive image file residing on a physical disk. It then returns control to the program running in the virtual environment. The program running in the virtual environment never knows that the block of data was not written to a real hard drive and happily continues its work. Interacting with any other kind of physical hardware such as the graphics card, devices connected to a USB port, input from the keyboard, mouse, etc. works in exactly the same way: the CPU interrupts program execution whenever a machine instruction is called that tries to read from or write to a physical resource.

Running an Operating System in a Virtual Machine

There are several things that make such a virtual environment tremendously useful on a PC. First, a complete operating system can be run in the virtual environment without any modifications, in addition to the normal operating system that runs directly on the real hardware. Let's say you have Windows 7 or 8 running on your PC, and every now and then you'd like to test new software but you don't want to do it on your 'real' operating system because you're not sure whether it's safe or whether you want to keep it installed after trying it out. In such a scenario a virtual machine in which another Windows 7 or 8 operating system (the so-called guest operating system) is executed makes a lot of sense, as you can try out things there without making any modifications to your 'real' operating system (called the host operating system). Another example would be if you ran a Linux distribution such as Ubuntu as your main operating system (i.e. the host operating system) but every now and then needed to use a Windows program that is not available on that platform. There are ways to run Windows programs on Linux, but running them in a virtual machine in which Windows 7 or 8 is installed is the better approach in many cases. In both scenarios the guest operating system runs in a window on the host operating system. The guest operating system knows nothing about its screen going into a window, as from its point of view it sends its graphical user interface via the (simulated) graphics card to a real screen. It just can't tell the difference. Also, when the guest operating system writes something to its (simulated) hard disk, the hypervisor translates that request and writes the content into a large file on the physical hard drive. Again, the guest operating system has no idea that this is going on.

Running Several Virtual Machines Simultaneously

The second interesting feature of virtual machines is that the hypervisor can execute several of them simultaneously on a single physical computer. Here's a practical example: I'm running Ubuntu Linux as my main (host) operating system on my notebook and do most of my everyday tasks with it. But every now and then I also want to try out new things that might have an impact on the system, so I run a second Ubuntu Linux in a virtual machine as a sort of playground. In addition, I often have another virtual machine instance running Windows 7 simultaneously, in addition to the virtual machine running Ubuntu. As a consequence I'm running three operating systems at the same time: the host system (Ubuntu), one Ubuntu in a virtual machine and a Windows 7 in another virtual machine. Obviously a lot of RAM and hard drive storage is required for this, and when the host and both virtual machines work on computationally intensive tasks they have to share the resources of the physical CPU. But that's an exception, as most of the time I'm only working on something on the host operating system or in one of the virtual machines and only seldom have something computationally intensive running on several of them simultaneously. And when I don't need a virtual machine for a while I just minimize its window rather than shutting down the guest operating system. After all, the operating system in the virtual machine takes little to no resources while it is idle. Again, note that the guest OS doesn't know that it's running in a window or that the window has been minimized.
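If you use VirtualBox as the hypervisor, running and checking on several virtual machines at the same time can also be done from the command line. The VM names below are of course just examples:

# start two virtual machines in the background ("headless", i.e. without opening a window)
VBoxManage startvm "ubuntu-playground" --type headless
VBoxManage startvm "win7-test" --type headless

# list all virtual machines that are currently running
VBoxManage list runningvms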

Snapshots

And yet another advantage of virtual machines is that the virtual machine manager software can create snapshots of a virtual machine. In the easiest scenario a snapshot is made while the virtual machine instance is not running. Creating a snapshot is then as simple as freezing the file that contains the hard disk image for that virtual machine and creating a new file into which all changes that happen from then on are recorded. Returning to the state of the virtual machine at the time the snapshot was taken is as simple as throwing away the file into which the changes were written. It's even possible to create a snapshot of a virtual machine while it is running. In addition to creating an additional file to record future changes to the hard drive image, the current state of all simulated devices, including the CPU and all its registers, is captured and a copy of the RAM is saved to disk. Once this is done, the operating system in the virtual machine continues to run as if nothing had happened at all. From its point of view nothing has actually happened, because the snapshot was made by the hypervisor from outside the virtual environment, so it has no way of knowing that a snapshot was even made. Going back to the state captured in the snapshot later on is then done by throwing away all changes that have been made to the hard drive image, loading the RAM content from the snapshot file and reloading the state of all simulated hardware components to the state they were in when the snapshot was made. Once that is done, all programs that were running at the time the snapshot was made resume at exactly the machine instruction they were about to execute. In other words, all windows on the screen are in the same position as they were when the snapshot was made, the mouse pointer is back in its original place, music starts playing again at the point the snapshot was made, etc. From the guest operating system's point of view it's as if nothing has happened; it doesn't even know that it has just been started from a snapshot. The only thing that might seem odd to it is that when it requests the time from a network based time server, there is a large gap between the time the network time server reports and the system clock, because when the snapshot is restored, the system clock still contains the value of the time the snapshot was taken.
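To give an idea of what this looks like in practice with VirtualBox, taking and restoring a snapshot is a one-liner each (the VM and snapshot names are again just examples):

# take a snapshot of the current state of the virtual machine
VBoxManage snapshot "ubuntu-playground" take "before-experiment"

# later, with the VM powered off: throw away all changes and return to exactly that state
VBoxManage snapshot "ubuntu-playground" restore "before-experiment"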

Cloning a Virtual Machine

And the final cool feature of running an operating system in a virtual machine is that it's very easy to clone it, i.e. to make an exact copy. This is done by copying the hard disk image and tying it to a new virtual machine container. The file that contains the hard disk contents, together with a description of the properties of the virtual machine such as the kind of hardware that is simulated, can also be copied to a different computer and used with a hypervisor there. If the other computer uses the same type of processor, the operating system running in the virtual machine will never notice the difference. Only if a different CPU is used (e.g. a faster CPU with more capabilities) can the guest operating system actually notice that something has changed. This is because the hypervisor does not simulate the CPU but grants the guest operating system access to the physical CPU up to the point where the guest wants to execute a machine instruction that communicates with the outside world, as described above. From the guest operating system's point of view, this looks like the CPU was changed on the motherboard.
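With VirtualBox, for example, cloning a virtual machine or packing it up for transport to another computer looks roughly like this (the names are placeholders):

# create an exact copy of a virtual machine and register it under a new name
VBoxManage clonevm "ubuntu-playground" --name "ubuntu-playground-copy" --register

# alternatively, export the virtual machine (disk image plus hardware description)
# into a single file that can be imported on another computer running a hypervisor
VBoxManage export "ubuntu-playground" -o ubuntu-playground.ova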

By now you've probably gotten the idea why virtual machines are so powerful. Getting started with virtual machines on your desktop PC or your notebook is very easy. Just download an open source hypervisor such as VirtualBox and try it out yourself. Give the virtual machine one gigabyte of memory and connect an Ubuntu installation CD image to the virtual CD drive.
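The same setup can also be scripted from the command line. Here's a minimal sketch with VirtualBox's VBoxManage tool; the VM name, disk size and the path to the Ubuntu ISO are placeholders you would adapt to your own setup (older VirtualBox versions call the 'createmedium' command 'createhd'):

# create and register a new virtual machine for a 64-bit Ubuntu guest
VBoxManage createvm --name "ubuntu-test" --ostype Ubuntu_64 --register

# give it 1 GB of RAM and a NAT network interface
VBoxManage modifyvm "ubuntu-test" --memory 1024 --nic1 nat

# create a 10 GB virtual hard disk and attach it via a SATA controller
VBoxManage createmedium disk --filename ubuntu-test.vdi --size 10240
VBoxManage storagectl "ubuntu-test" --name "SATA" --add sata
VBoxManage storageattach "ubuntu-test" --storagectl "SATA" --port 0 --device 0 --type hdd --medium ubuntu-test.vdi

# connect the Ubuntu installation CD image to the virtual CD drive and start the VM
VBoxManage storageattach "ubuntu-test" --storagectl "SATA" --port 1 --device 0 --type dvddrive --medium ubuntu-desktop-amd64.iso
VBoxManage startvm "ubuntu-test"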

Virtualization in Data Centers in the Cloud

Before discussing NFV and SDN there's one more thing to look at first, and that is virtualization in cloud computing. One aspect of cloud computing is the large server farms operated by companies such as Amazon, Rackspace, Microsoft, etc., which offer virtualized servers to other companies and private individuals for use instead of equipment physically located on a company's premises or at a user's home. Such servers are immensely popular for running anything from simple web sites to large scale video streaming portals. This is because companies and individuals using such cloud based servers get a fat pipe to the Internet that they might not have where they are located, and all the processing power, memory and storage space they need and can afford, without buying any equipment. Leaving privacy and security issues out of the discussion at this point, using and operating such a server is no different from interacting with a local physical server. Most servers are not administrated via a graphical user interface but via a command line console such as ssh (secure shell). So it doesn't matter whether a system administrator connects to a local Ubuntu server over the local network or to an Ubuntu server running in the cloud over the Internet, it looks and feels the same. Most of these cloud based servers are not running directly on hardware but in a virtual machine. This is because, even more so than on the desktop, server optimized processors and motherboards have become so powerful that they can run many virtual machines simultaneously. Modern x86 server CPUs have 8 to 16 cores and direct access to dozens to hundreds of gigabytes of main memory. So it's not uncommon to see such servers running ten or more virtual machines simultaneously. As on the desktop, many applications only require processing power very infrequently, so if many such virtual servers are put on the same physical machine, CPU capacity can be used very efficiently as the CPUs are never idle but are always put to good use by some of the virtual machines at any point in time.

Virtual machines can also be moved between different physical servers while they are running. This is convenient, for example, when a physical server becomes overloaded because several virtual machines suddenly increase their workload. When that happens, less CPU capacity may be available per virtual machine than has been guaranteed by the cloud provider. Moving a running virtual machine from one physical server to another is done by copying the contents of the RAM currently used by the virtual machine on one physical server to a virtual machine instance on another. As the virtual machine is still running while its RAM is being copied, some parts of the RAM that were already copied will change, so the hypervisor has to keep track of this and re-copy those areas. At some point the virtual machine is stopped and the remaining RAM that is still different is copied over to the virtual machine on the target server. Once that is done, the state of the virtual machine, such as the CPU registers and the state of the simulated hardware, is also copied. At that point there is an exact copy of the virtual machine on the target server and the hypervisor lets the operating system in the cloned virtual machine continue to work from exactly the point where it was stopped on the original server. Obviously it is important to keep that cut-over time as short as possible and in practice values in the order of a fraction of a second can be reached. Moving virtual machines from one physical server to another can also be used in other load balancing scenarios and for moving all virtual machines running on a physical server to another one so that the machine can be powered down for maintenance or replacement.
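On Linux servers that use KVM as the hypervisor, such a live migration can be triggered with a single command from the libvirt toolset. This is only a sketch, assuming both hosts share the storage that holds the disk image; the VM and host names are placeholders:

# move the running virtual machine "web-vm" from the current host to "server2"
# without shutting it down; its RAM is copied over while it keeps running
virsh migrate --live web-vm qemu+ssh://server2/system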

Managing Virtual Machines In the Cloud

Another aspect I'd quickly like to address is how to manage virtual resources. On a desktop or notebook PC, hypervisors such as VirtualBox bring their own administration interface to start, stop, create and configure virtual machines. A somewhat different approach is required when using virtual resources in a remote data center. Amazon Web Services, Google, Microsoft, Rackspace and many others offer web based administration of the virtual machines they offer, and getting up and running is as simple as registering for an account and selecting a pre-configured virtual machine image with a base operating system (such as Ubuntu Linux, Windows, etc.) and a certain amount of RAM and storage. Once done, a single click launches the instance and the server is ready for the administrator to install the software he would like to use. While Amazon and others use a proprietary web interface, others such as Rackspace use OpenStack, an open source alternative. OpenStack is also ideal for companies that want to manage virtual resources in their own physical data centers.
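Besides the web dashboard, OpenStack also comes with a command line client, so launching a virtual machine can look like this. The flavor, image and network names below are placeholders that depend on what the particular cloud offers:

# launch a new virtual machine ("instance") from a pre-configured Ubuntu image
openstack server create --flavor m1.small --image ubuntu-14.04 --network private my-first-server

# list the instances and their current state
openstack server list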

Network Function Virtualization

And now let's finally come to Network Function Virtualization (NFV) and jump straight to a practical example. Voice over LTE (VoLTE) requires a number of logical network elements called Call Session Control Functions (CSCF) that are part of the IP Multimedia Subsystem (IMS). These network functions are usually shipped together with server hardware from network manufacturers. In other words, these network functions run on a server that is supplied by the same manufacturer. In this example, the CSCFs are just a piece of software and from a technical point of view there is no need to run them on a specialized server. The idea of NFV is to separate the software from the hardware and to put the CSCF software into virtual machines. As explained above, there are a number of advantages to this. In this scenario the separation means that network operators do not necessarily have to buy the software and the hardware from the same network infrastructure provider. Instead, the CSCF software is bought from a specialized vendor while off-the-shelf server hardware might be bought from another company. The advantage is that off-the-shelf server hardware is mass produced and there is stiff competition in that space from several vendors such as HP, IBM and others. In other words, the hardware is much cheaper. As the CSCF software is running in a virtual machine environment, the manufacturer of the hardware doesn't matter as long as the hypervisor can efficiently map simulated devices in the virtual machine to the physical hardware of the server. Needless to say, this is one of the most important goals of companies working on hypervisors such as vSphere or KVM and of companies working on server hardware. Once you have the CSCF network function running in virtual machines on hardware of your choice, you can do a lot of things that weren't possible before. As described above, it becomes very easy, for example, to add additional capacity by installing off-the-shelf server hardware and starting additional CSCF instances as required. Load sharing also becomes much easier because the physical server is not limited to only running virtual machines with a CSCF network function inside. As virtual machines are totally independent of each other, any kind of other operating system and software can run in other virtual machines on the same physical server and can be moved from one physical server to another while running when a physical server reaches its processing capacity at some point. Running different kinds of network functions in virtual machines on standard server hardware also means that there is less specialized hardware for the network operator to maintain, and I suspect that this is one of the major goals they want to achieve, i.e. to get rid of custom boxes and vendor lock-in.

Another network function that lends itself to running in a virtual machine is the LTE Mobility Management Entity (MME). This network function communicates directly with mobile devices via an LTE base station and fulfills tasks like authenticating a user and his device when he switches it on; it instructs other network equipment to set up a data tunnel for user data traffic to the LTE base station a device is currently located at, it instructs routers to modify the tunnel endpoint when a user moves to another LTE base station and it generally keeps track of the device's whereabouts so it can page the device for incoming voice calls, etc. All of these management actions are performed over IP, so from an architecture point of view no special hardware is required to run MME software. It is also very important to realize that the MME only manages the mobility of the user and, when the location of the user changes, it sends an instruction to a router in the network to change the path of the user data packets. All data exchanged between the user and a node on the Internet completely bypasses the MME. To put it in other words, the MME network function is itself the origin and sink of what are called signaling messages, which are encapsulated in IP packets. Such a network function is easy to virtualize because the MME doesn't care what kind of hardware is used to send and receive its signaling messages. All it does is put them into IP packets and send them on their way, and its knowledge and insight into how these IP packets are sent and received is exactly zero. What could be done, therefore, is to put a number of virtual machines each running an MME instance and a couple of other virtual machines running a CSCF instance on the same physical server. Mobile networks usually have many instances of MMEs and CSCFs, and as network operators add more subscribers, the amount of mobility management signaling increases, as does the amount of signaling traffic via CSCF functions required for establishing VoLTE calls. If both network functions run on the same standard physical hardware, network operators can first fully load one physical server before spinning up another, which is quite unlike the situation today, where the MME runs on dedicated and non-standardized hardware and a CSCF runs on another expensive non-standardized server and both are only running at a fraction of their total capacity. Reality is of course more complex than this due to logical and physical redundancy concepts to make sure there are as few outages as possible. This increases the number of CSCF and MME instances running simultaneously. But the concept of mixing and matching virtualized network functions on the same hardware scales and can also be used for much more complex scenarios, perhaps with even more benefits compared to the simple scenario I just described.

Virtualizing Routers

In addition to network functions that are purely concerned with signaling such as MMEs and CSCFs, networks contain lots of physical routers that look at incoming IP packets and decide over which interface they have to be forwarded and whether they should be modified before being sent out again. A practical example from the mobile world are the LTE Serving Gateway (SGW) and the Packet Data Network Gateway (PDN-GW), which are instructed by the MME to establish, maintain and modify tunnels between a moving subscriber and the Internet to hide the user's mobility from the Internet. To make routers as fast as possible, parts of the decision making process are not implemented in software but as part of dedicated hardware (ASICs). Thus, virtualizing routing equipment is very tricky because routing can no longer be performed in hardware but has to be done in software running in a virtual machine. That means that apart from making the routing decision process as efficient as possible, it is also important that forwarding IP packets from a physical network interface into a virtual machine and then sending them out again, altered or unaltered, over another virtual network interface to another physical network interface incurs as little overhead as possible. Intel seems to have spent a lot of effort in this area to close the gap as much as possible with its Data Plane Development Kit (DPDK) and Single Root I/O Virtualization (SR-IOV).
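As a rough illustration of what SR-IOV looks like from the host's point of view: a capable network card can be split into several 'virtual functions' that are handed directly to virtual machines, bypassing the software switch in the hypervisor. A sketch, assuming a supported NIC that shows up as eth0:

# how many virtual functions does the network card support?
cat /sys/class/net/eth0/device/sriov_totalvfs

# create four virtual functions that can be passed straight into virtual machines
echo 4 | sudo tee /sys/class/net/eth0/device/sriov_numvfs

# the virtual functions appear as additional PCI network devices
lspci | grep -i "virtual function"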

Software-Defined Networking

And now let's turn to Software-Defined Networking (SDN), a term that is often used in combination with Network Function Virtualization. SDN is something entirely different, however, so let's forget about all the virtualization aspects discussed so far for a moment. Getting IP packets from one side of the Internet to the other requires routers. Each router between the origin and destination of an IP packet looks at the packet header and decides to which outgoing network interface to forward it. That starts in the DSL/Wi-Fi/router box, which looks at each IP packet sent to it from a computer in the home network and decides whether or not to forward it over the DSL link to the network. Routers in the wide area network usually have more than one network interface, so here the routing decision, i.e. to which network port to forward a packet, is more complex. This is done with routing tables that contain IP address ranges and corresponding outgoing network interfaces. Routing tables are not static but change dynamically, e.g. when network interfaces suddenly become unavailable due to a fault or because new routes to a destination become available. Even more often, routing tables change because subnets in other parts of the Internet are added and deleted all the time. There are a number of network protocols such as BGP (Border Gateway Protocol) that routers use to exchange information about which networks they can reach. This information is then used on each router to decide whether an update to the routing table is necessary. When the routing table is altered due to a BGP update from another router, the router will then also send out information to its downstream routers to inform them of the change. In other words, routing changes propagate through the Internet and each router is responsible on its own for maintaining its routing table based on the routing signaling messages it receives from other routers. For network administrators this means that they have to keep a very close eye on what each router in their network is doing, as each updates its routing table autonomously based on the information it receives from other routers. Routers from different manufacturers have different administration interfaces and different ways to handle routing updates, which adds additional complexity for network administrators.

To make the administration process simpler and more deterministic, the idea behind Software-Defined Networking (SDN) is to remove the proprietary administration interface and the automated local modifications of the routing table in the routers and to perform these tasks in a single piece of software on a centralized network configuration platform. Routers then only forward packets according to the rules they receive from the centralized configuration platform and according to the routing table they have also received from the centralized platform. Changes to the routing table are made in a central place as well, instead of in a decentralized manner in each router. The interface SDN uses for this purpose is described in the OpenFlow specification, which is standardized by the Open Networking Foundation (ONF). A standardized interface enables network administrators to use any kind of centralized configuration and management software, independent of the manufacturers of the routing equipment they use in their network. Router manufacturers can thus concentrate on designing efficient router hardware and the software required for inspecting, modifying and forwarding packets.
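To make the idea of centrally installed forwarding rules a bit more concrete, here is what a simple OpenFlow rule looks like when pushed into Open vSwitch, a software switch that speaks OpenFlow. This is just an illustrative sketch: the bridge name br0, the subnet and the port number are made up, and in a real SDN deployment an OpenFlow controller would install such rules over the network rather than a human at the command line:

# forward all IP packets destined for 10.0.0.0/24 out of switch port 2
sudo ovs-ofctl add-flow br0 "priority=100,ip,nw_dst=10.0.0.0/24,actions=output:2"

# show the flow table the switch is currently forwarding with
sudo ovs-ofctl dump-flows br0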

Summary

This essay has become a lot longer than I originally intended, but I wanted to make sure it becomes clear that the concept of NFV does not come out of thin air but is actually based on ideas that have radically changed other areas of computing in the past decade. SDN, in contrast, is something radically new and addresses the shortcomings of a decentralized and proprietary router software and control approach that get worse as networks become more and more complicated. On the one hand, implementing NFV and SDN is not going to be an easy task because these concepts fundamentally change how the core of the Internet works today. On the other hand, the expected benefits in terms of new possibilities, flexibility, easier administration of the network and cost savings are strong motivators for network operators to push their suppliers to offer NFV and SDN compatible products.

Useful Links

There's tons of interesting material available on the Internet around NFV and SDN but I've decided to only include three interesting links here that tell the story from where I leave off:

 

Nice Android Feature: NFC and Touch to Share

I've used Bluetooth a lot over the years to send images, address book entries and calendar entries to other Bluetooth devices. While the technology is mature and pretty much works across devices from different vendors, it's always a bit of a hassle to use, especially when transferring something to another person's device. That usually involves asking the other person to switch on Bluetooth and to make their device visible for some time. Most people don't know how that works, which, in many cases, kills the use case of transferring something quickly rather than typing in the information to be shared. But recently I've done the same via NFC authentication and the user experience is much better!

On Android the feature is called "Android Beam". When NFC is enabled on both devices, sharing an image, a calendar entry or a contact is as easy as bringing up the entry to be shared on the screen and then holding the two devices close to each other. A tone is played and Android asks the user whether he wants to share the image/calendar entry/contact, etc. Tapping o.k. initiates the transfer, which is then done via Bluetooth according to the Wikipedia entry linked to above, and also judging from the data transfer times I've experienced.

Speaking of transfer times: Android Beam works great for small amounts of data, i.e. everything up to and including images. Transferring videos is a different matter, as their size is usually much too big for Bluetooth speeds.

Transit and Interconnect Speeds in Germany in 2013

A couple of days ago I reported on the findings of the German telecom regulator's 2013 report. Among other things, the report contained two numbers: in 2013, mobile networks in Germany carried a total of 267 million gigabytes while fixed line networks carried 8 billion gigabytes. The numbers are staggering, but what does that mean in terms of the transit and interconnect links required, i.e. how much data is flowing to and from the wider Internet into those networks per second?

Let's take the mobile number first, 267 million gigabytes per year. There are four mobile network operators in the country, so let's say one of them handles 30% of that traffic. 30% of that traffic is roughly 80 million GB per year and 219,178 GB per day. Divided by 24 to get the traffic per hour and then divided by 60 and again by 60 to get down to the GB per second, that's 2.53 GB/s or roughly 20 Gbit/s. This number does not yet include usage variations throughout the day, so the peak throughput during the busiest times of the day must be higher.

On the fixed line side, the number is even more staggering. Let's say the incumbent handles 70% of the 8 billion gigabytes per year (only an assumption, use your own value here). This boils down to a backhaul speed of 1.4 Tbit/s (1400 Gbit/s), plus whatever the peak throughput during busy times of the day is above the average.
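For those who want to play with the percentages themselves, here's a tiny sketch of the calculation with the assumptions used above (30% mobile share, 70% fixed line share):

awk 'BEGIN {
    secs   = 365 * 24 * 60 * 60          # seconds per year
    mobile = 267e6 * 0.30 * 8 / secs     # 30% of 267 million GB per year, in Gbit/s
    fixed  = 8e9   * 0.70 * 8 / secs     # 70% of 8 billion GB per year, in Gbit/s
    printf "mobile: %.0f Gbit/s, fixed line: %.0f Gbit/s (averages)\n", mobile, fixed
}'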

I'm impressed!

Owncloud And Selfoss Brain Transplant – Using TAR For A Running Backup

My Owncloud and Selfoss servers at home have become a central service for me over the past year and I have made sure that I can cut over my services remotely to an alternative access path should the DSL line or DSL router fail. For the details see here and here. While I'm pretty well covered for line failures, I'd still be dead in the water if the server itself failed. Agreed, this is unlikely but not unheard of, so I decided to work on a backup strategy for such a scenario as well.

What I need is not just a copy of the data to restore onto a backup server; that backup server must also be up and running so that I can quickly turn it into the main server, even while not at home, should it become necessary. As it's a backup server, and slow service is better than no service, it doesn't have to be very powerful. Therefore, I decided to use a Raspberry Pi for the purpose, on which I've installed Owncloud and Selfoss. Getting the data over from the active servers is actually simpler than I thought by using tar:

To create a copy, I think it's prudent to halt the web server before archiving the Owncloud data directory as follows:

# stop the web server so the data does not change while it is being archived
sudo service apache2 stop
cd /media/oc/owncloud-data

# pack the complete data directory into a compressed tar archive
sudo tar cvzf 2014-xx-xx-oc-backup.tar .
sudo service apache2 restart

This creates a complete copy of all data of all users. The tar file can become almost as big as all data stored on Owncloud so there must be as much free disk space left on the server as there is data stored in the Owncloud data folder.

In the second step, the tar archive is copied to the target machine. Scp or sftp do a good job. Once the file has been copied, the tar file should be deleted on the source server to free up space.
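For example, with scp (the user, hostname and paths are of course placeholders for your own setup):

# copy the archive from the Owncloud server to the backup Raspberry Pi
scp 2014-xx-xx-oc-backup.tar pi@backup-pi:/path-to-tar/

# then free up the space on the source server
sudo rm 2014-xx-xx-oc-backup.tar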

On the target machine, I stop the web server as well, delete the old Owncloud data directory and then unpack the tar archive:

sudo service apache2 stop

# remove the old data directory and recreate it empty
cd /media/oc
sudo rm -rf owncloud-data
sudo mkdir owncloud-data
cd owncloud-data

# unpack the archive and give ownership back to the web server user
sudo tar xvzf /path-to-tar/2014-xx-xx-oc-backup.tar
sudo chown -R www-data:www-data /media/oc/owncloud-data
sudo service apache2 start
rm /path-to-tar/2014-xx-xx-oc-backup.tar

And once that is done the backup Owncloud instance runs with the calendar, address book and file data of the standard server.

One important thing to keep in mind when setting this up for the first time is to copy the password salt value from /var/www/owncloud/config/config.php over to the backup server as well, as otherwise it's not possible to log into the backup Owncloud instance. And finally, have a close look at the same configuration file on the backup server to check whether the 'trusted domains' parameters match your requirements. For details have a look at the end of this page.

The same principle also works for getting a working copy of the Selfoss RSS server.

Using SSH to Tunnel Packets Back To The Homecloud

In my post from a couple of days ago on how to home-cloud-enable everyone, one of the important building blocks is an SSH tunnel from a server with a public IP address to the Owncloud server at home, which is not directly accessible from the Internet. I didn't go into the details of how this works, promising to do that later. This is what this post is about.

So why is such a tunnel needed in the first place? Well, obviously, technically savvy users can of course configure port forwarding on their DSL or cable router to their Owncloud server but the average user just stares at you when you make the suggestion. Also, many alternative operators today don’t even give you public IP addresses anymore so even the technically savvy users are out of luck. So for a universal solution that will work behind any connection no matter how many NATs are put in the way, a different approach is required.

My solution to the problem is actually pretty simple once you think about it: what NATs and missing public IP addresses do is prevent incoming traffic that is not the result of an outgoing connection establishment request. To get around this, my solution, which I've been running from my own network at home for some time now over a cellular connection (!) 24/7, establishes an SSH tunnel from my Owncloud server to a machine on the Internet with a public IP address and tunnels the TCP port used for HTTPS (443) from that public machine through the SSH tunnel back home. If you think it's complicated to set up you are mistaken, a single command is all it takes on the Owncloud server:

nohup sudo ssh -o ServerAliveInterval=60 -o ServerAliveCountMax=2 -p 16999 -N -R 4711:localhost:443 ubuntu@ec2-54-23-112-96.eu-west-1.compute.amazonaws.com &

O.k. it's a mouthful so here's what it does: ‘nohup’ ensures that the ssh connection stays up even when the shell window is closed. If not given, the ssh task dies when the shell task goes away. ‘sudo’ is required as tcp port 443 used for secure https requires root privileges to forward. The ‘ServerAliveInterval’ and ‘ServerAliveCountMax’ options ensure that a stale ssh tunnel gets removed quickly. The ‘-p 16999’ is there as I moved the ssh daemon on the remote server from port 22 to 16999, as otherwise there are too many automated attempts to ssh into my box from unfriendly bots. Not that that does any harm, but it pollutes the logs. The ‘-N’ option suppresses a shell session being established because I just need the tunnel. The ‘-R’ option is actually the core of the solution as it forwards the https tcp port 443 to the other end. Instead of using the same port on the other end, I chose to use a different one, 4711 in this example. This means that the server is accessible later on via ‘https://www.mydomain.com:4711/owncloud’. Next come the username and hostname of the remote server. And finally the ‘&’ operator makes the command go to the background so I can close the shell window from which the command was started.

All of this raises the question, of course, of which server on the Internet I used to connect to. Any Linux based server will do and there are lots of companies offering virtual servers by the hour. For my tests I chose to go with Amazon's EC2 service as they offer a small Ubuntu based virtual server for free for a one year trial period. It's a bit ironic that I am using a virtual server in the public cloud to keep my private data out of the very same cloud. But while it is ironic it meets my needs, as all data traffic to that server and through the SSH tunnel is encrypted end to end via HTTPS, so nothing private ever ends up on that server in the clear. Perfect. Setting up an EC2 instance can be done in a couple of minutes if you are familiar with Ubuntu or any other Linux derivative. Once done, you can SSH into the new virtual instance, import or export the keys you need for the ssh tunnel the command above establishes, and set the firewall rules for that instance so that port 16999 for ssh and 4711 for https are opened to the outside world.
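One detail worth mentioning, depending on the sshd defaults on the public server: out of the box, OpenSSH binds remotely forwarded ports such as 4711 to the loopback interface only, so they are not reachable from the outside. If that is the case, a small addition to the ssh daemon configuration on the EC2 instance fixes it. The snippet below is a sketch of the relevant lines in /etc/ssh/sshd_config, matching the port numbers used above:

# /etc/ssh/sshd_config on the public server (EC2 instance)
Port 16999          # run sshd on a non-standard port to keep the logs clean
GatewayPorts yes    # make remotely forwarded ports (e.g. 4711) reachable from the
                    # outside world, not just from the loopback interface

After changing the file, restart the ssh daemon with ‘sudo service ssh restart’.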

And that’s pretty much it, there’s not even additional software that needs to be installed on the EC2 instance.

Raising the Shields Part 14: Setting Up An OpenVPN Server With A Raspberry Pi

After all the mess with Heartbleed a few weeks ago and updating my servers I started thinking about the current state of security of my VPN gateway at home. So far, I've used a very old but rock solid Linksys WRT-54G with DD-WRT on it for providing Wi-Fi at home and VPN server functionality for when I'm roaming. But the latest DD-WRT version for that hardware is several years old and was fortunately made before the Heartbleed bug was introduced. So I was safe on that side. But for such a vital and security sensitive function I don't think it's a good idea to run such old software. So I decided to do something about it and started to look into how to set up an OpenVPN server on a platform that receives regular software updates. And nothing is better and cheaper for that than a Raspberry Pi.

Fortunately I always have a spare at home and after some trial and error I found these very good step-by-step instructions on how to set up OpenVPN on a Raspberry Pi over at ReadWrite.com. If you take the advice seriously to type in the commands rather than copy/paste them, it's almost a no-brainer if you know your way around Linux a bit.

The second part of the instructions deals with setting up the client side on MacOS and Windows. Perhaps the OVPN configuration file is also usable on a Linux system but I decided to configure the OpenVPN client built into the Ubuntu network manager manually. The four screenshots below show how that is done. As some networks have trouble forwarding the VPN tunnel with a maximum packet size (MTU) of 1500 bytes, I chose to limit the packet size to 1200 bytes as you can see in the second screenshot.
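For those who prefer a plain configuration file over the network manager dialogs, the settings roughly correspond to an OpenVPN client config like the sketch below. The server address, port and certificate file names are placeholders and depend on how the server was set up:

# client.ovpn - minimal OpenVPN client configuration (sketch)
client
dev tun
proto udp
remote vpn.mydomain.com 1194   # public address and port of the OpenVPN server
ca ca.crt                      # certificate of the certificate authority
cert client1.crt               # this client's certificate
key client1.key                # this client's private key
remote-cert-tls server         # only accept a peer that presents a server certificate
tun-mtu 1200                   # limit the packet size, see the MTU remark above
persist-key
persist-tun
nobind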

Another thing that made the whole exercise worthwhile is that I now understand a lot better how OpenVPN uses the client and server certificates. I always assumed that it was enough to just remove a client certificate from the server to prevent a client from establishing a VPN connection. It turns out that this is not correct. The OpenVPN server actually doesn't need the client certificate at all, as it only checks whether the certificate supplied by the client during connection establishment was signed with the certificate of the certificate authority that is set up on the server as part of the OpenVPN installation. That was surprising. Therefore, revoking a client's access rights means that the client certificate has to be put on a local certificate revocation list that the server checks before proceeding with the connection establishment. I'll have a follow-up post describing how that is done.
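As a short preview, and assuming the easy-rsa scripts the tutorial uses for certificate handling, revoking a client roughly looks like this; the client name and paths are placeholders:

# on the OpenVPN server: revoke the client certificate and regenerate the
# certificate revocation list (CRL)
cd /etc/openvpn/easy-rsa
source ./vars
./revoke-full client1

# tell the OpenVPN server to check the CRL by adding this line to server.conf:
#   crl-verify /etc/openvpn/easy-rsa/keys/crl.pem
# then restart the server
sudo service openvpn restart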

A final thought today is on processor load: my VDSL uplink at home limits my tunnel to a maximum speed of 5 Mbit/s. A download at this speed takes around 50% of the processor capacity of the Raspberry Pi, independent of whether I'm using the NATed approach as described in the tutorial or simple routing between the VPN tunnel and my network at home. At the moment I don't care as a faster backhaul is not in sight. But at some point I might need to revisit that.