Further Backup Strategy Improvements

One of my hobbies is maintaining a good backup strategy so that as little data as possible is lost once the inevitable happens. Rsync and Luckybackup, a graphical frontend for rsync, have been my friends for years and have helped me keep identical copies of critical installations. One weak spot in my backup strategy so far: some devices are often out of the house, and thus away from backup drives and emergency spare duplicates, for weeks on end, so losing one of them would mean a significant loss of data. But I've now got a solution for that, too!

When a remote network has a good uplink capacity (read: 10+ Mbit/s), it is entirely possible to use rsync over the network to synchronize an emergency spare copy of a notebook at home. Even if a gigabyte or more of data changes from one synchronization run to the next, double-digit uplink data rates ensure the process finishes in a reasonable amount of time. The main problem with this approach is that not all remote local networks are suitable: either the uplink is too meager when gigabytes of data have to be synchronized, or the connection is metered and data transfers are too expensive. But this problem can be nicely circumvented by checking the Wifi SSID and only starting an rsync session when the device is in a network that can handle the traffic.
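
To illustrate the core of the idea before the full script below: on Linux, the current SSID can be read with 'iwgetid' from the wireless-tools package. Here is a minimal local sketch of that gate, using the same SSID allow-list as the script (the full script checks the remote side's SSID over SSH instead):

#!/bin/bash
# Minimal sketch: only proceed when the current Wifi SSID is on the allow-list.
# Requires iwgetid from the wireless-tools package.
allowed_ssids="eduroam|lalaland|A4711"

if iwgetid -r | grep -qE "^(${allowed_ssids})$"; then
    echo "Known Wifi with enough bandwidth, rsync can run"
else
    echo "Unknown or metered Wifi, skipping the sync"
fi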

So here's a script I've come up with that I run from an emergency spare device at home every 30 minutes via an entry in '/etc/crontab' (an example entry is shown after the script). It connects via SSH to the roaming device through a tunnel that the roaming device has previously established automatically to a jump host at home. It's a bit of a detour, but NAT leaves one no other choice. The script then checks the SSID that is currently used on the other end. If it is in the list defined at the top of the script, it starts the rsync session. As rsync only sends those parts of large files that have changed, it's no problem to also synchronize files several gigabytes in size, such as Thunderbird email archives, in which only a few kilobytes have changed. Very efficient! In addition, rsync's '--partial' flag is handy for resuming the transfer of large files instead of starting from scratch in case connectivity is interrupted. The script is obviously not perfect; a lot of things could be put into variables at the top to make it easier to adapt. But I guess that's not a big hurdle if you want to reuse it.
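
One prerequisite the script does not cover is the tunnel itself. On the roaming device, it could be kept alive with autossh, for example. This is just a sketch under the assumption that the jump host at home is reachable as my-domain.com and that its sshd allows the forwarded port to be reached from other machines; the user and port mirror the script:

# On the roaming device: keep a reverse tunnel to the jump host alive,
# so that port 12345 on the jump host forwards to this device's local sshd.
# Note: reaching the forwarded port from another machine at home requires
# 'GatewayPorts yes' (or 'clientspecified') in the jump host's sshd_config.
autossh -M 0 -f -N \
    -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
    -R 12345:localhost:22 xyz@my-domain.com

With the tunnel in place, here's the script itself: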

#!/bin/bash

# Configure the different Wifi SSIDs that are suitable for rsync here
wifi_ssids="eduroam|lalaland|A4711"

# Exit if an rsync session is already running
if pgrep -f "rsync --partial" > /dev/null; then
  echo 'rsync already running, exiting...'
  exit 1
fi


cd /home/xyz/excluded-from-sync/auto-rsync || exit 1

# -------------------------------------------

# The local file to which the Wifi info of the remote host is written
remote_ssid_info="remote-wifi-info.txt"

rm -f "$remote_ssid_info"

# Get the current Wifi the remote side is connected to.
ssh -p 12345 xyz@my-domain.com "iwconfig" > "$remote_ssid_info"

date

# If one of the Wifi SSIDs given above is in the result file, do
# the rsync, as there is enough bandwidth available.

cat "$remote_ssid_info"

if grep -qE "$wifi_ssids" "$remote_ssid_info"
then
    echo; echo "Good Wifi SSID found, running rsync..."
    
    # Full rsync
    rsync --partial -h --progress --stats -r -tgo -p -l -D --delete-after \
        --exclude 'excluded-from-sync' --exclude .local --exclude .mozilla \
        --exclude .luckybackup-snaphots/ --exclude .cache --exclude .config \
        --exclude .ssh --exclude .bash_history \
        -e "ssh -p 12345" xyz@my-domain.com:/home/xyz/ /home/xyz/
else
    echo
    echo "Wifi not suitable or unknown, aborting"
fi

echo
date
echo
echo "--------------------------------------"
echo
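
For reference, the matching entry in '/etc/crontab' on the spare device could look like this; the script path and file name are illustrative:

# m    h  dom mon dow user command
*/30   *  *   *   *   xyz  /home/xyz/excluded-from-sync/auto-rsync/auto-rsync.sh >> /home/xyz/excluded-from-sync/auto-rsync/sync.log 2>&1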