I recently built a new storage server to replace my old one and keep up with my growing space requirements (I think 40T should hold me over for a while!). I store all of my movies, music, TV shows, etc. on it, as well as all of my backups. All of my laptops and desktop computers also back up to this server using rsync.

While it's all stored on SmartOS using the ZFS filesystem in a RAID setup that can handle 2 or more drive failures without data loss, it still worries me because it is all stored in one physical location: my closet. If there is a fire or some other disaster like that, all of my data could potentially be lost.

To remedy this, I've repurposed the server I replaced (my old storage server) as an off-site backup server that is used solely for ZFS receive. This server now runs FreeBSD, which you can read about in my blog post here.

Automatic Snapshots

Before diving into my off-site backup solution, the first thing to talk about is how I handle automatic ZFS snapshots, and also removing snapshots as they get too old.



zfs-snapshot-all: recursively snapshot all zpools

I use this program to snapshot all zpools on my new storage server automatically from cron. My crontab looks something like this:

1 0 * * * /opt/custom/bin/zfs-snapshot-all automated_daily   >> /var/log/autosnap.log 2>&1
2 0 * * 0 /opt/custom/bin/zfs-snapshot-all automated_weekly  >> /var/log/autosnap.log 2>&1
3 0 1 * * /opt/custom/bin/zfs-snapshot-all automated_monthly >> /var/log/autosnap.log 2>&1
4 0 1 1 * /opt/custom/bin/zfs-snapshot-all automated_yearly  >> /var/log/autosnap.log 2>&1

Snapshots for each pool are created daily, weekly, monthly, and yearly, with a name like automated_daily_$EPOCH.
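
As a rough illustration of the naming scheme (not the actual source of zfs-snapshot-all), a recursive snapshot with an epoch suffix can be sketched like this. The snapshot_cmd helper is a hypothetical name, and it only prints the command instead of running it:

```shell
# Hypothetical helper: print the command that would recursively snapshot
# one pool, using a name like automated_daily_1449227760.
# Usage: snapshot_cmd <pool> <prefix> [epoch]
snapshot_cmd() {
    pool=$1
    prefix=$2
    epoch=${3:-$(date +%s)}   # default to the current Unix time
    echo "zfs snapshot -r ${pool}@${prefix}_${epoch}"
}

# Example: the pool name is given by hand here; on a real system
# `zpool list -H -o name` would list every pool.
snapshot_cmd goliath automated_daily
```

Putting the epoch in the name makes every snapshot unique, and it lets a later cleanup step filter on the prefix alone.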



zfs-prune-snapshots: remove snapshots from one or more zpools that match given criteria

I use this program to remove snapshots automatically from cron as they get too old. My crontab looks something like this:

11 0 * * * /opt/custom/bin/zfs-prune-snapshots -p automated_daily_   7d   >> /var/log/autosnap.log 2>&1
12 0 * * 0 /opt/custom/bin/zfs-prune-snapshots -p automated_weekly_  4w   >> /var/log/autosnap.log 2>&1
13 0 1 * * /opt/custom/bin/zfs-prune-snapshots -p automated_monthly_ 12M  >> /var/log/autosnap.log 2>&1
14 0 1 1 * /opt/custom/bin/zfs-prune-snapshots -p automated_yearly_  10y  >> /var/log/autosnap.log 2>&1

This removes daily snapshots after 7 days, weekly snapshots after 4 weeks, monthly snapshots after 12 months, and yearly snapshots after 10 years.
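
To make the age check concrete, here is a minimal sketch of how a duration such as 7d or 12M could be turned into seconds and compared against a snapshot's age. This is not the real zfs-prune-snapshots code. The function names are hypothetical, and the 30-day month and 365-day year are assumptions made for this example:

```shell
# Convert a duration like 7d, 4w, 12M or 10y to seconds.
# Assumption for this sketch: 1 month = 30 days, 1 year = 365 days.
duration_to_seconds() {
    num=${1%?}          # everything except the last character
    unit=${1#"$num"}    # the last character
    case $unit in
        d) mult=86400 ;;
        w) mult=604800 ;;
        M) mult=2592000 ;;
        y) mult=31536000 ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    echo $((num * mult))
}

# Print "prune" if the snapshot is older than the duration, else "keep".
# Usage: prune_decision <snapshot epoch> <duration> <now epoch>
prune_decision() {
    max_age=$(duration_to_seconds "$2") || return 1
    if [ $(($3 - $1)) -gt "$max_age" ]; then
        echo prune
    else
        echo keep
    fi
}
```

For example, `prune_decision 1448000000 7d 1449227760` would report a snapshot that is about two weeks old as ready to prune.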

Off-site Backups

With the above 2 programs, snapshots are created and managed automatically on my new storage server. However, all of the data still lives on the same physical machine.

The final program is used to ZFS send data from my new storage server to be received on my old storage server, which lives at @papertigers's house.



zincrsend: incremental ZFS send/recv backup script

NOTE: At the time of this writing, this script (due to laziness) takes configuration data inside the source itself.

At a high level, it works in 3 steps for every dataset you want to back up.

  1. Create snapshot locally
  2. Find latest remote snapshot
  3. Incremental send from latest remote snapshot -> new local snapshot

If no snapshot is found on the remote end, a full ZFS send is performed.
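
The choice between an incremental and a full send can be sketched as follows. This is a simplified illustration, not the zincrsend source: build_send_cmd is a hypothetical helper, the -R flag is an assumption based on child datasets appearing in the output, and the function only prints the command it would run:

```shell
# Print the zfs send command for one dataset.
# Usage: build_send_cmd <new local snapshot> [latest remote snapshot]
build_send_cmd() {
    new=$1
    latest_remote=${2:-}
    if [ -n "$latest_remote" ]; then
        # The pool names differ (goliath vs. paper), so keep only the
        # part after the @ as the incremental source.
        echo "zfs send -R -i @${latest_remote#*@} $new"
    else
        # Nothing on the remote end yet: fall back to a full send.
        echo "zfs send -R $new"
    fi
}

# The printed command would then be piped to the remote side, e.g.
#   zfs send ... | ssh <remote host> zfs recv -F paper/public
# where the ssh transport and the recv flags are assumptions of this sketch.
build_send_cmd goliath/public@zincrsend_1449227760 paper/public@zincrsend_1449173284
```

An incremental send only works if the source snapshot still exists on both machines, which is why old zincrsend_ snapshots should not be pruned too aggressively.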

Finally, when the script finishes, I have a Pushover notification sent to my phone to let me know how long it took and whether it was successful. An example run looks like:

# ./zincrsend
starting on Fri Dec  4 11:16:00 UTC 2015

processing dataset: goliath/public

creating snapshot locally: goliath/public@zincrsend_1449227760
latest remote snapshot: paper/public@zincrsend_1449173284
zfs sending (incremental) @zincrsend_1449173284 -> goliath/public@zincrsend_1449227760 to paper/public
receiving incremental stream of goliath/public@zincrsend_1449227760 into paper/public@zincrsend_1449227760
received 312B stream in 1 seconds (312B/sec)

processing dataset: goliath/minecraft

creating snapshot locally: goliath/minecraft@zincrsend_1449227763
latest remote snapshot: paper/minecraft@zincrsend_1449173286
zfs sending (incremental) @zincrsend_1449173286 -> goliath/minecraft@zincrsend_1449227763 to paper/minecraft
receiving incremental stream of goliath/minecraft@zincrsend_1449227763 into paper/minecraft@zincrsend_1449227763
received 312B stream in 1 seconds (312B/sec)

processing dataset: goliath/backups

creating snapshot locally: goliath/backups@zincrsend_1449227766
latest remote snapshot: paper/backups@zincrsend_1449173288
zfs sending (incremental) @zincrsend_1449173288 -> goliath/backups@zincrsend_1449227766 to paper/backups
receiving incremental stream of goliath/backups@zincrsend_1449227766 into paper/backups@zincrsend_1449227766
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of goliath/backups/dave@zincrsend_1449227766 into paper/backups/dave@zincrsend_1449227766
received 217MB stream in 367 seconds (607KB/sec)
receiving incremental stream of goliath/backups/skye@zincrsend_1449227766 into paper/backups/skye@zincrsend_1449227766
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of goliath/backups/dad@zincrsend_1449227766 into paper/backups/dad@zincrsend_1449227766
received 312B stream in 1 seconds (312B/sec)
receiving incremental stream of goliath/backups/web@zincrsend_1449227766 into paper/backups/web@zincrsend_1449227766
received 3.16MB stream in 5 seconds (647KB/sec)

script ran for ~6 minutes (384 seconds)

pushover sent!



Because I run these scripts on SmartOS, I have to recreate the crontab upon a reboot, as all of that data is ephemeral. To do this, I have a "boot" service that runs every time the server is restarted, which runs a shell script. I create a persistent directory for both the manifest and the script:

mkdir /opt/custom/boot

This directory contains 2 files: the service manifest, and the script it runs (/opt/custom/boot/method).


<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
        <service name='smartos/boot' type='service' version='0'>
                <create_default_instance enabled='true'/>
                <dependency name='net-physical' grouping='require_all' restart_on='none' type='service'>
                        <service_fmri value='svc:/network/physical'/>
                </dependency>
                <dependency name='filesystem' grouping='require_all' restart_on='none' type='service'>
                        <service_fmri value='svc:/system/filesystem/local'/>
                </dependency>
                <exec_method name='start' type='method' exec='/opt/custom/boot/method' timeout_seconds='0'/>
                <exec_method name='stop' type='method' exec=':true' timeout_seconds='60'/>
                <property_group name='startd' type='framework'>
                        <propval name='duration' type='astring' value='transient'/>
                </property_group>
                <stability value='Unstable'/>
                <template>
                        <common_name>
                                <loctext xml:lang='C'>SmartOS boot script</loctext>
                        </common_name>
                </template>
        </service>
</service_bundle>


[[ -d /root ]] && rm -rf /root
mkdir -p /opt/custom/root
ln -s /opt/custom/root /root

crontab <<-EOF
0 11 * * * date >> /var/log/autosnap.log 2>&1

1 11 * * * /opt/custom/bin/zfs-snapshot-all automated_daily   >> /var/log/autosnap.log 2>&1
2 11 * * 0 /opt/custom/bin/zfs-snapshot-all automated_weekly  >> /var/log/autosnap.log 2>&1
3 11 1 * * /opt/custom/bin/zfs-snapshot-all automated_monthly >> /var/log/autosnap.log 2>&1
4 11 1 1 * /opt/custom/bin/zfs-snapshot-all automated_yearly  >> /var/log/autosnap.log 2>&1

5 11 * * * /opt/custom/bin/zfs-prune-snapshots -p automated_daily_   7d   >> /var/log/autosnap.log 2>&1
6 11 * * 0 /opt/custom/bin/zfs-prune-snapshots -p automated_weekly_  4w   >> /var/log/autosnap.log 2>&1
7 11 1 * * /opt/custom/bin/zfs-prune-snapshots -p automated_monthly_ 12M  >> /var/log/autosnap.log 2>&1
8 11 1 1 * /opt/custom/bin/zfs-prune-snapshots -p automated_yearly_  10y  >> /var/log/autosnap.log 2>&1

9  11 * * * /opt/custom/bin/zfs-prune-snapshots -p zincrsend_ 1M >> /var/log/zincrsend.log 2>&1
10 11 * * * /opt/custom/bin/zincrsend >> /var/log/zincrsend.log 2>&1
EOF

The first 3 lines of the script effectively create a persistent /root directory by replacing it with a symlink to /opt/custom/root, and the rest of it creates a crontab file that:

  1. creates automatic snapshots
  2. deletes old snapshots
  3. sends data to the off-site backup