
I have a local FreeNAS system and want to use ZFS snapshots for backups.
FreeNAS has the built-in Replication Tasks which use

zfs send snapshot_name

to send a snapshot to a remote system. But this needs a system with ZFS on the other end.

I want to write the snapshot to a file, then send this compressed and encrypted file to the remote machine.

This is possible with

zfs send snapshot_name | gzip | openssl enc -aes-256-cbc -a -salt > file.gz.ssl
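
For completeness, restoring from such a file reverses the pipeline. A minimal sketch; the `restore_stream` helper name and the `tank/restore` target dataset are hypothetical, and note that a ZFS pool is still needed at restore time:

```shell
# Decrypt, decompress, and replay a stored snapshot stream into a dataset.
# 'tank/restore' is a hypothetical target; openssl prompts for the passphrase.
restore_stream() {
    openssl enc -d -aes-256-cbc -a -in "$1" | gunzip | zfs receive "$2"
}

# Usage:
# restore_stream file.gz.ssl tank/restore
```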

Every day I make a snapshot of the storage pool and keep each snapshot for 30 days.
Every snapshot taken is also piped to a file:
- snapshot_file 1 has every file in it (let's say 2GB)
- snapshot_file 2 only has the changes to snapshot_file 1 (let's say 5MB)
- snapshot_file 3 holds the changes to snapshot_file 2; and so on.
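
The daily job described above could be sketched roughly like this; the dataset name `tank/storage`, the file names, and the `daily_snapshot` helper are my assumptions, not part of FreeNAS's built-in tasks:

```shell
# Sketch of the daily snapshot-to-file job (names are hypothetical).
daily_snapshot() {
    today=$(date +%Y-%m-%d)
    # BSD date (FreeNAS/FreeBSD) first, GNU date as fallback
    yesterday=$(date -v-1d +%Y-%m-%d 2>/dev/null || date -d yesterday +%Y-%m-%d)

    zfs snapshot "tank/storage@$today"
    if zfs list -t snapshot "tank/storage@$yesterday" >/dev/null 2>&1; then
        # day 2 onwards: only the changes since yesterday's snapshot
        zfs send -i "tank/storage@$yesterday" "tank/storage@$today"
    else
        # day 1: the full stream
        zfs send "tank/storage@$today"
    fi | gzip | openssl enc -aes-256-cbc -a -salt > "snapshot_file_$today.gz.ssl"
}
# Run daily_snapshot from a daily cron job.
```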

On day 31, snapshot_file 1 gets deleted (because I only want the changes from the last 30 days).

Therefore snapshot_file 2 now needs to hold every file (the 2GB of snapshot_file 1 plus the 5MB of changes).

But with this approach, from day 31 on, a new 2GB file has to be created and sent to the remote system every day. That is too much overhead.

What would be the best approach to use snapshots piped to a file as a backup strategy with a history of X days?

P.S.: I know there is a lot of backup software out there (rdiff-backup, for example) which I could use. But I am curious how this could be done.

Timo
Martin Grohmann
  • Why don't you use `zfs recv` on the other end (on a pool with `zfs set compression=gzip-9` for instance). Storing snapshot files sounds very inefficient to me. – Stéphane Chazelas Feb 05 '14 at 20:23
  • @StephaneChazelas because I do not have a ZFS file system on the other end. My remote system is a Gentoo box with ext4 (I know I could install zfsonlinux, but I'd rather not) – Martin Grohmann Feb 05 '14 at 20:31

1 Answer


If you store the snapshots in files, as opposed to in a file system (e.g. with zfs receive), I'm afraid this is not possible.

ZFS on the receiving side

If you use ZFS on the sending and on the receiving side you can avoid having to transfer the whole snapshot and only transfer the differences of the snapshot compared to the previous one:

ssh myserver 'zfs send -i pool/dataset@2014-02-04 pool/dataset@2014-02-05' | \
  zfs receive backuppool/dataset

ZFS knows about the snapshots and stores shared blocks only once. Having the file system understand the snapshots enables you to delete the old ones without problems.

Other file system on the receiving side

In your case you store the snapshots in individual files, and your file system is unaware of the snapshots. As you already noticed, this breaks rotation. Either you transmit entire snapshots, which wastes bandwidth and storage space but lets you delete individual snapshots at will, since they don't depend on each other; or you create incremental snapshots like this:

ssh myserver 'zfs send -i pool/dataset@2014-02-04 pool/dataset@2014-02-05' \
  > incremental-2014-02-04:05

To restore an incremental snapshot you need the previous snapshots as well. This means you can't delete the old incrementals.
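
A restore therefore has to replay the whole chain in order. A minimal sketch; the `restore_chain` helper and the `tank/restore` target are hypothetical names:

```shell
# Replay a full stream and its incrementals, oldest first.
# Every file depends on the previous one; deleting any link breaks the chain.
restore_chain() {
    target=$1
    shift
    for stream in "$@"; do
        zfs receive "$target" < "$stream"
    done
}

# Usage (ISO-dated names sort chronologically, so a glob keeps the order):
# restore_chain tank/restore full-2014-02-01 incremental-2014-02-0*
```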

Possible solutions

You could do incrementals as shown in my last example and do a new non-incremental every month. The new incrementals depend on this non-incremental and you're free to delete the old snapshots.
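
Such a rotation could look like the following hedged sketch; `choose_mode`, `backup_today`, and the `tank/storage` dataset are hypothetical names, and the gzip/openssl stage from the question can be appended to either `zfs send`:

```shell
# "full" on the first of the month, "incremental" otherwise
choose_mode() {
    if [ "$1" = "01" ]; then echo full; else echo incremental; fi
}

# Hypothetical daily driver: one full stream per month, incrementals in between.
backup_today() {
    today=$(date +%Y-%m-%d)
    # BSD date (FreeNAS/FreeBSD) first, GNU date as fallback
    yesterday=$(date -v-1d +%Y-%m-%d 2>/dev/null || date -d yesterday +%Y-%m-%d)
    if [ "$(choose_mode "$(date +%d)")" = full ]; then
        zfs send "tank/storage@$today" > "full-$today"
        # Everything older than this full stream no longer has dependents
        # and can be deleted from the remote side.
    else
        zfs send -i "tank/storage@$yesterday" "tank/storage@$today" \
            > "incremental-$today"
    fi
}
# Run backup_today from a daily cron job.
```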

Or you could look into other backup solutions. There is rsnapshot, which uses rsync and hard links. It does a very good job at rotation and is very bandwidth efficient, since it requires a full backup only once.

Then there is bareos. It does incrementals, which are bandwidth- and space-saving. It has a very nice feature: it can calculate a full backup from a set of incrementals. This enables you to delete old incrementals. But it's a rather complex system intended for larger setups.

The best solution, however, is to use ZFS on the receiving side. It is bandwidth efficient, storage efficient, and much faster than the other solutions. The only real drawback I can think of is that you should have a minimum of 8 GiB of ECC memory on that box (you might be fine with 4 GiB if you don't run any services and only use it to zfs receive).

Marco
  • Yes, this I know. But what if I delete (because I only want to have a history of 30 days) the file dataset@2014-02-04? Then I only have the changes made after Feb 4th, but not every file. – Martin Grohmann Feb 05 '14 at 21:19
  • @MartinGrohmann I see what you mean now. Well, that's the beauty of ZFS: you can delete the old snapshots on ZFS without problems. On other filesystems you have to keep the old ones. Maybe you're better off with something like `rsnapshot` then. Or you could start a new non-incremental after one month and then delete the previous incrementals. – Marco Feb 05 '14 at 21:24
  • thank you for your help; I just found [duplicity](http://duplicity.nongnu.org/index.html) That's probably the way to go with the ability of encryption. – Martin Grohmann Feb 05 '14 at 21:27
  • @MartinGrohmann Duplicity is a nice program, but [it suffers from the same problem](http://blog.sanctum.geek.nz/linux-crypto-backups/#comment-12017). If you only do incrementals your space keeps growing. You can't reclaim space without wasting bandwidth and doing a new full backup. Either go ZFS on both sides or have a look at [bareos](http://www.bareos.org/en/), it can calculate a new full backup from incrementals. That enables you to delete old incrementals without re-transferring everything. – Marco Feb 05 '14 at 21:33
  • If bandwidth from your source is the problem, a potential solution (which I'm implementing for my home ZFS NAS now) is to always only send incrementals to your remote storage, but once a month spin up a remote FreeBSD VPS (e.g., on Digital Ocean) which can then open the last full snapshot, zfs recv some number of incrementals into it, then store the result as a new snapshot. The VPS only needs to be around long enough to create the new base backup. Digital Ocean has an API allowing easy creation/destruction of their VPSes. And your local system need only send incremental backups. – stuckj Nov 06 '16 at 02:20
  • my remote backup device only has 512MB of RAM, so I don't imagine ZFS would even run minimally... – Michael Jul 23 '20 at 19:27
  • Alternatively, a different solution I'm going to try (if you have sufficient space on the source system) is to use a file that holds a ZFS filesystem ON the local system as the destination, and zfs send to that ZFS filesystem in a file (e.g., using znapzend to do thinning and such). Then use a backup tool that only backs up block differences (like duplicacy) to back that file up to a remote system (duplicacy can just store via sftp). This would be efficient transfers forever to a remote system at the expense of requiring your entire pool to be stored locally in a ZFS filesystem in a file. – stuckj Mar 03 '21 at 17:52
  • In my use case, I'm using duplicacy to back up large static files directly (without ZFS), but need ZFS for a smaller amount of data that is dynamic (DB data, etc.) which is constantly being modified. ZFS lets me do snapshots without breaking file locking. So, storing that dynamic data a second time isn't a huge deal when I can save a ton in bandwidth for incremental updates. – stuckj Mar 03 '21 at 17:54
  • Actually...maybe just running duplicacy on the `.zfs/snapshot` folder in the dataset would be sufficient. Duplicacy only uploads changed chunks, and the files in the snapshot folder won't be modified, so there are no worries about file modifications during a backup in there. That might solve both the space and the re-upload problem, at the expense of some more CPU in block hash calculations. – stuckj Mar 03 '21 at 18:57
  • Oh. Or, use the script this guy already wrote and mentioned in this issue (uses duplicacy): https://github.com/gilbertchen/duplicacy/issues/370. – stuckj Mar 03 '21 at 19:50