
I want to back up a shared directory with lots of small files, which are constantly changed by my users (add/delete/edit). As usual, I use tar and filter the content through a compression program like xz/lzip.

Let's call the files as they are when the operation begins "old files", and the files being added/deleted by my users during the operation "new files". I am fine with old and new files being mixed, but backing up a file whose binary content is half old and half new is totally unacceptable. I've read the answers from this question and found that using tar is not safe at all.

Is it safe if I copy that directory to a temporary location first using the cp -r command? And if it is not, do I have any other options? Using an LVM volume snapshot is not an option for me, because I am using a single 2TB portable drive plugged into an OpenWrt router's USB port, and I use Samba to share it. Besides, I also want to keep the backup as small as possible, so that I can upload it to another file server.
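When snapshots are unavailable, one commonly used workaround is to copy each file into a staging directory, compare the source's size and modification time before and after the copy, and retry the file if they changed. GNU tar performs a similar check when it warns "file changed as we read it". Below is a minimal sketch of that idea, assuming Python 3 is available on the machine running the backup and the staging directory has enough free space; the function names (`stable_copy`, `backup_tree`, `archive_staging`) are hypothetical. It then compresses the stable staging copy with the standard `tarfile` module in `w:xz` mode, matching the xz compression used above.

```python
import os
import shutil
import tarfile
from typing import List


def stable_copy(src: str, dst: str, retries: int = 3) -> bool:
    """Copy src to dst; return True only if src's size and mtime were
    identical before and after some attempt (hypothetical helper)."""
    for _ in range(retries):
        before = os.stat(src)
        shutil.copy2(src, dst)  # copies data, then timestamps/permission bits
        after = os.stat(src)
        if (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns):
            return True
    return False  # src kept changing; dst may hold a half-old/half-new copy


def backup_tree(src_root: str, dst_root: str, retries: int = 3) -> List[str]:
    """Mirror src_root into dst_root, dropping copies of files that never
    stayed unchanged during a copy. Returns those source paths."""
    unstable = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        target_dir = os.path.join(dst_root, os.path.relpath(dirpath, src_root))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                ok = stable_copy(src, dst, retries)
            except FileNotFoundError:
                continue  # deleted by a user after os.walk listed it; skip it
            if not ok:
                os.remove(dst)  # never archive a possibly torn copy
                unstable.append(src)
    return unstable


def archive_staging(staging_dir: str, archive_path: str) -> None:
    """Compress the (now stable) staging directory into a .tar.xz file."""
    name = os.path.basename(os.path.normpath(staging_dir))
    with tarfile.open(archive_path, "w:xz") as tar:
        tar.add(staging_dir, arcname=name)
```

Usage would look like `backup_tree("/mnt/share", "/mnt/staging/share")` followed by `archive_staging("/mnt/staging/share", "/mnt/backup.tar.xz")` (paths are placeholders). This check only notices writes that change a file's size or modification time: on filesystems with coarse timestamps (FAT32 records mtime with 2-second resolution), a same-size rewrite within one timestamp tick can slip through. Files reported as unstable would need to be retried later or excluded.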

Livy
  • You say you can't use LVM snapshots, and I can understand why, but wouldn't BTRFS with incremental snapshot architecture help instead? – realpclaudio Feb 03 '20 at 11:08
  • @realpclaudio Because of its turtle-slow speed (even with transparent compression disabled), BTRFS will be my last option. – Livy Feb 03 '20 at 18:23
  • It depends on your setup. For my specific workload, btrfs (with COW disabled) has proven to be a good filesystem for backing up large VM images, with the benefits of subvolumes and snapshots. I have several backup systems with up to 32TB of data on btrfs. Regards – realpclaudio Feb 04 '20 at 09:27


I suggest avoiding cp, since your file contents are constantly being updated. cp works better when you only add new files, because cp copies files based on their attributes.

rsync would be a better choice here because your file contents are constantly changing: rsync checks whether there are any differences between the source and destination directories before copying anything.

I am using rsync to back up a log file that is being written at 1.78 MB/s, without any issues.

You can use the following command to back up a directory with rsync:

rsync --archive --verbose [source] [destination]/

Where:

  • --archive: enable archive mode
  • --verbose: increase verbosity
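The difference check mentioned above is what the rsync man page calls the "quick check": by default a file is transferred when its size or last-modified time differs from the copy at the destination. The following is a simplified Python sketch of that rule (the helper names are hypothetical, and real rsync's timestamp precision depends on its version and the `--modify-window` option). It illustrates that the check decides which files need copying; it does not lock a file or take a snapshot of it while it is being read.

```python
import os
from typing import List


def quick_check_differs(src: str, dst: str) -> bool:
    """Simplified rsync-style quick check: True if dst is missing or its
    size or mtime differs from src."""
    try:
        d = os.stat(dst)
    except FileNotFoundError:
        return True  # not backed up yet
    s = os.stat(src)
    # Exact nanosecond comparison here; rsync's own precision may differ.
    return s.st_size != d.st_size or s.st_mtime_ns != d.st_mtime_ns


def files_to_transfer(src_root: str, dst_root: str) -> List[str]:
    """List source files (relative paths) that the quick check would send."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            if quick_check_differs(src, os.path.join(dst_root, rel_dir, name)):
                changed.append(os.path.relpath(src, src_root))
    return changed
```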
arif
  • After following your suggestion, I've read some `rsync` tutorials, and they all present it as a synchronization tool that syncs changes between 2 directories. Can you show me where it states that it is safe to sync a directory whose files constantly change? And what kind of result should I expect? For example, in your case, what is the state/version of the log file after `rsync` returns (is the copied data from before the command ran, from some point during the sync, or from just before the sync finished)? – Livy Feb 03 '20 at 18:56
  • After seeing this question (https://unix.stackexchange.com/questions/90245/is-using-rsync-while-source-is-being-updated-safe), it seems that `tar`, `cp`, and `rsync` all work the same way: they build a list of files before reading them. And the same caveat applies to all of them: if a file is changed during the copying process, you may end up with half of the old data and half of the new data. In other words, the file is corrupted. In the case of your log file, I think you're safe because the logging process appends data to the end of the file. – Livy Feb 03 '20 at 19:18
  • Btw, `--archive` is equal to `-rlptgoD`, so you don't need `--recursive` or `-r`. Hope this helps. And it's nice to learn a new command for future use. :) – Livy Feb 03 '20 at 19:20
  • 1. Nope. I can't find any specific document guaranteeing that syncing such a directory is safe, as rsync doesn't support locks. But it is a very popular tool for backing up data. 2. I have checked the logs, compared them with `diff`, and didn't notice any anomaly. – arif Feb 03 '20 at 20:52
  • I don't share your view that `rsync`, `cp` and `tar` work the same way. Maybe you are right about appending. You might want to create a test that meets your criteria and run a sanity check on `rsync`. And thanks for the correction. – arif Feb 03 '20 at 20:57
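A sanity check like the one suggested in the last comment could be built around a file whose every version is easy to recognize. A minimal sketch, assuming Python 3 and hypothetical names (`write_version`, `copy_is_consistent`, `hammer`, script name `torn_check.py`): one process keeps rewriting a test file in place, each version filled with a single repeated letter, while `rsync`, `cp` or `tar` copies it; a copy that contains more than one distinct byte, or has the wrong size, is exactly the "half old, half new" case. The file is deliberately large (8 MiB) so copies have a chance to overlap a write, and one clean run does not prove a tool is safe.

```python
import os
import sys
import time


def write_version(path: str, version: int, size: int) -> None:
    """Rewrite path in place so every byte is one letter chosen by version."""
    fill = bytes([ord("A") + version % 26]) * size
    # "r+b" overwrites without truncating first, so a concurrent reader can
    # observe a mix of the previous and the new version.
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        f.write(fill)


def copy_is_consistent(path: str, size: int) -> bool:
    """A copy is consistent if it has the expected size and one byte value."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data) == size and len(set(data)) == 1


def hammer(path: str, seconds: float, size: int) -> int:
    """Keep rewriting path for the given time; return versions written."""
    version = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        write_version(path, version, size)
        version += 1
    return version


if __name__ == "__main__" and len(sys.argv) >= 3:
    # python torn_check.py write FILE SECONDS  (run while the copy tool runs)
    # python torn_check.py check COPIED_FILE
    SIZE = 8 * 1024 * 1024  # 8 MiB per version
    cmd, target = sys.argv[1], sys.argv[2]
    if cmd == "write":
        print(hammer(target, float(sys.argv[3]), SIZE), "versions written")
    else:
        print("consistent" if copy_is_consistent(target, SIZE) else "TORN")
```

Note that the writer overwrites in place on purpose. Programs that save by writing a temporary file and renaming it over the original behave differently, because on POSIX systems a reader that already opened the old file keeps reading the old contents.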