I'm doing backups to LTO tape. My backups include a lot of small files, which slows down reading from disk, so I'm using a buffer (mbuffer) to prevent shoe-shining my tapes:

bkname="test"; tobk="*" 
totalsize=$(du -csb $tobk | tail -1 | cut -f1) 
tar cvf - $tobk | tee >(sha512sum > $bkname.sha512) >(tar -tv > $bkname.lst) | mbuffer -m 4G -P 100% | pv -s $totalsize -w 100 | dd of=/dev/nst0 bs=256k

The problem with this approach is: I can't make a backup spanning multiple tapes, because the tar command isn't directly accessing the tape and therefore won't recognize a full tape.

What would be the correct way to buffer the small files and have multi tape backups at the same time?

Jonas Stein
  • What is the minimum write speed for your drive? Also, have you measured the rate at which `tar` outputs data on your system (by ending your pipe with `| pv > /dev/null`)? How do they compare? – zeppelin Feb 03 '17 at 19:06
  • Do you know the max capacity of the tape(s) in advance? Would tar options `-L` (change tape after *n* kB) and `-M` (multi-volume archive), or `-F` (run command at the end of each tape) help? Not sure if you can keep your pipeline as it is, though. – dirkt Feb 04 '17 at 11:32
  • @zeppelin: the tape drive supports up to 120MB/s without compression or 240MB/s with compression enabled. tar delivers from 1MB/s up to 200MB/s, depending on the data. A lot of small files are very slow, large files are fast. – M. Schulz-Narres Feb 05 '17 at 16:11
  • @dirkt: your suggestions would actually work. The problem with this approach is that the tape drive supports hardware compression. That means it can store 800 GB to 1.6 TB, depending on the data. So it would be great if tar would recognize a full tape, regardless of the amount of data written. Otherwise I would be wasting space. – M. Schulz-Narres Feb 05 '17 at 16:11
  • How does the tape drive actually signal a full tape (I don't have the hardware)? Some event on `/dev/input/`? And if compression is done on the fly by writing to `/dev/nst0`, there's no way to know beforehand when the tape will be full, which is necessary with all this buffering. Is there some way to simulate compression in advance, or compress to a file which can then be written "raw"? – dirkt Feb 05 '17 at 16:19
  • @dirkt: actually I have no idea how the drive signals a full tape. I'll do some research on that. Writing to a file is the only "solution" I could come up with so far: first tar everything to a file, then write this tar archive file to tape using tar again. The second tar command doesn't need a buffer, because I'm accessing a single file, which is quite fast, so it can use the built-in multi-volume archive feature. I'm still hoping for some kind of on-the-fly solution though, because creating a 3 TB tar file isn't fun. – M. Schulz-Narres Feb 06 '17 at 17:45
  • But if it's compressing, how do you know how big the first file can be? And on the topic of compression: Would it be possible to let `tar` compress it, and ignore/disable the tape compression feature? (If it's already compressed, tape compression won't decrease the size further). – dirkt Feb 06 '17 at 17:51
  • The first tar command doesn't need to split the archive; it can just create one large file. The second tar command will split it across multiple tapes when using the `-M` option. Precompressing the data is an option, but the tape drive can write faster when it does the compression itself. Software compression, on the other hand, would slow the process down. – M. Schulz-Narres Feb 06 '17 at 17:58
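
The two-stage workaround discussed in the comments above could look roughly like the sketch below. It assumes GNU tar; the function name `two_stage_backup` and its arguments are made up for illustration, and the staging file needs as much free disk space as the whole backup.

```shell
#!/usr/bin/env bash
# two_stage_backup STAGING_FILE DEVICE PATH...
# Hypothetical helper, not part of tar. Assumes GNU tar.
two_stage_backup() {
    local staging=$1 device=$2
    shift 2

    # Stage 1: the slow, small-file-bound read from disk goes into one
    # large archive on fast local storage.
    tar -cf "$staging" "$@" || return

    # Stage 2: stream that single large file to the device.
    # -M (--multi-volume) lets tar ask for the next tape when the current
    # one is full; -b 512 gives 512 x 512 B = 256 KiB records, matching
    # the bs=256k used in the question.
    tar -cM -b 512 -f "$device" \
        -C "$(dirname "$staging")" "$(basename "$staging")"
}

# Example (not run here):
#   two_stage_backup /staging/test.tar /dev/nst0 /data
```

On the tape, the backup is then a tar archive containing one member, the staging archive, so a restore has to unpack twice. GNU tar's `-F` option can be added to stage 2 to run a tape-change script instead of prompting.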

2 Answers

Consider using star instead of GNU tar. star has a built-in buffer (its FIFO), which should solve your problem because star itself writes to the tape.
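
As a rough sketch of what such a star call might look like: the helper below only prints a command line so it can be reviewed before anything touches a tape. It assumes the `f=`, `bs=`, `fs=` and `-multivol` options as documented in the star man page; how end-of-tape is detected with hardware compression is not verified here, so check `man star` on your system. The function name `build_star_cmd` is made up for illustration.

```shell
#!/usr/bin/env bash
# build_star_cmd DEVICE FIFO_SIZE PATH...
# Hypothetical helper: prints a star command line, does not run it.
build_star_cmd() {
    local device=$1 fifo=$2
    shift 2
    # f=        archive file (here: the tape device)
    # bs=256k   block size, matching the dd bs=256k in the question
    # fs=       size of the built-in FIFO (assumption: large enough to
    #           ride out slow stretches with many small files)
    # -multivol multi-volume mode (assumption: prompts for the next tape)
    printf '%q ' star -c "f=$device" bs=256k "fs=$fifo" -multivol "$@"
    printf '\n'
}

# Prints the command for review; run it by hand once it looks right.
build_star_cmd /dev/nst0 1g /data
```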

FUZxxl

The size of a tar archive is not the size that du calculates. Only a preflight run gives the right size, but it doubles the workload. An example:

~# du -csb /usr | tail -1 | cut -f1
=> 1585916720

~# tar --totals -cf /dev/null /usr
=> 1656514560
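
Part of the difference is likely explained by the tar format itself: every entry gets a 512-byte header, file data is padded to a multiple of 512 bytes, and the archive ends with two zero blocks and is padded to a full record. The sketch below estimates the archive size from file metadata alone, without reading any file contents. It assumes GNU find and deliberately ignores hard links, sparse files, extended attributes and names longer than 100 characters, all of which change the real size; the function name `estimate_tar_size` is made up for illustration.

```shell
#!/usr/bin/env bash
# estimate_tar_size DIR [RECORD_BYTES]
# Hypothetical helper. RECORD_BYTES defaults to 10240 (blocking factor 20).
estimate_tar_size() {
    local dir=$1 record=${2:-10240}
    # %y = entry type (f = regular file), %s = size in bytes (GNU find).
    find "$dir" -printf '%y %s\n' | awk -v rec="$record" '
        # One 512-byte header block per entry of any type.
        { total += 512 }
        # Regular files also store their data, padded to 512 bytes.
        $1 == "f" { total += int(($2 + 511) / 512) * 512 }
        END {
            # Two zero blocks mark the end of the archive.
            total += 1024
            # The archive is padded up to a whole record.
            printf "%.0f\n", int((total + rec - 1) / rec) * rec
        }'
}

# Example (not run here):
#   estimate_tar_size /usr
```

For trees with mostly short path names this should land close to what `tar --totals -cf /dev/null` reports, and it only walks metadata, so it is much cheaper than a full preflight run.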

My recommendation: Use a powerful backup tool like dar http://dar.linux.free.fr/

ingopingo
  • I think dar can't handle tape drives. The progress bar is not that important to me, but thanks for your suggestion! – M. Schulz-Narres Feb 05 '17 at 16:14
  • Tape drive support in dar is the same as in tar: reading and writing to /dev/nst0. It supports volume spanning, but not tape control. That is the job of mt and mtx, or a complete backup suite like bareos. – ingopingo Feb 05 '17 at 16:27