
I'm currently using the following model, but restoring the compressed archive requires enough disk space for both the parts and the extracted dataset at the same time, because all parts have to be piped to tar before any of them can be deleted.

```
$ COPYFILE_DISABLE=true tar \
  --create \
  --directory ~/data/dataset \
  --use-compress-program lz4 \
  --verbose \
  . | \
  split \
  --bytes 10G \
  --numeric-suffixes \
  - \
  dataset.tar.lz4.part
$ cat dataset.tar.lz4.part* | \
  tar \
  --extract \
  --directory ~/data/dataset \
  --use-compress-program lz4 \
  --verbose
```

Is there a more efficient model where parts can be deleted FIFO (first in, first out) as they are decompressed?

sunknudsen



You can always do:

```
for part in dataset.tar.lz4.part*; do
  cat < "$part" || break
  rm -f -- "$part"
done |
  tar \
  --extract \
  --directory ~/data/dataset \
  --use-compress-program lz4 \
  --verbose
```
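The loop can also be packaged as a small reusable function. This is a minimal sketch, assuming a POSIX `sh`; the name `cat_and_delete` is hypothetical. Keep in mind that each part is removed once `cat` has written it into the pipe, not once `tar` has finished processing it, so if `tar` fails partway through, the parts already streamed are gone.

```shell
# cat_and_delete (hypothetical helper name): copy each file given as an
# argument to stdout, in order, removing it right after it has been copied.
# Stops at the first file that cannot be opened, leaving it and later files.
cat_and_delete() {
  for part do
    # The shell opens "$part"; if that fails, cat is not run and we stop.
    cat < "$part" || return 1
    rm -f -- "$part"
  done
}

# Usage, equivalent to the loop above:
#   cat_and_delete dataset.tar.lz4.part* |
#     tar --extract --directory ~/data/dataset --use-compress-program lz4 --verbose
```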

Don't use a `.gz` suffix for lz4-compressed files; that would be misleading, as `.gz` is for gzip.

Stéphane Chazelas
  • Thanks for helping out Stéphane! Does this model work for a 500GB dataset archived and compressed into 10G parts? – sunknudsen Feb 22 '22 at 17:12
  • Also, is there a way to make the model resilient to failure? My understanding is that if something goes wrong, one would have to download all deleted parts again, right? – sunknudsen Feb 22 '22 at 17:17
  • Thanks for heads-up about `.gz`… fixed! – sunknudsen Feb 22 '22 at 17:18
  • Curious why [--](https://unix.stackexchange.com/questions/11376/what-does-double-dash-mean) in this use case? – sunknudsen Feb 22 '22 at 17:26
  • This is brilliant! – sunknudsen Feb 22 '22 at 18:09
  • Why `cat < "$part" || break` vs `cat "$part" || break`? – sunknudsen Feb 22 '22 at 18:18
  • @sunknudsen, replying to your questions in turn: (1) yes, there's no size limit that I'm aware of; (2) to make it resilient and be able to resume an interrupted extraction, you'd need separate archives, and delete each one after its successful extraction; (3) `rm -f "$part"` would be incorrect in isolation. It would be OK here because we know from context that `$part` doesn't start with `-`, but I prefer showing the always-correct `rm -f -- "$part"` variant; (4) see [When should I use input redirection?](//unix.stackexchange.com/q/70756) – Stéphane Chazelas Feb 22 '22 at 19:13
  • Thanks Stéphane! – sunknudsen Feb 22 '22 at 20:54
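Point (2) above can be sketched as follows: produce several self-contained archives instead of splitting one stream, then delete each archive only after `tar` has extracted it successfully. This is a minimal sketch under stated assumptions, not a definitive implementation: one archive per top-level entry of the dataset directory (so top-level entries starting with `.` are skipped, and archive sizes are uneven, unlike `split --bytes 10G`), hypothetical file names `dataset-NAME.tar.lz4`, hypothetical function names `archive_each` and `extract_and_delete`, and a hypothetical `COMPRESS_PROGRAM` variable defaulting to `lz4`. An interrupted run can be resumed by calling `extract_and_delete` again on the archives that remain; the one that was being extracted is still there and is simply extracted again.

```shell
# Hypothetical variable: compressor passed to tar, lz4 unless overridden.
COMPRESS_PROGRAM=${COMPRESS_PROGRAM:-lz4}

# archive_each SRC OUT (hypothetical): create one self-contained archive per
# top-level entry of SRC, written to OUT/dataset-NAME.tar.lz4.
# The "$src"/* glob does not match entries whose names start with ".".
archive_each() {
  src=$1 out=$2
  for entry in "$src"/*; do
    [ -e "$entry" ] || continue        # glob matched nothing
    name=${entry##*/}
    # "./$name" keeps a name starting with "-" from being read as an option.
    tar --create --directory "$src" \
        --use-compress-program "$COMPRESS_PROGRAM" \
        --file "$out/dataset-$name.tar.lz4" "./$name" || return 1
  done
}

# extract_and_delete DEST ARCHIVE... (hypothetical): extract each archive
# into DEST and remove it only if tar exits successfully. On the first
# failure, stop and keep that archive and all later ones.
extract_and_delete() {
  dest=$1
  shift
  for archive do
    if tar --extract --directory "$dest" \
           --use-compress-program "$COMPRESS_PROGRAM" \
           --file "$archive"; then
      rm -f -- "$archive"
    else
      printf 'extraction failed on %s; it and later archives were kept\n' \
        "$archive" >&2
      return 1
    fi
  done
}

# Usage (paths are examples):
#   archive_each ~/data/dataset ~/archives
#   extract_and_delete ~/data/dataset ~/archives/dataset-*.tar.lz4
```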