7

I have a directory tree with gzipped files like this:

basedir/a/file.dat.gz
basedir/b/file.dat.gz
basedir/c/file.dat.gz
etc.

How can I convert all of these from gzip to xz with a single command and without decompressing each file to disk?

The trivial two-liner with decompressing to disk looks like this:

find basedir/ -type f -name '*.dat.gz' -exec gzip -d {} \;
find basedir/ -type f -name '*.dat' -exec xz {} \;

The first command could be even shorter: gunzip -r *

For a single file, on-the-fly conversion is simple (although this doesn't replace the .gz file):

gzip -cd basedir/a/file.dat.gz | xz > basedir/a/file.dat.xz

Since gzip and xz handle the extensions themselves, I'd like to say something like:

gunzip -rc * > xz

I looked at find | xargs basename -s .gz { } a bit but didn't get a working solution.

I could write a shell script, but I feel there should be a simple solution.


Edit

Thanks to all who answered already. I know we all love 'commands that will never fail™'. So, to keep this simple:

  • All subdirectory names contain only numbers, letters (including äöü, though), underscore and minus.
  • All files are named file.dat[.n].gz, n being a positive integer.
  • No directory or file will have a '.gz' anywhere (other than as the final file suffix).
  • This is the only content these directories contain.
  • I control the naming and can restrict it if needed.

Using a simple find -exec ... or ls | xargs, is there a command to replace '.gz' in the found filename by '.xz' on the fly? Then I could write something like (pseudo):

find basedir/ -type f -name '*.gz' -exec [ gzip -cd {} | xz > {replace .gz by .xz} \; ]
Martin Hennings

4 Answers

10
find . -name '*.gz' -type f -exec bash -o pipefail -Cc '
  for file do
    gunzip < "$file" | xz > "${file%.gz}.xz" && rm -f "$file"
  done' bash {} +
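
To try the loop above without touching real data, one option is to run it on a throwaway tree first. The following is only a sketch under assumptions: bash, gzip, xz and mktemp are available on the PATH, and the directory layout and the `scratch` variable are made up for the demonstration.

```shell
# Build a throwaway tree (hypothetical layout) to experiment on.
scratch=$(mktemp -d)
mkdir -p "$scratch/a" "$scratch/b"
printf 'hello\n' | gzip > "$scratch/a/file.dat.gz"
printf 'world\n' | gzip > "$scratch/b/file.dat.1.gz"

# Same loop as above, started from inside the scratch directory.
(
  cd "$scratch" &&
  find . -name '*.gz' -type f -exec bash -o pipefail -Cc '
    for file do
      gunzip < "$file" | xz > "${file%.gz}.xz" && rm -f "$file"
    done' bash {} +
)

# List what is left; only .xz files should remain if every conversion succeeded.
find "$scratch" -type f | sort
```

Running it a second time after re-creating one of the .gz files is a quick way to see how an already existing .xz is treated.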

The -C option (noclobber) prevents overwriting an existing file and won't follow symlinks, except when the existing file is a non-regular file or a link to a non-regular file. So you would not lose data unless you have, for instance, a file.gz and a file.xz that is a symlink to /dev/null. To guard against that, you could use zsh instead, and also use the -execdir feature of some find implementations for good measure, which avoids some race conditions:

find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    gunzip < "$file" | (
      sysopen -u 1 -w -o excl -- "${file%.gz}.xz" && xz) &&
      rm -f -- "$file"
  done' zsh {} +

Or, to clean up the xz files after a failed recompression:

find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    sysopen -u 1 -w -o excl -- "${file%.gz}.xz" &&
      if gunzip < "$file" | xz; then
        rm -f -- "$file"
      else
        rm -f -- "${file%.gz}.xz"
      fi
  done' zsh {} +

If you'd rather keep it short and are willing to ignore some of those potential issues, in zsh you could do:

for f (./**/*.gz(D.)) {gunzip < $f | xz > $f:r.xz && rm -f $f}
Stéphane Chazelas
  • Thanks, I'll have a look at that tomorrow. I feared it would result in a shell script rather than a simple one-liner. – Martin Hennings Sep 08 '16 at 15:49
  • @Martin, you don't have to put it in a shell script, you can run that at your prompt (even on one line if you replace some of the newlines with semicolons). I've added a shorter zsh example. – Stéphane Chazelas Sep 08 '16 at 16:34
2

I like simple for loops...

for file in basedir/*/*.gz
do
    gzip -cd < "$file" | xz > "${file%%.gz}.xz"
done

...at least, if your directory structure is regular and simple enough. If you have to traverse to unknown depths, or need additional conditions on file selection, you still have to stick with find or similar.
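
For unknown depths, bash 4 and later offers the globstar option as one of the "similar" tools, so the loop style can be kept. A minimal sketch, assuming bash 4+, gzip, xz and mktemp are installed; the scratch tree and the `scratch` variable are invented for the demonstration. Like the loop above, it leaves the .gz files in place.

```shell
# Throwaway tree with files at different depths (hypothetical layout).
scratch=$(mktemp -d)
mkdir -p "$scratch/a" "$scratch/b/deeper"
printf 'one\n' | gzip > "$scratch/a/file.dat.gz"
printf 'two\n' | gzip > "$scratch/b/deeper/file.dat.gz"

# globstar lets ** match any number of directory levels (bash 4+);
# nullglob makes the loop run zero times when nothing matches.
bash -c '
  shopt -s globstar nullglob
  for file in "$1"/**/*.gz; do
    gzip -cd < "$file" | xz > "${file%.gz}.xz"
  done' bash "$scratch"

# Show the converted files.
find "$scratch" -name '*.xz' | sort
```

Note that the glob does not filter by file type and skips hidden directories by default, so find remains the better fit once such conditions matter.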

frostschutz
0
find basedir/ -type f -name '*.dat.gz' | while IFS= read -r line; do
  # remove the original only if xz reported success
  gzip -cd "$line" | xz > "${line%.gz}.xz" && rm "$line"
done
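
Because the loop deletes each .gz, one may want more assurance than an exit status before removing anything. The sketch below is one possible way to do that, not part of the original answer: it tests the gzip file first, converts it, then compares checksums of both decompressed streams without writing an uncompressed copy to disk. The function name `convert_verified` and the scratch tree are hypothetical, and the extra passes cost additional CPU time.

```shell
# convert_verified is a hypothetical helper name, not a standard tool.
convert_verified() {
  src=$1
  dst=${src%.gz}.xz
  # Refuse to touch anything if the gzip file itself is damaged.
  gzip -t "$src" || return 1
  gzip -cd "$src" | xz > "$dst" || return 1
  # Compare checksums of the two decompressed streams.
  if [ "$(gzip -cd "$src" | cksum)" = "$(xz -cd "$dst" | cksum)" ]; then
    rm "$src"
  else
    rm -f "$dst"
    return 1
  fi
}

# Demonstration on a throwaway tree (made-up layout).
scratch=$(mktemp -d)
mkdir -p "$scratch/a"
printf 'payload\n' | gzip > "$scratch/a/file.dat.gz"
find "$scratch" -type f -name '*.gz' | while IFS= read -r line; do
  convert_verified "$line" || echo "failed: $line" >&2
done
```

As written, the redirection still overwrites an existing .xz of the same name, so this complements rather than replaces the noclobber approach.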
Ipor Sircer
0

You can do this with find and parallel:

parallel -0 'gzip -cd '{}' | xz > '{.}'.xz; rm '{}'' < <(find basedir -iname \*gz -print0)

Steps completed:

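  • find recursively locates all files whose names end in gz (case-insensitive)
  • the file list is fed to parallel's stdin via process substitution
  • for each foo.gz, parallel runs: gzip -cd foo.gz | xz > foo.xz; rm foo.gz
    • {.} is the input with its extension removed, i.e. foo.gz without the .gz (in my understanding)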
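
For comparison, the effect described for {.} (dropping the last extension) can be reproduced in plain shell with the ${var%.*} parameter expansion. A small sketch with made-up file names; `to_xz_name` is a hypothetical helper, and the code only prints names without calling parallel. The two mechanisms may differ in edge cases such as names without any extension.

```shell
# to_xz_name is a hypothetical helper: strip the last extension, append .xz.
to_xz_name() {
  printf '%s\n' "${1%.*}.xz"
}

# Print the source and target name for two example paths.
for name in basedir/a/file.dat.gz basedir/b/file.dat.3.gz; do
  printf '%s -> %s\n' "$name" "$(to_xz_name "$name")"
done
```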
Miati
  • I was unable to create a file called `a$(touch /tmp/could_have_been_worse)b` so I cannot test that (likely due to forward slashes). I decided to test with `block1.gz`, `blo*(&%&^$ %(* %&@*()#& %)(*#ck5.gz` and `block$(touch couldbeworse).gz`. I did encounter some "No such file" errors. After I replaced `gzip -cd "{}"` with `gzip -cd '{}'` this no longer occurred and I had similar .xz files. I did the same for `{.}` and the other `{}` for good measure. – Miati Sep 09 '16 at 00:35
  • @Gilles You raise good points. For fun, I changed one of the filenames to binary_garbage.gz with cp file "$(cut -b1-20 < file.gz)". This worked fine. Next I attempted each tr character (\v \t \n etc). This caused failure. I added -0 and -print0 to parallel and find. These all now work fine. My terminal and GUI file manager are confused, of course. If you can identify what will cause it to fail, let me know. However, based on my testing, I think this command is very robust (now). – Miati Sep 09 '16 at 01:27
  • I think your parallel invocation is correct now, because parallel quotes special characters for interpolation in a shell script: with (non-ancient) parallel, unlike find or xargs, `{}` is replaced by the item with backslashes added, not by the actual item. Note that what you wrote is equivalent to `'gzip -cd {} | xz > {.}.xz; rm {}'`: all these extra single quotes are redundant in the outer shell, and the shell executed by `parallel` doesn't have any quotes except the ones put there by `parallel`. This is important: if you put quotes around `{}`, that conflicts with what parallel does. – Gilles 'SO- stop being evil' Sep 09 '16 at 12:20