How to convert all files from gzip to xz on the fly (and recursively)?

Question

I have a directory tree with gzipped files like this:

basedir/a/file.dat.gz
basedir/b/file.dat.gz
basedir/c/file.dat.gz
etc.

How can I convert all of these from gzip to xz with a single command and without decompressing each file to disk?

The trivial two-liner with decompressing to disk looks like this:

find basedir/ -type f -name '*.dat.gz' -exec gzip -d {} \;
find basedir/ -type f -name '*.dat' -exec xz {} \;

The first command could even be shortened to: gunzip -r *

For a single file, on-the-fly conversion is simple (although this doesn't replace the .gz file):

gzip -cd basedir/a/file.dat.gz | xz > basedir/a/file.dat.xz

Since gzip and xz handle the extensions themselves, I'd like to write something like:

gunzip -rc * > xz

I looked at find | xargs basename -s .gz { } a bit but didn't get a working solution.

I could write a shell script, but I feel there should be a simple solution.

Edit

Thanks to all who answered already. I know we all love 'commands that will never fail™'. So, to keep this simple:

All subdirectories contain only numbers, letters (äöü, though), underscore and minus.
All files are named file.dat[.n].gz, n being a positive integer.
No directory or file will have a '.gz' anywhere (other than as the final file suffix).
This is the only content these directories contain.
I control the naming and can restrict it if needed.

Using a simple find -exec ... or ls | xargs, is there a command to replace '.gz' in the found filename by '.xz' on the fly? Then I could write something like (pseudo):

find basedir/ -type f -name '*.gz' -exec [ gzip -cd {} | xz > {replace .gz by .xz} \; ]

Stéphane Chazelas · Answer 1 · 2016-09-08T16:33:10.793

find . -name '*.gz' -type f -exec bash -o pipefail -Cc '
  for file do
    gunzip < "$file" | xz > "${file%.gz}.xz" && rm -f "$file"
  done' bash {} +
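
A note on why -o pipefail matters in the command above: without it, the exit status of the gunzip | xz pipeline is that of xz alone, so a failing gunzip could still be followed by rm. A minimal sketch, where false and true are illustrative stand-ins (assumptions, not part of the command) for a failing gunzip and an xz that exits successfully:

```shell
#!/bin/sh
# "false | true" models: first command fails, last command succeeds.

# Without pipefail, only the last command's status counts.
if bash -c 'false | true'; then plain="success"; else plain="failure"; fi

# With pipefail, a failure anywhere in the pipeline is reported.
if bash -o pipefail -c 'false | true'; then strict="success"; else strict="failure"; fi

echo "without pipefail: $plain, with pipefail: $strict"
```

With pipefail set, the && rm part is skipped whenever either side of the pipeline fails.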

The -C (noclobber) option prevents overwriting an existing file and won't follow symlinks, except when the existing file is a non-regular file or a link to a non-regular file. So you would not lose data unless you have, for instance, a file.gz and a file.xz that is a symlink to /dev/null. To guard against that, you could use zsh instead, and also use the -execdir feature of some find implementations for good measure and to avoid some race conditions:

find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    gunzip < "$file" | (
      sysopen -u 1 -w -o excl -- "${file%.gz}.xz" && xz) &&
      rm -f -- "$file"
  done' zsh {} +

Or, to clean up xz files after a failed recompression:

find . -name '*.gz' -type f -execdir zsh -o pipefail -c '
  zmodload zsh/system || exit
  for file do
    sysopen -u 1 -w -o excl -- "${file%.gz}.xz" &&
      if gunzip < "$file" | xz; then
        rm -f -- "$file"
      else
        rm -f -- "${file%.gz}.xz"
      fi
  done' zsh {} +

If you'd rather keep it short and are willing to ignore some of those potential issues, in zsh you could do:

for f (./**/*.gz(D.)) {gunzip < $f | xz > $f:r.xz && rm -f $f}
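
The -C caveat described for the first command can be reproduced in a scratch directory. This is only a sketch, assuming a POSIX sh with noclobber support; the temporary directory and the names a.xz and b.xz are hypothetical:

```shell
#!/bin/sh
tmp=$(mktemp -d)                 # hypothetical scratch directory
echo "precious" > "$tmp/a.xz"    # an existing regular file
ln -s /dev/null "$tmp/b.xz"      # a symlink to a non-regular file

set -C                           # noclobber, the same option as bash -C

# Redirecting onto an existing regular file is refused.
if (echo "new" > "$tmp/a.xz") 2>/dev/null; then regular="overwritten"; else regular="refused"; fi

# A symlink to a non-regular file is still opened and written to.
if (echo "new" > "$tmp/b.xz") 2>/dev/null; then devnull="written"; else devnull="refused"; fi

set +C
kept=$(cat "$tmp/a.xz")
echo "regular file: $regular (content: $kept), symlink to /dev/null: $devnull"
rm -rf "$tmp"
```

In the second case the write succeeds but the data goes to /dev/null, which is why a following rm of the .gz file would lose data.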

Thanks, I'll have a look at that tomorrow. I feared it would result in a shell script rather than a simple one-liner. — Martin Hennings, Sep 08 '16 at 15:49
@Martin, you don't have to put it in a shell script, you can run that at your prompt (even on one line if you replace some of the newlines with semi-colon). I've added a shorter zsh example. — Stéphane Chazelas, Sep 08 '16 at 16:34

score 2 · Answer 2 · answered Sep 08 '16 at 16:04

I like simple for loops...

for file in basedir/*/*.gz
do
    gzip -cd < "$file" | xz > "${file%%.gz}.xz"
done

...at least, if your directory structure is regular and simple enough. If you have to traverse to unknown depths, or need additional conditions on file selection, you still have to stick with find or similar.
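
The "${file%%.gz}.xz" expansion is what turns the .gz name into the .xz name. A small sketch of it, applied to names of the form given in the question; xz_name is a hypothetical helper introduced only for illustration:

```shell
#!/bin/sh
# xz_name is a hypothetical helper, not part of gzip or xz.
xz_name() {
    # Strip one trailing ".gz" from the argument and append ".xz".
    printf '%s\n' "${1%.gz}.xz"
}

xz_name "basedir/a/file.dat.gz"
xz_name "basedir/b/file.dat.3.gz"
```

Because the pattern .gz contains no wildcard, % and %% strip exactly the same suffix here.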

frostschutz

or with Bash: `shopt -s globstar; for file in basedir/**/*.dat.gz ; do ...` – ilkkachu Sep 08 '16 at 16:06

score 0 · Answer 3 · answered Sep 08 '16 at 15:24

find basedir/ -type f -name '*.dat.gz' | while IFS= read -r line; do
  # Quote the target name and delete the original only if xz succeeded
  gzip -cd "$line" | xz > "${line%.gz}.xz" && rm "$line"
done

Ipor Sircer

This would fail if any of the filenames contain a newline – Eric Renouf Sep 08 '16 at 16:50
Yes, you're right! – Ipor Sircer Sep 08 '16 at 16:56
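
The newline problem can be seen without compressing anything. The sketch below assumes a scratch directory and a hypothetical file whose name contains a newline; it compares how many names a line-based reader sees with how many arguments find -exec passes:

```shell
#!/bin/sh
tmp=$(mktemp -d)    # hypothetical scratch directory
# A single file whose name contains a newline.
touch "$tmp/file
.dat.gz"

# Line-based reading splits the one name into two lines.
by_lines=$(find "$tmp" -type f -name '*.gz' | wc -l | tr -d ' ')

# find -exec hands the name to the shell as one intact argument.
by_args=$(find "$tmp" -type f -name '*.gz' -exec sh -c 'echo "$#"' sh {} +)

echo "names seen by line: $by_lines, by argument: $by_args"
rm -rf "$tmp"
```

The line-based count is 2 for a single file, while the -exec form reports 1, which is why the find -exec sh -c '...' sh {} + pattern is the safer choice for arbitrary names.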

Miati · Answer 4 · 2016-09-09T01:28:20.510

You can do this with find and parallel

parallel -0 'gzip -cd '{}' | xz > '{.}'.xz; rm '{}'' < <(find basedir -iname \*gz -print0)

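Steps performed:

find recursively lists everything whose name ends in gz (case-insensitive), NUL-terminated.
parallel reads that list on stdin via process substitution.
For each foo.gz, parallel runs: gzip -cd foo.gz | xz > foo.xz; rm foo.gz
- {.} is the input item with its last extension removed, so foo.gz becomes foo.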
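
Since -iname \*gz also matches names such as archive.tgz, it may be worth checking what "last extension removed" produces for them. The sketch below approximates {.} with the shell expansion ${1%.*} (an approximation, not parallel's own code) and compares it with stripping only a literal .gz; compare_names and the sample names are hypothetical:

```shell
#!/bin/sh
# compare_names is a hypothetical helper for illustration only.
compare_names() {
    # ${1%.*}  drops everything from the last dot (approximates {.})
    # ${1%.gz} drops only a literal ".gz" suffix
    printf '%s -> %s (last extension) or %s (.gz only)\n' \
        "$1" "${1%.*}.xz" "${1%.gz}.xz"
}

compare_names foo.gz
compare_names archive.tgz
```

For foo.gz both forms give foo.xz; for archive.tgz the first gives archive.xz and the second archive.tgz.xz.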

edited Sep 09 '16 at 01:28

answered Sep 08 '16 at 16:19

Miati

I was unable to create a file called **a$(touch /tmp/could_have_been_worse)b** so I cannot test that (likely due to forward slashes). I decided to test with **block1.gz** **blo*(&%&^$ %(* %&@*()#& %)(*#ck5.gz** and **block$(touch couldbeworse).gz**. I did encounter some "*No such file*" errors. After I replaced **gzip -cd "{}"** with **gzip -cd '{}'** this no longer occurred and I had similar .xz files. I did the same for **{.}** and the other **{}** for good measure. – Miati Sep 09 '16 at 00:35
@Gilles You raise good points. For fun, I changed one of the filenames to binary_garbage.gz with cp file "$(cut -b1-20 < file.gz)". This worked fine. Next I attempted each tr character (\v \t \n etc). This caused failure. I added -0 and -print0 to parallel and find. These all now *work fine*. My terminal and gui file manager are confused of course. If you can identify what will cause it to fail, let me know. However based on my testing, I think this command is *very* robust (now). – Miati Sep 09 '16 at 01:27
I think your parallel invocation is correct now, because parallel quotes special characters for interpolation in a shell script: With (non-ancient) parallel, unlike find or xargs, `{}` is replaced by the item with backslashes added, not by the actual item. Note that what you wrote is equivalent to `'gzip -cd '{} | xz > {.}.xz; rm {}'` — all these extra single quotes are redundant in the outer shell, and the shell executed by `parallel` doesn't have any quotes except the ones put there by `parallel` — and this is important: if you put quotes around `{}`, that conflicts with what parallel does. – Gilles 'SO- stop being evil' Sep 09 '16 at 12:20
