19

I have some database dumps, some of them gzip-compressed and some in plain text. Nothing in the file name or extension tells them apart. Running zcat on an uncompressed file produces an error instead of the file's contents.

Is there another cat-like tool that is smart enough to detect what type of input it gets?

rsk82
  • What file extension are they? Is there any way I could get some examples to play around with? – Seth May 25 '14 at 16:33
  • Just use `zcat` on only the compressed dumps. – mikeserv May 25 '14 at 16:36
  • Yeah, but I don't know which ones are compressed without checking manually; some of them are *.gz, some are not, and that is the problem. May I rephrase the question: how do I check whether a file is gzipped, and use that information in the next command? – rsk82 May 25 '14 at 16:39
  • @rsk82 But you just said they all have the same extension.. So you mean they all have .gz but only some of those are actually compressed? The others are just plain text? – Seth May 25 '14 at 16:41
  • Well - that sounds like your problem. Whatever system you've set up that provides that kind of output needs revising. In the meantime, GNU `grep` can be instructed what to do if it encounters a binary file - and that might make a good filter for the cleanup. – mikeserv May 25 '14 at 16:45
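
On the rephrased question (how to check whether a file is gzipped): one common approach is to look at the first two bytes, since gzip streams start with the magic bytes 0x1f 0x8b. The following is only a minimal sketch under that assumption. The function names `is_gzip` and `smartcat` are made up for illustration, old `.Z` (compress) files are not handled, and a plain file that happens to start with those two bytes would be misdetected.

```shell
# is_gzip FILE: succeed if FILE starts with the gzip magic bytes 0x1f 0x8b.
# (Hypothetical helper; od's -A, -t and -N options are specified by POSIX.)
is_gzip() {
    magic=$(od -An -tx1 -N2 < "$1" | tr -d '[:space:]')
    [ "$magic" = "1f8b" ]
}

# smartcat FILE...: decompress the gzip files, copy everything else unchanged.
smartcat() {
    for f in "$@"; do
        if is_gzip "$f"; then
            gzip -dc < "$f"
        else
            cat < "$f"
        fi
    done
}
```

Something like `smartcat dump1 dump2` (file names assumed) would then print both dumps as text regardless of which one is compressed.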

5 Answers

26

Just add the -f option.

$ echo foo | tee file | gzip > file.gz
$ zcat file file.gz
gzip: file: not in gzip format
foo
$ zcat -f file file.gz
foo
foo

(Use gzip -dcf instead of zcat -f if your zcat is not the GNU one, or a GNU-compatible one as in modern BSDs, and only knows about .Z files.)

Stéphane Chazelas
  • One correction: you need the `c` parameter for `gzip` to get the output to stdout: `gzip -cdf`. But `zcat -f` works also fine in Mac OS if you use stdin: `zcat -f < file`. – David Ongaro May 25 '14 at 23:58
  • Addendum: by using stdin `zcat` can only take one argument which kind of defeats the purpose of `cat`, so I think using `gzip` is the best solution (in fact `zcat` is just a hardlink to `gzip` but it seems to be more picky about the file extension when called as `zcat` in Mac OS). – David Ongaro May 26 '14 at 00:24
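
Putting the answer and the correction from the comments together, a small wrapper might look like the sketch below. The name `anycat` is hypothetical; the only real pieces are `gzip` and its `-d`, `-c` and `-f` options, which (at least with GNU gzip) copy input that is not in a recognized format through unchanged.

```shell
# anycat [FILE...]: print each file, decompressing the gzip ones.
# -d decompress, -c write to stdout, -f pass unrecognized data through as-is.
# With no FILE arguments it reads standard input instead.
anycat() {
    gzip -dcf -- "$@"
}
```

Unlike the stdin form `zcat -f < file`, this accepts any number of file arguments, so it keeps the concatenating behaviour of `cat`.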
10

One portable, simple suggestion would be to use zgrep instead of zcat, and just use a search pattern that matches every line.

zgrep '$' some-file

Unlike zcat, zgrep will happily handle uncompressed files. From man zgrep:

zgrep - search possibly compressed files for a regular expression
godlygeek
  • What do you mean by *portable*? If you mean that it can be installed and used on any system then isn't it as portable as any other? – mikeserv May 25 '14 at 17:16
  • I mean that zcat and zgrep are normally packaged together, so this ought to work anywhere where zcat was available to begin with. And it's agnostic to the particular shell being used - ought to work fine in bash, zsh, and even csh or Solaris's non-POSIX bourne /bin/sh. – godlygeek May 25 '14 at 17:22
  • Oh, cool. I noticed I had both - but I didn't know they came together. Thanks. You've got my vote. – mikeserv May 25 '14 at 17:26
  • `zgrep` is a script that wraps around `gzip` and `grep`. The reason it works with uncompressed data is because [it passes the `-f` option to `gzip`](http://unix.stackexchange.com/a/131944/22565) – Stéphane Chazelas May 25 '14 at 20:13
  • Also note that as far as portability goes, `zcat` was first only dealing with `.Z` files. GNU came up with `gzip` and the `gz` format later and its gzip/zcat handles both `.Z` and `.gz` files. You'll probably still find commercial Unices where `zcat` knows nothing about gz files. – Stéphane Chazelas May 25 '14 at 20:16
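
One practical detail when using the zgrep approach on several files at once: grep-style tools prefix each matching line with the file name, so `-h` is usually wanted. A hedged sketch follows; the function name `zgrepcat` is invented here, and quoting `'$'` only makes it explicit that the pattern is the end-of-line anchor, which matches every line.

```shell
# zgrepcat FILE...: print every line of each file, compressed or not.
# -h suppresses the "filename:" prefix that appears with multiple files.
zgrepcat() {
    zgrep -h '$' "$@"
}
```

Since grep is line-oriented, this is probably best kept to text dumps; data containing binary content may not come through byte for byte.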
5

With GNU gzip you can do zcat file 2> /dev/null || cat file. This is not POSIX-standard and does not work with BSD gzip. You really should fix your system so that all gzipped files have the .gz extension (of course, plain text files may have any extension, including .gz).

fkraiem
  • You should add an `|| { echo "$filename" ; cat $file ; } >&2` to cat to keep the out streams separate so the asker can more easily clean up the mess. I mean - well, I hope that's clear enough.... But this is a good answer. – mikeserv May 25 '14 at 16:58
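
One caveat with `zcat file || cat file`: if a gzip file is truncated or corrupt, zcat may print part of it and then fail, after which cat appends the raw bytes to the same output. A variant that tests first with `gzip -t` avoids that mix and, in the spirit of the comment above, reports on stderr which files were not valid gzip. This is only a sketch; the function name `zcat_or_cat` and the message wording are made up, and each file is read twice.

```shell
# zcat_or_cat FILE...: decompress files that pass gzip's integrity test,
# copy the rest verbatim and name them on stderr.
zcat_or_cat() {
    for f in "$@"; do
        if gzip -t < "$f" 2>/dev/null; then
            gzip -dc < "$f"
        else
            printf '%s: not valid gzip, copied as-is\n' "$f" >&2
            cat < "$f"
        fi
    done
}
```

Because the notices go to stderr, the data stream stays clean: `zcat_or_cat dump1 dump2 > all.sql 2> not-gzip.log` (file names assumed) separates the two.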
2

To add my conclusion from the comments as an answer: I think the best, most compatible way is to use

gzip -cdf [ name ... ]

This is also how zless and zgrep do it internally.

David Ongaro
0

An alternative in Bash:

for filename in *;do #Replace with your actual loop
    case $filename in
    *.gz) gunzip <"$filename";;
    *)    cat "$filename";;
    esac
done
Joseph R.