bzip2 has been a de facto standard for strong compression for many years now. I have typed the bzip2 command thousands of times myself, which makes me wonder: what happened to bzip, or bzip1? Google doesn't tell me much about it, and it sounds like it could be an interesting history lesson.

d33tah

1 Answer

It seems that the original bzip was pulled circa 1998 due to patent issues with the arithmetic coding it used. A bit of digging (really only reading Wikipedia) turns up an archived link to the bzip2 website from around this time.

Here is the relevant section detailing this and other differences:

How does it relate to your previous offering (bzip-0.21) ?

bzip2 is a rewritten and re-engineered version of 0.21. It looks superficially fairly similar, but has been almost entirely re-written (several times :-). The important differences are:

  • Patent-free! (I hope; see statement above). bzip-0.21 used arithmetic coding; bzip2 uses Huffman coding, which is generally regarded as non-problematic from a patent standpoint. Both programs are based on the Burrows-Wheeler transform, but, to the best of my knowledge, that's not patented either.

  • Faster, particularly at decompression. bzip2 decompresses more than 50% faster than 0.21, mostly because of the use of Huffman coding. I've also improved the compression speed, although not that much -- perhaps it compresses 30% faster than 0.21.

  • Recovery from media errors. Both programs compress data in blocks, by default, 900k long. With bzip2, each block is handled completely independently, carries its own checksum, and is delimited by a 48-bit sequence. So, if you have a damaged compressed file, bzip2 can extract the compressed blocks, detect which ones are undamaged, and decompress those.

  • Test mode. You can test integrity of compressed files without having to decompress them. I should have put this in 0.21, really, but was too lazy (+ burnt-out with hacking by the time I released it).

  • Handles very repetitive files much better. Such files are a worst-case for any block-sorting compressor. bzip2 runs approximately ten times faster than 0.21 for such files.

  • Support for smaller machines. bzip2 can decompress any file it creates in 2300k, which means you can decompress files on 4-meg machines. Peak memory use during compression is also reduced by about 900k compared with 0.21, to around 6400k.

  • Better flag handling. In particular, long flags (--like --this) are supported, which makes it easier to use.

  • The one-line startup message which 0.21 printed, is gone. This was 0.21's most complained-about feature. It even bugs me nowadays.

I'm no longer distributing 0.21, because doing so perpetuates problems with patents, which ensures that the program will never be widely used. That's a shame, because it's a useful program, and lots of people seem to like it. If you use 0.21 already, please upgrade to bzip2. I can't, unfortunately, make bzip2 be able to decompress 0.21's .bz files, since that would render the patent-avoidance exercise pointless. I know changing file formats is painful; from now on, I'll try and make any further changes in a backwards compatible way.

There is also a link to a decompression-only version of the bzip source code for anyone wanting to play with it.

Graeme
  • I probably have original bzip on some backup archives somewhere. I used to use it extensively, and found the switch to bzip2 annoying. On most files, bzip obtained a better compression ratio. – Jules Apr 22 '14 at 04:12
  • See also: https://bsdforge.com/projects/archivers/bzip/ and http://aminet.net/package/util/arc/Bzip-0.21 – Mikko Rantalainen Feb 21 '18 at 12:21