18

I have a compressed raw image of a very large hard drive, created using `cat /dev/sdx | xz > image.xz`. The free space on the drive was zeroed before this operation, so the image consists mostly of zero bytes. What's the easiest way to extract this image as a sparse file, such that the blocks of zeroes do not take up any space?

ilkkachu
  • Just FYI, that's a _useless use of cat_. You can get exactly the same behaviour with `xz < /dev/sdx > image.xz`. – TooTea Jan 04 '22 at 09:45
  • When running as root, yes. I stripped as much irrelevant detail from the command as I could, perhaps too much; typically I use `sudo cat` to read something that requires root access, while still running the programs being piped into as an ordinary user. – jaymmer - Reinstate Monica Jan 06 '22 at 02:38

2 Answers

30

Quoting the `xz` man page (which you really should consult for questions like this), where a quick search for "sparse" turns up:

--no-sparse
Disable creation of sparse files. By default, if decompressing into a regular file, xz tries to make the file sparse if the decompressed data contains long sequences of binary zeros. It also works when writing to standard output as long as standard output is connected to a regular file and certain additional conditions are met to make it safe. Creating sparse files may save disk space and speed up the decompression by reducing the amount of disk I/O.

(emphasis mine)

So, you don't have to do anything; just decompress with the default xz tool.
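
A minimal sketch of how this could be checked, using a small stand-in image rather than a real drive. The file names and the temporary directory are hypothetical, and whether the result is actually sparse depends on the filesystem being written to:

```shell
# Work in a throwaway directory (hypothetical location).
workdir=$(mktemp -d)
cd "$workdir"

# 64 MiB of zeros standing in for the mostly-zero disk image.
dd if=/dev/zero bs=1048576 count=64 2>/dev/null | xz > image.xz

# Plain decompression into a regular file; no extra option is given.
# -d decompresses, -k keeps image.xz around.
xz -dk image.xz

# Logical length of the file versus the space allocated for it.
apparent=$(wc -c < image)
allocated_kib=$(du -k image | cut -f1)
echo "apparent bytes: $apparent"
echo "allocated KiB:  $allocated_kib"
```

If the file came out sparse, the allocated figure should be far smaller than the apparent size.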

Marcus Müller
  • One addendum: The filesystem you are decompressing to must support sparse files. Most widely used filesystems on popular UNIX-like systems do these days, but if extracting to, say, a flash drive or SD card you can’t count on the filesystem having proper support. – Austin Hemmelgarn Jan 04 '22 at 15:15
  • that's very true! But then "extract to a sparse file" simply can't work, no matter the method. – Marcus Müller Jan 04 '22 at 15:19
12

The `dd` command has a `conv=sparse` option:

   sparse try to seek rather than write the output for NUL input blocks

So I would attempt

xz -dc < image.xz | dd of=image conv=sparse

Using dd in this way will work with any form of input (whether or not the first command could generate sparse files itself).
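
As a sketch of the "any form of input" case, the same pipeline can be tried with a different decompressor. This assumes GNU `dd` (`conv=sparse` and `iflag=fullblock` are not available in every `dd` implementation), and the file names are hypothetical:

```shell
# Small stand-in image, compressed with gzip instead of xz.
workdir=$(mktemp -d)
dd if=/dev/zero bs=1048576 count=16 2>/dev/null | gzip > "$workdir/image.gz"

# Decompress to stdout and let dd seek over all-zero blocks instead of
# writing them. iflag=fullblock makes dd collect full 1 MiB blocks from
# the pipe before testing them for zeros (assumption: GNU coreutils dd).
gzip -dc < "$workdir/image.gz" |
    dd of="$workdir/image" bs=1M iflag=fullblock conv=sparse 2>/dev/null

sparse_bytes=$(wc -c < "$workdir/image")
echo "apparent bytes: $sparse_bytes"
du -k "$workdir/image"    # allocated size; depends on the filesystem
```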

Stephen Harris
  • no sense in doing that. `xz` does that by itself. – Marcus Müller Jan 03 '22 at 12:44
  • Not only is there no sense in that, but if `conv=sparse` had any effect then the result would be wrong. `xz` needs to receive the full data in order to represent it in the compressed output (which it will do compactly). If instead `dd` skipped all-zero regions then the original image could not be correctly reconstituted from the compressed file. However, although the manual page is not specific about this, I am inclined to think that `conv=sparse` would have no effect when the output is connected to a pipe, which is unseekable. – John Bollinger Jan 04 '22 at 14:56
  • @JohnBollinger I read that `man` excerpt the other way, as in it modifies the way `dd` writes output: if `dd` gets a `NUL` input block, it doesn't `write()` it but just `lseek()`'s to the position the *next* block will be written to. That should create a sparse file regardless of the input type as long as the output blocks align properly to filesystem blocks. – Andrew Henle Jan 04 '22 at 22:37
  • @JohnBollinger: `dd`'s output *isn't* connected to a pipe here; this answer is suggesting piping decompressed `xz` output (with literal zeros) into `dd of=image`, for dd to find the zeros and seek in the output `image` file it created. This works in general, it's just not needed in this case because `xz` will do that itself when writing to a seekable file. (Err, to one it created I guess, rather than its stdout on an already-existing file with possible non-zero contents) – Peter Cordes Jan 05 '22 at 03:51