18

I have a tar archive containing a single disk image. The image inside the tar file is about 4GB in size. I pipe the output of tar xf into dd to write the disk image to an SD card, but dd never stops until the card is full. Here is my shell session:

$ ls -l disk.img.tgz
-rw-r--r-- 1 confus confus 192M Okt  5 00:53 disk.img.tgz

$ tar -tvf disk.img.tgz
-rw-r--r-- root/root 4294968320 2018-10-05 00:52 disk.img

$ lsblk -lb /dev/sdc
NAME MAJ:MIN RM        SIZE RO TYPE MOUNTPOINT
sdc    8:32   1 16022241280  0 disk

$ tar zxf disk.img.tgz -O | sudo dd status=progress conv=sync bs=1M of=/dev/sdc
[sudo] password for user: 
15992881152 bytes (16 GB, 15 GiB) copied, 212 s, 75,4 MB/s 
dd: error writing '/dev/sdc': No space left on device
0+15281 records in
15280+0 records out
16022241280 bytes (16 GB, 15 GiB) copied, 217,67 s, 73,6 MB/s

Why? It should stop after it has written the 4GB image to the 16GB card and never run out of space!

con-f-use
  • 423
  • 4
  • 11
  • Do you have the disk space to try running this through `dd` and writing it to another file? `tar zxf disk.img.tgz -O | dd status=progress conv=sync bs=1M of=/path/to/some/file/on/disk` ? If so, does that get you an exact copy of the original file? – Andy Dalton Oct 05 '18 at 14:24
  • 2
    Why do you have `conv=sync`? Did you mean to use `conv=fsync` perhaps? – Ralph Rönnquist Oct 05 '18 at 14:34
  • Are you certain that's the true size of the file? I know gzip only has 32 bits in which to store file sizes, so it gets the size of files over 4GB wrong. I'm not sure if tar has a similar limitation. – David Conrad Oct 05 '18 at 15:35

1 Answer

52

It's because you're doing it wrong.

You're using bs=1M, but reading from stdin (a pipe) will often return fewer bytes than requested. In fact, according to dd's own statistics, you didn't get a single full read.

And then you have conv=sync, which pads incomplete reads with zeroes up to the block size.

0+15281 records in
15280+0 records out

dd received 0 full and 15281 incomplete reads, and wrote 15280 full blocks (zero-filled by conv=sync). So the output is much, much larger than the input, until the device runs out of space.

   sync   pad  every  input  block  with  NULs to ibs-size; when used with
          block or unblock, pad with spaces rather than NULs

To solve this, you could remove conv=sync and add iflag=fullblock.
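For example (a sketch, writing to /dev/null rather than a real device), iflag=fullblock makes dd keep reading from the pipe until each 1M block is full, so every input record is a full record:

```shell
# The corrected command from the question would look like this
# (device name /dev/sdc taken from the question; this line is destructive,
# so it is shown commented out):
#   tar zxf disk.img.tgz -O | sudo dd of=/dev/sdc bs=1M iflag=fullblock status=progress
#
# Safe demonstration: with iflag=fullblock, dd accumulates full 1M blocks
# from the pipe instead of writing out whatever each short read returned.
yes | dd bs=1M count=2 iflag=fullblock of=/dev/null
# dd reports: 2+0 records in / 2+0 records out
```

Without iflag=fullblock, the same pipeline would typically report something like "0+N records in", i.e. many partial reads and no full ones.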


To illustrate, consider yes, which by default prints an endless stream of "y\ny\ny\n".

$ yes
y
y
y
^C
$ yes | hexdump -C
00000000  79 0a 79 0a 79 0a 79 0a  79 0a 79 0a 79 0a 79 0a  |y.y.y.y.y.y.y.y.|
*

With dd bs=1M conv=sync it looks like this:

$ yes | dd bs=1M conv=sync | hexdump -C
00000000  79 0a 79 0a 79 0a 79 0a  79 0a 79 0a 79 0a 79 0a  |y.y.y.y.y.y.y.y.|
*
0001e000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00100000  79 0a 79 0a 79 0a 79 0a  79 0a 79 0a 79 0a 79 0a  |y.y.y.y.y.y.y.y.|
*
00112000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*

So dd gets an incomplete block of "y\ny\ny\n" (0x00000 - 0x1e000, 122880 bytes) and then writes out the rest of the 1M block as zeroes (0x1e000 - 0x100000, 925696 bytes). In most cases, you don't want this to happen. The result is effectively random anyhow, as you have no real control over how incomplete each read turns out to be: here the second read is no longer 122880 bytes but 73728 bytes.

dd conv=sync is rarely useful, and even in cases where it would be welcome, like padding with zeroes when you get read errors, things will go horribly wrong with it.
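As the comments on the question note, if the intent behind `sync` was to flush the written data to the device before dd exits, the flag for that is conv=fsync, a different option entirely. A minimal sketch (writing to a temporary file for illustration; the path is arbitrary):

```shell
# conv=fsync calls fsync() on the output file once before dd exits;
# unlike conv=sync, it does not pad input blocks with zeroes.
yes | dd bs=1M count=1 iflag=fullblock conv=fsync of=/tmp/fsync-demo.bin
ls -l /tmp/fsync-demo.bin
# The file is exactly 1048576 bytes: one full 1M block, no zero padding.
```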

frostschutz
  • 47,228
  • 5
  • 112
  • 159
  • In this case, running the `dd` command under `strace` (assuming Linux) would have shown that each short read from the pipe was followed by a full 1MB write. – Andrew Henle Oct 05 '18 at 21:22
  • 2
    @AndrewHenle don't even need strace for this, just looking at the output will do. Added an illustration – frostschutz Oct 05 '18 at 22:12
  • This also illustrates why the `dd` command is fundamentally broken and unusable. It's specified to operate in individual `read`s and `write`s, but those operations are specified such that they can always produce short reads or writes, and it's not an error. As a consequence, the behavior of `dd` depends on unspecified behavior. – R.. GitHub STOP HELPING ICE Oct 06 '18 at 15:29
  • Thanks for the very educational answer. As someone else suggested, I was being an ass and mixed up the many options to `dd`, but it led me to learn something from you. What I'm still not completely sure about is if and when `dd` would have terminated. I assume it would have, but since it was actually writing 1 part actual data and 9 parts zeros, it would have stopped after writing about 40G. Is that correct? – con-f-use Oct 06 '18 at 16:15
  • @R.., that feature is very much useful with device drivers that care about the block size of reads and writes. I remember using some tape drives that cared about it. Though in this case, it's obviously not necessary, one could just redirect directly to the disk (not getting a live progress report, though) – ilkkachu Oct 07 '18 at 08:53