I have an ext4-formatted disk with thousands of automatically generated files, all of which I need to keep. A few thousand of them are only one byte long, and some are two bytes long. Within each of these two groups of tiny files, all files are identical.
How much space can I save by locating these files (say, 1000 of them that are 1 byte long), removing each one, and hard-linking its name to a single representative file?
Like this:
# ls -l
-rw-r----- 1 john john 1 Feb 25 10:29 a
-rw-r----- 1 john john 1 Feb 25 10:29 b
-rw-r----- 1 john john 1 Feb 25 10:29 c
# du -kcs ?
4 a
4 b
4 c
12 total
Try to consolidate:
# rm b c
# ln a b
# ln a c
# ll
total 12
-rw-r----- 3 john john 1 Feb 25 10:29 a
-rw-r----- 3 john john 1 Feb 25 10:29 b
-rw-r----- 3 john john 1 Feb 25 10:29 c
# du -kcs ?
4 a
4 total
(Please note that du does not even list b and c, which I find curious.)
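For the 1000-file case, I imagine the manual steps above could be scripted roughly as follows. This is only a sketch: dedupe_tiny is a name I made up, it assumes GNU find and bash, and it compares every candidate only against the first file found, which is enough here because all files in a group are identical.

```shell
# Hypothetical helper: hard-link identical files of SIZE bytes under DIR
# to one representative. Sketch only; assumes bash and GNU find.
dedupe_tiny() {
    local dir=$1 size=${2:-1} ref="" f
    # -size Nc matches files exactly N bytes long; -xdev stays on one
    # filesystem, since hard links cannot cross filesystems.
    while IFS= read -r -d '' f; do
        if [ -z "$ref" ]; then
            ref=$f              # first hit becomes the representative
        elif [ "$f" -ef "$ref" ]; then
            continue            # already the same inode, nothing to do
        elif cmp -s "$ref" "$f"; then
            ln -f "$ref" "$f"   # replace f with a hard link to ref
        fi
    done < <(find "$dir" -xdev -type f -size "${size}c" -print0)
}
```

It would be called as dedupe_tiny /some/dir 1 for the 1-byte group and dedupe_tiny /some/dir 2 for the 2-byte group. After linking, all names share one inode, so they also share owner, mode, and timestamps, and a write through any name changes the content seen through all of them.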
Question: Is it really that easy? Can I save 999 * 4 KiB in my 1000-file scenario if an allocation block is 4 KiB in size?
Or does ext4 have the ability to transparently "merge tails", or to store tiny files in the "directory inode" (I vaguely remember that some filesystems can do that)?
(I know the allocation block size can vary, and a command like tune2fs -l /dev/sda1 can tell me what it is.)
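To make the arithmetic concrete, here is a rough sketch of the saving I am hoping for. It assumes every tiny file occupies exactly one full allocation block, which is the very thing I am unsure about. estimate_saving_kib is a made-up name, and stat -f -c %S is the GNU coreutils way to print a filesystem's fundamental block size without needing the device name.

```shell
# Hypothetical helper: KiB freed if COUNT one-block files collapse into
# a single inode. Assumes each file occupies exactly one block of
# BLOCK_BYTES, which is unverified for ext4.
estimate_saving_kib() {
    local count=$1 block_bytes=$2
    echo $(( (count - 1) * block_bytes / 1024 ))
}

# Block size of the filesystem holding the current directory (GNU stat).
block_bytes=$(stat -f -c %S .)
estimate_saving_kib 1000 "$block_bytes"
```

With a 4096-byte block this prints 3996, i.e. roughly 3.9 MiB for the 1000-file case.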