15

How can I output how much of a file's nominal size is actually filled with data? Something like how vmtouch shows how much of a file is currently in memory...

I expect the workflow to be like this:

$ fallocate -l 1000000 data 
$ measure_sparseness data
100%
$ fallocate -p -o 250000 -l 500000 data
$ measure_sparseness data
50%

Workaround: run du -bsh (apparent size) and du -sh (disk usage) and compare the two numbers.
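
As a rough sketch of that workaround, the two du numbers can be combined into the percentage the workflow above asks for. This assumes GNU du (where -b means apparent size in bytes and -B1 reports disk usage in bytes); measure_sparseness is only the hypothetical name from the example, not an existing tool.

```shell
# Hypothetical helper: allocated size as a percentage of apparent size.
# Assumes GNU du: -b = apparent size in bytes, -B1 = disk usage in bytes.
measure_sparseness() {
    apparent=$(du -b -- "$1" | cut -f1)
    allocated=$(du -B1 -- "$1" | cut -f1)
    if [ "$apparent" -eq 0 ]; then
        echo "n/a"    # the ratio is undefined for an empty file
    else
        # integer arithmetic, so the result is rounded down
        echo "$(( allocated * 100 / apparent ))%"
    fi
}
```

The value can come out slightly above 100% for a fully allocated file, because allocation is rounded up to whole filesystem blocks.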

Vi.
  • related: `filefrag` for any filesystem and `xfs_bmap -vpl` for XFS are key tools for showing where the data is (and where the pre-allocated unwritten extents are) when playing around with sparse files and hole-punching. – Peter Cordes Aug 07 '16 at 01:53
  • `filefrag data` -> multiple `FIBMAP: Invalid argument` -> `data: 1 extent found`... – Vi. Aug 07 '16 at 11:53
  • on what filesystem? `filefrag -e` works perfectly on XFS and ext4 at least. I haven't tested on others. It uses FIEMAP (extent-map), with a fallback to FIBMAP. If those `ioctl`s don't work, then it won't be useful. – Peter Cordes Aug 07 '16 at 15:07
  • On tmpfs. My `filefrag` doesn't have `-e` option. – Vi. Aug 07 '16 at 23:45
  • How old is your `e2fsprogs`? I'm pretty sure it's not a recent feature. There's [also a `-v` option](http://man7.org/linux/man-pages/man8/filefrag.8.html) which prints the same verbose info (plus some extra header lines). Maybe your `filefrag` will have that. Unlike `xfs_bmap`, though, it doesn't explicitly indicate holes with separate lines, it just has discontinuities in file position. Anyway, I'm not surprised that `tmpfs` doesn't support FIEMAP, because there is no block device as a backing store, so there's no sensible value for the location of the extents. – Peter Cordes Aug 08 '16 at 00:03
  • I wonder if tmpfs supports `lseek(SEEK_DATA)` and `lseek(SEEK_HOLE)`... That's another way to find the location of data vs. holes that doesn't rely on FIEMAP. – Peter Cordes Aug 08 '16 at 00:05
  • Related: [How to display the non-sparse parts of a sparse file?](http://unix.stackexchange.com/q/121592) – Stéphane Chazelas Aug 08 '16 at 06:36

3 Answers

19

find's -printf has a %S format specifier, which is even named "sparseness":

         %S     File's sparseness. This is calculated as (BLOCKSIZE*st_blocks / st_size). The exact value you will get for an ordinary file of a certain
                 length is system-dependent. However, normally sparse files will have values less than 1.0, and files which use indirect blocks may have a
                 value which is greater than 1.0. The value used for BLOCKSIZE is system-dependent, but is usually 512 bytes. If the file size is zero, the
                 value printed is undefined. On systems which lack support for st_blocks, a file's sparseness is assumed to be 1.0.
$ fallocate -l 1000000 data
$ find data -printf '%S\n'
1.00352
$ fallocate -p -o 250000 -l 500000  data
$ find data -printf '%S\n'
0.507904
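
The same ratio can be reproduced by hand, which makes the formula a bit less mysterious. The sketch below assumes GNU stat (its %b, %B and %s format sequences) and awk; sparseness is a made-up function name. If BLOCKSIZE is 512 here, the two outputs above would correspond to 1960 and 992 allocated blocks (1960 * 512 / 1000000 = 1.00352 and 992 * 512 / 1000000 = 0.507904).

```shell
# Hypothetical helper mirroring find's %S: BLOCKSIZE * st_blocks / st_size.
# Assumes GNU stat: %b = allocated blocks, %B = bytes per block, %s = size in bytes.
sparseness() {
    stat -c '%b %B %s' -- "$1" |
        awk '{ if ($3 == 0) print "undefined"; else printf "%g\n", $1 * $2 / $3 }'
}
```

Calling `sparseness data` should then print roughly what `find data -printf '%S\n'` prints.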
Vi.
  • Interesting. Most regular files on a system will have sparseness above 1.0, directories, softlinks and sockets will always have exactly 1.0. – grochmal Aug 06 '16 at 22:36
  • Didn't some systems save (short) symlink directly in the inode, without using data blocks at all? Wonder what the sparseness of that should be. Besides, isn't that definition the wrong way around, surely a normal (i.e. non-sparse) file should have sparseness zero? :) – ilkkachu Aug 06 '16 at 23:24
  • @grochmal, on ext4 (Linux): `ln -s foo link`, "sparseness" of `link`: 0. Sockets and FIFOs have length zero, so `find` shows sparseness 1. – ilkkachu Aug 07 '16 at 10:55
1

If your find doesn't have that option, a method that has worked on UNIX since the 1970s is:

ls -ls file

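This prints the number of blocks actually allocated (first column) and the file's size in bytes, i.e. its nominal length. From those two numbers you can easily compute how much of the file has not been allocated. Note that the block unit varies: GNU ls counts in 1024-byte blocks by default, while POSIX specifies 512-byte units.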
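
A minimal sketch of that computation, assuming GNU ls and awk. The -k option pins the block count to 1024-byte units, and -n keeps owner and group numeric so the size stays in field 6. unallocated_bytes is a hypothetical helper name.

```shell
# Hypothetical helper: bytes of the nominal size that have no blocks behind them.
# Assumes `ls -lskn FILE` prints the block count (1024-byte units) in field 1
# and the size in bytes in field 6.
unallocated_bytes() {
    ls -lskn -- "$1" |
        awk '{ d = $6 - $1 * 1024; if (d < 0) d = 0; printf "%.0f\n", d }'
}
```

The result is clamped at zero because a fully written file usually occupies slightly more than its nominal size once rounded up to whole blocks.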

MAP
0

While find's %S prints only a brief summary, for more detail you might want to look at sparsetest, which I wrote. It is open source and on GitHub here. Feel free to modify it if you want to print out (e.g.) every hole.

A blog article here shows problems with sparse allocations and uses sparsetest to debug the issue.

abligh
  • Can it print a "map" of extents in a file, like `vmtouch -v` prints map of cached areas in the file? – Vi. Aug 07 '16 at 11:54
  • @Vi. I wrote it a good while ago and forgot some details - what it's actually doing is creating a sparse file, writing data to it, then printing statistics. You just want the statistic creating bit. To print holes you will need `lseek` with `SEEK_HOLE` and `SEEK_DATA`. Easy to do. – abligh Aug 07 '16 at 13:50