
I’m trying to clean up all of my music folders (and there are a lot), but rather than delete them, I want to move the ones that are empty, contain only a few MB, or hold just a few files.

I’ve managed to use the following find command to move all the empty directories:

find . -empty -type d -exec mv {} /share/Container/beetsV2/music/my_empty_folders \;

... but I can't seem to work out how to find directories based on size. I thought the following would work, but it doesn’t; it seems to return way more than expected.

find . -size -5M -type d -exec mv {} /share/Container/beetsV2/music/my_folders_under_5Mb \;

When it comes to finding directories with only a few files in them, I can’t seem to find a single command line to do that like the above. Does one even exist?

Jeff Schaller
nodecentral

2 Answers


From man find:

-size n[cwbkMG] File uses less than, more than or exactly n units of space, rounding up.

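For a directory, this test applies to the size of the directory entry itself, not to the cumulative size of its contents, so it is of no help for measuring a directory's disk usage.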
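To see why the -size attempt in the question matches far more than expected, here is a small sketch (the scratch directory and file names are made up for the demonstration). A directory holding 8 MiB of data is still matched by -size -5M, whereas du -sk reports the cumulative usage of its contents:

```shell
# Hypothetical scratch area; nothing here comes from the question's music tree.
demo=$(mktemp -d)
mkdir "$demo/album"

# Put 8 MiB of incompressible data inside the directory.
head -c 8388608 /dev/urandom > "$demo/album/track.bin"

# find tests the size of the directory entry itself (typically a few KiB),
# so the directory still matches "less than 5 MiB".
matched=$(find "$demo" -mindepth 1 -type d -size -5M)

# du -s adds up the contents; -k reports the total in KiB.
kib=$(du -sk "$demo/album" | cut -f1)

printf 'matched by find -size -5M: %s\n' "$matched"
printf 'cumulative usage per du:   %s KiB\n' "$kib"
```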


A workaround is to have bash process each directory that find outputs: take its cumulative size with du, and mv it if that size is less than N MiB:

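move_dirs_smaller_than() {
    # MAX and path are exported so that the bash started by find can see them
    export MAX=$(($1*1024)) # $1 is in MiB; du -sk reports KiB
    export path=$2          # $2 /path must be outside of .
    find . -mindepth 1 -type d -exec bash -c '
        # first field of the du output: cumulative size in KiB
        read -r size _ < <(du -sk "$1")
        # succeeds only for a small directory, which lets -prune below
        # stop find from descending into a directory that has just been moved
        ((size < MAX)) && echo mv -- "$1" "$path"
    ' bash {} \; -prune
}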

Remove the echo once the output looks right.

Function usage for 5MiB:

/path should NOT be inside the current directory (or exclude it from the search, e.g. with find . -path ./dir -prune -o ...):

move_dirs_smaller_than 5 /path

Look at find's man page about -prune, -mindepth and -maxdepth depending on your needs.
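For instance, if only the top-level folders (say, one per artist) should be considered, -maxdepth 1 stops find from looking any deeper. The following is a minimal sketch, not part of the function above: list_small_top_dirs is a hypothetical helper that only prints the candidates, and its limit is given in KiB rather than MiB.

```shell
# Hypothetical helper: print the directories directly under $1 whose
# cumulative disk usage (du -sk, in KiB) is below $2 KiB.
list_small_top_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -exec sh -c '
        limit=$1
        shift
        for dir do
            kib=$(du -sk "$dir" | cut -f1)
            if [ "$kib" -lt "$limit" ]; then
                printf "%s\n" "$dir"
            fi
        done
    ' sh "$2" {} +
}
```

Something like list_small_top_dirs . 5120 would then list the top-level directories using less than 5 MiB. The output is newline-delimited, so it is meant for reviewing by eye rather than for feeding to mv.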

Gilles Quénot

For directories and their contents, the GNU implementation of du can give you a report of:

  • cumulative disk usage (default)
  • cumulative size with --apparent-size
  • cumulative number of files (of any type) with --inodes (abbreviated to --inode below)

Unfortunately, it can't include all three in one report.

You can also tell it not to deduplicate hard links with -l / --count-links.

du is one of the rare utilities whose output you can post-process reliably, as it has a -0 / --null option to output NUL-delimited records.
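As an illustration of why the NUL-delimited output matters, here is a sketch that reads each record back in bash, even for a directory whose name contains a newline. It assumes GNU du and bash 4 or later; the scratch directory and the counts array are made up for the example, and --inodes is the full name of the inode-counting option.

```shell
#!/usr/bin/env bash
# Scratch tree with a directory name containing a newline (demonstration only).
tmp=$(mktemp -d)
odd="$tmp/odd"$'\n'"name"
mkdir "$odd"
touch "$odd/a" "$odd/b"

# Each record is "<count><TAB><path>" terminated by NUL, so splitting on the
# first tab is safe whatever characters the path contains.
declare -A counts
while IFS= read -r -d '' record; do
    path=${record#*$'\t'}
    counts[$path]=${record%%$'\t'*}
done < <(du --inodes --null -- "$tmp")

printf 'inodes under the odd directory: %s\n' "${counts[$odd]}"
```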

So if you're on a GNU system, you could do:

xargs -r0a <(
  du --inode --null --count-links | # count inodes
    tac -s '' | # reverse the output so parents are shown before children
    perl -0lnse '
      if (
        m{^(\d+)\t(.*)}s &&
        $1 < $max &&
        rindex($2, "$last_moved/", 0) < 0 # not a subdirectory of the last moved
      ) {
        print($last_moved = $2);
      }' -- -max=10
  ) echo mv -it /path/to/destination --

Which would move the directories that contain (recursively) fewer than 10 files (of any type including directory, and including the directory itself).

Replace --inode with --apparent-size --block-size=1 (or -b for short) to consider the cumulative size instead of the number of files. For disk usage, use --block-size=1 without --apparent-size. In both cases, update -max accordingly, as it is then a number of bytes.

Once you're happy with the result, remove the echo to actually perform the move.

All of -r, -0, -a, --inode, --apparent-size, -l, --count-links, tac, -b, --block-size and -t are non-standard GNU extensions, few of which have been added to other implementations of those standard utilities, so don't expect this to work outside GNU systems unless you've installed GNU coreutils and findutils there. Since you've used the linux tag though, there's a fair chance you are on a GNU system.
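On a system without the GNU tools, the file count for a single directory can still be obtained with standard find options. This is a rough sketch rather than a replacement for the pipeline above: count_entries is a hypothetical helper, and like the du inode count with --count-links it counts the directory itself, everything below it, and each hard link separately.

```shell
# Hypothetical helper: count the directory $1 plus every file of any type
# below it, using only standard find, sh, wc and tr.
count_entries() {
    # Print one "x" per file found instead of the file names themselves, so
    # that names containing newlines cannot distort the count.
    find "$1" -exec sh -c 'for f do printf x; done' sh {} + | wc -c | tr -d ' '
}
```

A test such as [ "$(count_entries ./some_album)" -lt 10 ] could then decide whether a given directory (./some_album is a placeholder) qualifies. Unlike the du approach, this walks the tree once per directory tested, so it will be slower on a large collection.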

To consider both the number of files and the cumulative size, you could use GNU find, whose -printf predicate can report the file type and size, and do the sums by hand:

xargs -r0a <(
  find . -depth -printf '%y %s %p\0' | perl -0lsne '
    if (m{^(\S+) (\d+) ((.*)/.*)}s) {
      my ($type, $size, $file, $parent) = ($1, $2, $3, $4);
      $count{$parent} += ++$count{$file};
      $size{$parent} += $size{$file} += $size;
      unshift @dirs, $file if $type eq "d";
    }
    END {
      for $dir (@dirs) {
        if (
          $size{$dir} < $max_size &&
          $count{$dir} < $max_count &&
          rindex($dir, "$last_moved/", 0) < 0
        ) {
          print $last_moved = $dir;
        }
      }
    }' -- -max_count=10 -max_size="$(( 5 * 1024 * 1024 ))"
  ) echo mv -it /path/to/destination --

For disk usage instead of apparent size, replace %s with %b. %b counts 512-byte blocks, so multiply it by 512 (replace $2 with $2 * 512 above) to get the disk usage in bytes.
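The difference between the two measures can be checked on a single file. A minimal sketch, assuming GNU find and a scratch file created just for the demonstration:

```shell
# Scratch file for the demonstration: 1000 bytes of data.
f=$(mktemp)
head -c 1000 /dev/urandom > "$f"

# %s is the apparent size in bytes; %b is the number of 512-byte blocks allocated.
apparent=$(find "$f" -printf '%s')
blocks=$(find "$f" -printf '%b')
disk_usage=$((blocks * 512))

printf 'apparent size: %s bytes\n' "$apparent"
printf 'disk usage:    %s bytes (%s blocks of 512 bytes)\n' "$disk_usage" "$blocks"
```

The apparent size is exactly what was written, while the disk usage depends on how the filesystem allocates space, so the two limits are not interchangeable when many small files are involved.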

Stéphane Chazelas