How can I execute a command only if a certain file exceeds a defined size? Both the check and the command should ultimately run as a one-liner in crontab.
Pseudocode:
* * * * * find /cache/myfile.csv -size +5G && echo "file is > 5GB"
If you have GNU stat, you can use its --printf option to get the file's size, e.g.:
size=$(stat --printf '%s' /cache/myfile.csv)
if [ "$size" -gt 5368709120 ] ; then # 5 GiB = 5 * 1024 * 1024 * 1024
echo "file is > 5GB"
fi
See man stat for details.
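Since the goal is a crontab one-liner, the test above can be folded into a single `[ ... ] && command` line. The following is a minimal sketch, assuming GNU stat and a cron that hands the command to /bin/sh. The function name `exceeds` is hypothetical and exists only so the test can be tried from an interactive shell. In common cron implementations an unescaped % in the command field is turned into a newline, so in the crontab itself the format string has to be written as '\%s'.

```shell
# Crontab form of the same test (note the backslash before %):
# * * * * * [ "$(stat --printf '\%s' /cache/myfile.csv)" -gt 5368709120 ] && echo "file is > 5GB"

# Hypothetical helper for trying the test interactively: exceeds PATH BYTES
# Succeeds only if PATH is strictly larger than BYTES (assumes GNU stat).
exceeds() {
    [ "$(stat --printf '%s' "$1")" -gt "$2" ]
}
```

For example, `exceeds /cache/myfile.csv 5368709120 && echo "file is > 5GB"` behaves like the if/fi version above.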
BSD's stat (e.g. on FreeBSD and on Mac) has a similar formatting option, -f:
size=$(stat -f '%z' /cache/myfile.csv)
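If the same script has to run on both kinds of system, the two variants can be wrapped in one helper. This is only a sketch, assuming that stat is either the GNU or the BSD implementation (it probes with `stat --version`, which GNU stat accepts). The names `file_size` and `larger_than` are hypothetical.

```shell
# file_size PATH: print the size of PATH in bytes, followed by a newline.
file_size() {
    if stat --version >/dev/null 2>&1; then
        stat --printf '%s\n' "$1"   # GNU coreutils stat
    else
        stat -f '%z' "$1"           # BSD stat (FreeBSD, macOS)
    fi
}

# larger_than PATH BYTES: succeed if PATH is strictly larger than BYTES.
# Returns 2 if the size could not be determined (e.g. the file is missing).
larger_than() {
    _size=$(file_size "$1" 2>/dev/null) || return 2
    [ "$_size" -gt "$2" ]
}
```

Usage would then look like `larger_than /cache/myfile.csv 5368709120 && echo "file is > 5GB"`.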
Alternatively, you could use perl's built-in stat function, or its -s file test operator (which is similar to bash's -s file test but it returns the file's size rather than just true if it exists and is non-empty).
perl's stat function returns a 13-element list (array) of metadata about a file containing the following data (copied from perldoc -f stat):
[...] Not all fields are supported on all filesystem types. Here are
the meanings of the fields:
 0 dev      device number of filesystem
 1 ino      inode number
 2 mode     file mode (type and permissions)
 3 nlink    number of (hard) links to the file
 4 uid      numeric user ID of file's owner
 5 gid      numeric group ID of file's owner
 6 rdev     the device identifier (special files only)
 7 size     total size of file, in bytes
 8 atime    last access time in seconds since the epoch
 9 mtime    last modify time in seconds since the epoch
10 ctime    inode change time in seconds since the epoch (*)
11 blksize  preferred I/O size in bytes for interacting with the
            file (may vary from file to file)
12 blocks   actual number of system-specific blocks allocated
            on disk (often, but not always, 512 bytes each)
(The epoch was at 00:00 January 1, 1970 GMT.)
Field 7 is the one we need.
To print the file's size (for later use in a shell command or script) using either stat or -s:
# stat
perl -e 'print scalar((stat(shift))[7])' /cache/myfile.csv
# -s
perl -e 'print -s shift' /cache/myfile.csv
Or to do it all in perl:
# stat
perl -e 'print "File is > 5 GiB\n" if (stat(shift))[7] > 5*1024*1024*1024' /cache/myfile.csv
# -s
perl -e 'print "File is > 5 GiB\n" if -s shift > 5*1024*1024*1024' /cache/myfile.csv
See perldoc -f stat and perldoc -f -X (as well as help test in bash).
BTW, perl's shift function removes the first element of an array (by default @ARGV, the array of command line args, if not specified) and returns its value. It's often used in a loop to process all elements of an array, but here we're only interested in the first arg (the filename). See perldoc -f shift for details, including notes on lexical scope and use in a subroutine.
To use the file size as a precondition you can use stat or find:
[ -n "$(find /cache/myfile.csv -prune -size +5G 2>/dev/null)" ] && echo "file is > 5GB"
Or, if the target command (echo, here) is short, put it into the -exec part of find:
find /cache/myfile.csv -prune -size +5G -exec echo "file is > 5GB" \;
The -prune is in case myfile.csv might be a file of type directory, to prevent find from descending into it.
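With find, -size counts in whole units, rounding up, so a threshold that is not a whole number of GiB is easier to express in bytes with the POSIX `c` suffix. A minimal sketch of the same precondition as a reusable test; the function name `find_larger_than` is hypothetical:

```shell
# find_larger_than PATH BYTES: succeed if PATH is strictly larger than BYTES.
# The "c" suffix makes find compare the size in bytes instead of blocks.
# -prune keeps find from descending into PATH if it is a directory.
find_larger_than() {
    [ -n "$(find "$1" -prune -size "+${2}c" 2>/dev/null)" ]
}
```

`find_larger_than /cache/myfile.csv 5368709120 && echo "file is > 5GB"` is then equivalent to the -size +5G test above.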
If you need to process the files in a shell, both versions below run the shell command only if all conditions are met: the entry is a regular file, is named myfile.csv, and is larger than 5 GiB:
find /cache -name 'myfile.csv' -type f -size +5G -exec bash -c '
echo "$1 is > 5GB"
' bash {} \;
or
find /cache -name 'myfile.csv' -type f -size +5G -exec bash -c '
for file; do echo "$file is > 5GB"; done
' bash {} +
Note that some shells have the feature built-in.
SHELL=/bin/tcsh
* * * * * if (-Z /cache/myfile.csv > 5*1024*1024*1024) echo 'file is > 5GiB'
Or with zsh, here using glob qualifiers and an anonymous function, though zsh also has a stat builtin that predates both GNU and BSD stat:
SHELL=/bin/zsh
* * * * * (){ if (($#)) echo 'file is > 5GiB'; } /cache/myfile.csv(NLG+5)
(Note that, as with find -size +5G, we're talking about gibibytes (1GiB = 1,073,741,824 bytes) here, not gigabytes (1GB = 1,000,000,000 bytes).)
For symlinks, tcsh gets the size of the file the link eventually resolves to, while zsh's LG+5 qualifier, like find's -size, checks the size of the symlink itself. Change it to -LG+5 to check the size after symlink resolution. zsh's stat builtin gives you information after symlink resolution by default, with -L to change that. In GNU and BSD stat, that's reversed. The same goes for find, where -L tells it to follow symlinks.
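The difference for find can be seen with a small throwaway demo. This is a sketch only: the directory, the file names and the 10-byte threshold are made up for illustration.

```shell
# Create a scratch directory with a small file and a symlink pointing at it.
demo_dir=$(mktemp -d)
printf '%s' 'this file holds well over ten bytes of data' > "$demo_dir/target"
ln -s target "$demo_dir/link"

# Without -L, find tests the symlink itself. Its size is the length of the
# link text ("target"), which is below the threshold, so nothing is printed.
without_L=$(find "$demo_dir/link" -prune -size +10c)

# With -L, find follows the link and tests the target file, which is larger
# than 10 bytes, so the path of the link is printed.
with_L=$(find -L "$demo_dir/link" -prune -size +10c)
```

If /cache/myfile.csv may be a symlink, adding -L to the find commands above makes the size test apply to the file it points to.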
For more ways to get the size of a file, see How can I get the size of a file in a bash script?