I retrieved all the PDFs in my $HOME directory:

$ find -E ~ -regex ".*/[^/].*.pdf"

It prints more than 1000 files.
I intend to sort them by size, and after searching I found:

$ stat -f '%z' draft.sh
184

I drafted this script:

#! /usr/local/bin/bash

OLD_IFS=IFS 
IFS=$'\n'

touch sorted_pdf.md

for file in $(find -E ~ -regex ".*/[^/].*.pdf")
do
    file_size=$(stat -c "%s" $file)
    ....

done > sorted_pdf.md

IFS=OLD_IFS

It's hard to make these pieces work together and get my result. Could you please provide a hint?

I refactored the code:

#! /bin/zsh
OLD_IFS=IFS 
IFS=$'\n'

touch sorted_pdf.md

for file in $(find -E ~ -regex ".*/[^/].*.pdf")
do
    # file_size=$(stat -c "%s" $file)
    printf '%s\n' $file(DoL)

done > sorted_pdf.md

IFS=OLD_IFS

but I get an error:

$ ./sort_files.sh

./sort_files.sh: line 12: syntax error near unexpected token `('
./sort_files.sh: line 12: `    printf '%s\n' $file(DoL)'
Jeff Schaller
AbstProcDo
  • That looks like BSD `find` for `-E` and BSD `stat` for `-f %z`, but then the `stat -c %s` indicates GNU `stat`, what system is it? Do you have to use `bash` or can you use other shells like `zsh`? – Stéphane Chazelas Oct 26 '18 at 14:48
  • @StephenKitt, here there's an extra requirement that the file names end in `.pdf`. So it's different in that we need to sort files found by `find` (or other ways to find files by name). – Stéphane Chazelas Oct 26 '18 at 14:53
  • Thanks, it's BSD on macOS, but the linked question does not answer my question. @StéphaneChazelas – AbstProcDo Oct 26 '18 at 14:55
  • no clue about BSD/macos, that is why I don't write an answer, but won't something like `find ... -printf '%s %P\n' | sort -n` work? – pLumo Oct 26 '18 at 15:07
  • `syntax error near unexpected token '('` is a `bash` message, but in any case, `$file(DoL)` wouldn't make sense in `zsh` either. That's meant to be a _glob_ qualifier to sort the glob expansion, so doesn't make sense when applied to a single file. – Stéphane Chazelas Oct 26 '18 at 15:08
  • @Stéphane the second answer on the duplicate I’d linked showed how to sort `find`’s output by size (using `-printf` on GNU `find`). – Stephen Kitt Oct 26 '18 at 15:09
  • @StephenKitt, `printf` is GNU specific. The OP is on macOS. – Stéphane Chazelas Oct 26 '18 at 15:09
  • @Stéphane which is also addressed alongside [the aforementioned answer](https://unix.stackexchange.com/a/88066/86440). (And when I closed the question, the macOS requirement wasn’t apparent — I agree as it stands currently, the question is better left open with your answer.) – Stephen Kitt Oct 26 '18 at 15:12
  • printf resides on BSD @StéphaneChazelas – AbstProcDo Oct 26 '18 at 15:13
  • @riderdragon, sorry I meant the `-printf` predicate of `find` is GNU-specific. The `printf` utility itself is standard. – Stéphane Chazelas Oct 26 '18 at 18:04

2 Answers

To sort by size, you can use zsh's glob qualifiers (zsh is installed by default on macOS; it even used to be sh there):

#! /bin/zsh -
printf '%s\n' **/*.pdf(DoL)
  • **/ is recursive globbing: it descends into subdirectories.
  • (DoL) is a glob qualifier: D includes dot files (hidden files) as find would, and oL sorts the generated list by file Length (size).

Note that -regex ".*/[^/].*.pdf" doesn't make much sense.

For instance, it matches /home/foo/pdf: .* matches /home, then / matches /, [^/] matches f, .* matches oo, the unescaped . matches /, and pdf matches pdf.

With -regex, with or without -E, you can use -regex '.*\.pdf' to match on *.pdf files, but you might as well use the standard -name '*.pdf'.
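The difference between the two patterns can be checked without touching the file system. The sketch below is only an approximation: it uses `grep -Ex` to mimic the way `find -regex` has to match the whole path, and the helper name `matches` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a whole path matches an extended
# regular expression, approximating find -E ... -regex (whole-path match).
matches() {
    # $1 = extended regular expression, $2 = path to test
    if printf '%s\n' "$2" | grep -Eqx -- "$1"; then
        echo match
    else
        echo no-match
    fi
}

matches '.*/[^/].*.pdf' /home/foo/pdf     # original pattern: match
matches '.*\.pdf'       /home/foo/pdf     # escaped dot: no-match
matches '.*\.pdf'       /home/foo/a.pdf   # escaped dot: match
```

The unescaped `.` before `pdf` is what lets the original pattern accept a file simply named `pdf`.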

You could use:

find . -name '*.pdf' -exec stat -f '%z %N' {} + |
  sort -n |
  cut -d ' ' -f 2-

But that wouldn't work if there were file paths with newline characters.
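If the same script has to run on both macOS and a GNU system, one possible sketch is to try the GNU `stat -c '%s'` syntax first and fall back to the BSD `stat -f '%z'` syntax. The function names `size_and_path` and `pdfs_by_size` are hypothetical, and the same newline limitation applies.

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): print "<size> <path>" for each
# argument, trying GNU stat syntax first and falling back to BSD stat syntax.
size_and_path() {
    for f in "$@"; do
        size=$(stat -c '%s' -- "$f" 2>/dev/null) ||
            size=$(stat -f '%z' -- "$f") ||
            continue
        printf '%s %s\n' "$size" "$f"
    done
}

# Hypothetical wrapper: list *.pdf files under a directory, smallest first.
# Assumes no newline characters in file paths.
pdfs_by_size() {
    find "${1:-.}" -name '*.pdf' -print |
        while IFS= read -r f; do size_and_path "$f"; done |
        sort -n |
        cut -d ' ' -f 2-
}
```

Calling `pdfs_by_size ~` would list the PDFs under the home directory. Running `stat` once per file is slower than the `-exec ... {} +` form, which is the price of the portability check in this sketch.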

With GNU utilities, you could do:

find . -name '*.pdf' -printf '%s %p\0' |
  sort -nz |
  cut -zd ' ' -f 2- |
  tr '\0' '\n'

Note that if any of those pdf files are symlinks, it's the size of the symlink that is considered, not the size of the target of the symlink. To sort on the size of that target, change DoL to D-oL or add the -L option to stat. And with GNU find:

find -L . \( ! -xtype l -o -prune \) -name '*.pdf' -printf '%s %p\0' |
  sort -nz |
  cut -zd ' ' -f 2- |
  tr '\0' '\n'

For case-insensitive matching, either replace pdf with [pP][dD][fF] or replace -name with -iname (not standard but supported by GNU and BSD find), or for zsh, enable the extendedglob option and change pdf to (#i)pdf, or enable the nocaseglob option.
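As a small illustration of the standard bracket form, here is a sketch that needs neither -iname nor zsh. The function name `find_pdfs_any_case` is made up for the example.

```shell
#!/bin/sh
# Hypothetical helper: list files whose extension is "pdf" in any letter
# case, using only the standard -name primary.
find_pdfs_any_case() {
    find "${1:-.}" -name '*.[pP][dD][fF]' -print
}
```

Its output can be fed to the same size-sorting pipelines shown earlier in place of the plain `-name '*.pdf'` search.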

Stéphane Chazelas
If you have access to GNU find and awk:

$ find $HOME -iname "*.pdf" -printf '%s\0%p\n' | sort -h -t '\0' | awk -F '\0' '{print $2}'

This command:

  • finds all files in $HOME having (case insensitive) pdf extension and prints size and path for each one;
  • sorts the list by the first field using the -h option that enables human readable number comparison;
  • prints the sorted paths.
fra-san
  • `-printf` is a GNU `find` extension. It's not found in any other implementation yet. `-F '\0'` would typically not work with BSD `awk` either. – Stéphane Chazelas Oct 26 '18 at 17:39