40

I have this list of pdf files in a directory:

c0.pdf   c12.pdf  c15.pdf  c18.pdf  c20.pdf  c4.pdf  c7.pdf
c10.pdf  c13.pdf  c16.pdf  c19.pdf  c2.pdf   c5.pdf  c8.pdf
c11.pdf  c14.pdf  c17.pdf  c1.pdf   c3.pdf   c6.pdf  c9.pdf

I want to concatenate these using ghostscript in numerical order (similar to this):

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf *.pdf

But the shell expands the glob in alphabetical order rather than in the natural order of the numbers:

$ for f in *.pdf; do echo $f; done
c0.pdf
c10.pdf
c11.pdf
c12.pdf
c13.pdf
c14.pdf
c15.pdf
c16.pdf
c17.pdf
c18.pdf
c19.pdf
c1.pdf
c20.pdf
c2.pdf
c3.pdf
c4.pdf
c5.pdf
c6.pdf
c7.pdf
c8.pdf
c9.pdf

How can I achieve the desired order in the expansion (if possible without manually adding 0-padding to the numbers in the file names)?

I've found suggestions to use ls | sort -V, but I couldn't get it to work for my specific use case.

moooeeeep

5 Answers

38

Once more, zsh's glob qualifiers come to the rescue.

echo *.pdf(n)
Gilles 'SO- stop being evil'
22

Depending on your environment you can use ls -v with GNU coreutils, e.g.:

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sOutputFile=out.pdf $(ls -v)

Or if you are on recent versions of FreeBSD or OpenBSD:

gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
   -sOutputFile=out.pdf $(ls | sort -V)
Thor
  • `ls -v` will `natural sort of (version) numbers within text` so that can be used as well... – Sundeep Oct 03 '16 at 12:21
  • @Sundeep: Indeed, but this seems to be a GNU coreutils only solution. – Thor Oct 03 '16 at 14:11
  • yeah, seems like GNU specific - http://pubs.opengroup.org/onlinepubs/9699919799/ – Sundeep Oct 03 '16 at 14:21
  • @Sundeep: The `-V` feature of `sort` is not specified by POSIX either. However, it seems to have spread farther, for example both FreeBSD and OpenBSD `sort` support it. – Thor Oct 03 '16 at 14:25
  • oh ok, can you add these details to answer as well? I came across this answer while searching for similar problem (glob in numerical order) and seeing `ls` used I checked out if it had option by itself instead of piping to sort :) – Sundeep Oct 03 '16 at 14:30
  • NEVER parse ls! Use `stat -c "%n" *` instead. – Peter Sep 05 '17 at 08:33
  • @Peter: In general I agree, but there are exceptions – Thor Sep 06 '17 at 11:50
  • and also I would change my comment above since `stat` with `%n` is not really best either due to whitespace being allowed in filenames...use `printf '%s\0'`, and things like `xargs -0` or `while read`... I wrote an answer that has that. – Peter Sep 06 '17 at 15:22
18

If all the files in question have the same prefix (i.e., the text before the number; c in this case), you can use

gs  …args…  c?.pdf c??.pdf

c?.pdf expands to c0.pdf c1.pdf … c9.pdf, and c??.pdf expands to c10.pdf c11.pdf … c20.pdf (and up to c99.pdf, as applicable).  While each command-line word containing pathname expansion character(s) is expanded to a list of filenames sorted (collated) in accordance with the LC_COLLATE variable, the lists resulting from the expansion of adjacent wildcards (globs) are not merged; they are simply concatenated.  (I seem to recall that the shell man page once stated this explicitly, but I can’t find it now.)

Of course if the files can go up to c999.pdf, you should use c?.pdf c??.pdf c???.pdf.  Admittedly, this can get tedious if you have a lot of digits.  You can abbreviate it a little; for example, for (up to) five digits, you can use c?{,?{,?{,?{,?}}}}.pdf.  If your list of filenames is sparse (e.g., there’s a c0.pdf and a c12345.pdf, but not necessarily every number in between), you should probably set the nullglob option.  Otherwise, if (for example) you have no files with two-digit numbers, you would get a literal c??.pdf argument passed to your program.
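To illustrate the nullglob point, here is a minimal bash sketch. The temporary directory and the three sample file names are assumptions made up for the demonstration; it only prints the expanded words instead of calling gs.

```shell
#!/usr/bin/env bash
# Sketch only: the demo directory and file names below are hypothetical.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch c0.pdf c5.pdf c123.pdf          # sparse: no two-digit names exist

shopt -u nullglob
without=(c?.pdf c??.pdf c???.pdf)     # unmatched c??.pdf stays as a literal word
shopt -s nullglob
with=(c?.pdf c??.pdf c???.pdf)        # unmatched pattern now expands to nothing

printf 'without nullglob: %s\n' "${without[*]}"
printf 'with nullglob:    %s\n' "${with[*]}"
# gs …args… "${with[@]}" would then receive only names of existing files
```

Collecting the expansion in an array first also makes it easy to inspect the final order before handing it to the real command.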

If you have multiple prefixes (e.g., a<number>.pdf, b<number>.pdf , and c<number>.pdf , with numbers of one or two digits), you can use the obvious, brute force approach:

a?.pdf a??.pdf b?.pdf b??.pdf c?.pdf c??.pdf

or collapse it to {a,b,c}?{,?}.pdf.
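To see what the brace forms turn into, globbing can be switched off temporarily so that only brace expansion takes effect. This is a small bash sketch for inspection purposes; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: with globbing disabled (set -f), echo shows the patterns that
# brace expansion generates before any filename matching takes place.
set -f
single_prefix=$(echo c?{,?{,?}}.pdf)      # up to three digits, one prefix
multi_prefix=$(echo {a,b,c}?{,?}.pdf)     # one or two digits, three prefixes
set +f

printf '%s\n' "$single_prefix" "$multi_prefix"
```

Each generated pattern is then expanded on its own, which is why the shorter names come before the longer ones within each prefix.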

  • This is the best answer because it's beyond any claims of sketchy use of `ls`, `stat`, or anything else; and also works in bash as requested. – Kyle Aug 13 '19 at 19:36
5

If there are no gaps, the following could prove helpful (albeit sketchy and not robust concerning edge-cases and generality) -- just to get an idea:

FILES="c0.pdf"
for i in $(seq 1 20); do FILES="${FILES} c${i}.pdf"; done
gs [...args...] $FILES

If there may be gaps, some [ -f c${i}.pdf ] check could be added.
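One possible shape for that check, sketched in bash with an array rather than a space-separated string. The temporary directory and the sample files are assumptions for the demonstration, and the list is printed instead of being passed to gs.

```shell
#!/usr/bin/env bash
# Sketch only: the demo directory and file names below are hypothetical.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch c0.pdf c2.pdf c10.pdf           # c1.pdf, c3.pdf ... are deliberately missing

files=()
for i in $(seq 0 20); do
    # keep only the names that exist as regular files
    if [ -f "c$i.pdf" ]; then
        files+=("c$i.pdf")
    fi
done

printf '%s\n' "${files[@]}"
# gs [...args...] "${files[@]}" would get the existing files in numeric order
```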

Edit: also see this answer, according to which you could (using Bash) use

gs [..args..] c{0..20}.pdf
sr_
  • It is generally a good idea to quote your shell variable references (e.g., `"$FILES"` and `"$i"`) unless you have a good reason not to, and you’re sure you know what you’re doing.  (By contrast, while braces can be important, they’re not as important as quotes,  so, for example, `"c$i.pdf"` is good enough.)  A command like `gs  [ ` *`…args… `* `]  $FILES`, where `$FILES` contains a space-separated list of files, may seem like a good reason to use `$FILES` without quoting it (because `"$FILES"` won’t work in that context).  … (Cont’d) – G-Man Says 'Reinstate Monica' Oct 04 '16 at 20:19
  • (Cont’d) …  But see [Security implications of forgetting to quote a variable in bash/POSIX shells](http://unix.stackexchange.com/q/171346/80216), in particular, [my answer to it](//unix.stackexchange.com/q/171346/80216#286350), for notes on how to handle multi-word variables as arrays in bash (e.g., `FILES=("c0.pdf")` and `FILES+=("c$i.pdf")`); also [this answer](//unix.stackexchange.com/q/310361/80216#310364), which uses the technique I suggest. – G-Man Says 'Reinstate Monica' Oct 04 '16 at 20:27
2

Just quoting and fixing Thor's answer... NEVER parse ls!

You can use sort -V (a non-POSIX extension to sort):

printf '%s\0' ./* | sort -zV \
    | xargs -0 gs -q -sPAPERSIZE=a4 -dNOPAUSE -dBATCH \
        -sDEVICE=pdfwrite -sOutputFile=out.pdf

(For some commands, and gs appears to be one of them, you need ./* instead of *. If one doesn't work, try the other.)

Peter
  • The _don't parse ls output_ is because ls displays the file names newline-separated while newline is as valid as any in a file name, but here you're doing the same thing with `stat` but adding several other issues (like problems with filenames starting with `-`, problem if there are too many files, `stat` being a non-portable command). And because you used the split+glob operator without adjusting IFS or disabling globs, you'll still have issues with filenames with space or tab or wildcard characters. – Stéphane Chazelas Sep 05 '17 at 08:44
  • To use GNU `sort -V` reliably, you'd need `${(z)"$(printf '%s\0' * | sort -zV)"}` in `zsh` (though `zsh` has `(n)` for numerical sort already) or `readarray -td '' files < <(printf '%s\0' * | sort -zV)` in `bash4.4+`. – Stéphane Chazelas Sep 05 '17 at 08:47
  • @StéphaneChazelas thanks, and you are right that newline can be a concern, but that isn't the only reason not to parse ls. And yeah I was lazy and didn't add -- either. But I should have used printf...I'll change that. – Peter Sep 05 '17 at 10:06
  • for `ls` alone (that is without -l), what are those _other concerns_? Note that `--` wouldn't help for a file called `-`. – Stéphane Chazelas Sep 05 '17 at 10:08
  • @StéphaneChazelas there are other differences between versions... like some print "total 0" on there, and the newest ls versions even stick quotes around things where you don't want them... `touch \"test\"; ls -1` for example shows `'"test"'` on my ls. It's simply not meant to be parsed... it's a user interface, not a scripting command. – Peter Sep 05 '17 at 10:11
  • the `total x` is only for `ls -l/n...`. The quoting is only for output to a terminal (not a pipe like here). For a POSIX compliant `ls`, the only problem would be the newlines. But `-v` is not a POSIX option anyway. Now, I've just realised that `busybox ls` now also supports `ls -v` and busybox `ls` is one of those implementations that are not POSIX compliant as it does some mangling even when stdout is not a terminal. – Stéphane Chazelas Sep 05 '17 at 10:17
  • `*` -> `./*` to avoid problems with some file names with `gs`. – Stéphane Chazelas Sep 05 '17 at 10:19
  • Note also the OP's comment "I've found suggestions to use ls | sort -V, but I couldn't get it to work for my specific use case." – Jeff Schaller Sep 05 '17 at 12:59
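The `readarray` form mentioned in the comments can be sketched as follows. It assumes bash 4.4 or later and a `sort` that supports the non-POSIX `-z` and `-V` options (GNU sort does); the temporary directory and sample file names are made up for the demonstration, and the result is printed rather than passed to gs.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4.4+ and a sort with -z/-V (e.g. GNU coreutils).
# The demo directory and file names below are hypothetical.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
touch c0.pdf c1.pdf c2.pdf c10.pdf c11.pdf 'c3 draft.pdf'

# NUL-delimited names survive spaces and newlines; sort -zV orders them naturally
readarray -td '' files < <(printf '%s\0' ./*.pdf | sort -zV)

printf '%s\n' "${files[@]}"
# gs ... -sOutputFile=out.pdf "${files[@]}" would then see one argument per file
```

Unlike the `xargs -0` pipeline, this keeps the sorted list in the current shell, so it can be checked or filtered before the command runs.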