
I suspect the following has been answered already but I don't know the terminology for the issue I'm having well enough to find an existing answer.

I'm working on a command to go through a list of files and output on each line the filename followed by the count of lines that start with P. I've gotten this far:

find -type f | xargs -I % sh -c '{ echo %; grep -P "^P \d+" % | wc -l; }  | tr "\n" ","; echo ""; '

(The actual find command is a bit more involved, but long story short, it finds about 11k files of interest in the directory tree below where I'm running this.)

This command is about 98% working for my purposes, but I discovered there is a small subset of files with parentheses in their names and I can't ignore them or permanently replace the parentheses with something else.

As a result I'm getting some cases like this:

sh: -c: line 0: syntax error near unexpected token `('

I know parentheses are special characters in the shell, so, for example, if I were running grep directly on a single file with parentheses in its name, I'd have to enclose the filename in single quotes or escape the parentheses. I tried swapping the quote types in my command (double quotes outermost, single quotes inner) so I could put the '%' in the grep call in single quotes, but that didn't help.

Is there a way to handle parentheses in the find -> xargs -> sh chain so they get handled correctly in the sh call?

SSilk
  • See also [Why is looping over find's output bad practice?](https://unix.stackexchange.com/q/321697) and [Is it possible to use \`find -exec sh -c\` safely?](https://unix.stackexchange.com/q/156008) – Stéphane Chazelas Apr 19 '23 at 12:55
  • If you're using `find`, you don't really need `xargs`: (almost) everything it does can be done with find's `-exec` predicate, especially when combined with sh or some other scripting language (e.g. `-exec sh -c '...'` or `-exec perl -e '...'`). xargs is more useful with other programs (i.e. programs that aren't find). About the only time it's useful to use xargs with find is when you need parallel execution with xargs -P (and even then, there are often better tools for that job, like GNU [parallel](https://www.gnu.org/software/parallel/)) – cas Apr 19 '23 at 12:58
  • @cas, on that, see also [find . -print0 | xargs -0 cmd vs find . -exec cmd {} +](https://unix.stackexchange.com/q/730873) – Stéphane Chazelas Apr 19 '23 at 13:01
  • oh yeah, filtering/manipulating the file list with grep/sed/awk/perl/whatever (NUL-separated, of course) *between* `find ... -print0` and `xargs -0r` is also a good reason to use xargs instead of -exec. – cas Apr 19 '23 at 13:05

3 Answers


Better not to embed data (filenames) directly in the code (the shell scriptlet). Instead, pass the filename as an argument to the shell that xargs runs:

find -type f | xargs -I % \
  sh -c '{ echo "$1"; grep -c -P "^P \d+" "$1"; } | tr "\n" ","; echo' sh %
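A minimal sketch of this answer's command run in a throwaway directory created with `mktemp -d`, assuming GNU find/xargs. The file names are made up, and the portable basic regex `^P [0-9]` stands in for `-P "^P \d+"` (both match lines starting with `P`, a space and a digit), so the sketch also runs where grep lacks PCRE support. The file name reaches `sh` only as `$1`, never as part of the code string:

```shell
dir=$(mktemp -d)
cd "$dir"
# Made-up test files, one with parentheses in its name.
printf 'P 1\nQ 2\nP 3\n' > 'report (1).txt'
printf 'P 7\n' > plain.txt

# "%" is substituted only into the argument after "sh", so sh sees the
# name as "$1" and never parses the parentheses as syntax.
# sort is added only to make the output order predictable.
out=$(find . -type f | sort | xargs -I % \
  sh -c '{ echo "$1"; grep -c "^P [0-9]" "$1"; } | tr "\n" ","; echo' sh %)
printf '%s\n' "$out"

cd / && rm -rf "$dir"
```

Note that `tr` turns both newlines into commas, so each line keeps a trailing `,` exactly as in the question's original output format.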

Also, you should be able to use grep -c instead of grep | wc -l; at the very least it makes the command a bit shorter.
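A quick check, on a made-up sample file, that `grep -c` reports the same number as piping `grep` into `wc -l` (both count matching lines, not individual matches). As above, `^P [0-9]` stands in for the `-P` pattern as an assumption:

```shell
tmp=$(mktemp)
# Two of these four lines start with "P " followed by a digit.
printf 'P 1\nP x\nQ 2\nP 30\n' > "$tmp"

via_wc=$(grep '^P [0-9]' "$tmp" | wc -l)
via_c=$(grep -c '^P [0-9]' "$tmp")
printf 'wc -l: %s, grep -c: %s\n' "$via_wc" "$via_c"

rm -f "$tmp"
```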

Stéphane Chazelas
ilkkachu
  • Thanks for the quick reply! I tried your approach and it's giving me the error `sh: -c: line 1: syntax error: unexpected end of file`. Am I missing something? – SSilk Apr 19 '23 at 12:44
  • @SSilk, there was a missing `;` before the `}`, which I've added. – Stéphane Chazelas Apr 19 '23 at 12:46
  • @StéphaneChazelas Yes, just saw your edit after commenting. It's working for me now. Thanks! – SSilk Apr 19 '23 at 12:47
  • @ilkkachu Re: `grep -c`, thanks for the reminder. I was pretty sure grep had a built-in counting option, but when I searched for how to count matching lines with grep, the first thread I found only showed piping grep matches into `wc -l`, so I just ran with that. – SSilk Apr 19 '23 at 12:48
  • Acknowledging that there are some good recommendations in other answers about using `find -exec` rather than `xargs`, I'm accepting this answer as it directly answers the initial question of how to handle parentheses in filenames in this specific scenario. – SSilk Apr 24 '23 at 07:38

Since you omitted the `.` in `find . -type f`, I suppose your find is GNU find, in which case you can do:

find . -type f -printf %p, -exec grep -cP '^P \d' {} ';'
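A sketch of this command on a made-up tree in a `mktemp -d` directory, assuming GNU find (`-printf` is a GNU extension) and using `^P [0-9]` in place of `-P '^P \d'`. The trailing `sort` is added only to make the output order predictable. Because find runs grep directly, no shell ever parses the file names:

```shell
dir=$(mktemp -d)
printf 'P 1\nP 2\n' > "$dir/a (1).txt"
printf 'Q 1\n' > "$dir/b.txt"

# -printf writes "path," and grep -c then completes the line with the count.
# With ';' grep's non-zero status for a zero count only makes -exec false;
# it does not stop find.
out=$(cd "$dir" && find . -type f -printf %p, -exec grep -c '^P [0-9]' {} ';' | sort)
rm -rf "$dir"
printf '%s\n' "$out"
```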

If the file paths don't contain : characters, you could also do (with GNU grep):

grep -rcP '^P \d' . | tr : ,
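A sketch of what `tr` is transforming, assuming a grep with `-r` (GNU grep has it) and made-up file names: with `-c` and several files, grep prints one `path:count` line per file, including files with a count of 0. `^P [0-9]` again stands in for the `-P` pattern, and `sort` only fixes the order:

```shell
dir=$(mktemp -d)
printf 'P 1\nP 2\n' > "$dir/report (1).txt"
printf 'Q 1\n' > "$dir/z.txt"

# Raw grep output: one "path:count" line per file.
raw=$(cd "$dir" && grep -rc '^P [0-9]' . | sort)
# tr rewrites every ":" as ","; no shell parses the names.
out=$(printf '%s\n' "$raw" | tr : ,)
rm -rf "$dir"
printf '%s\n' "$out"
```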

If they may contain `:` characters but don't contain newline characters, that can be worked around by replacing only the last `:` on each line with `,`:

grep -rcP '^P \d' . | LC_ALL=C sed 's/\(.*\):/\1,/'
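A sketch contrasting the two pipelines on a made-up file whose name contains `:`, with `^P [0-9]` standing in for the `-P` pattern. The greedy `.*` makes sed replace only the last `:` on each line, which is the one grep added:

```shell
dir=$(mktemp -d)
printf 'P 1\n' > "$dir/a:b.txt"
printf 'Q 1\n' > "$dir/c.txt"

# tr also rewrites the ":" that belongs to the file name.
naive=$(cd "$dir" && grep -rc '^P [0-9]' . | tr : , | sort)
# sed only touches the last ":", so the name survives intact.
fixed=$(cd "$dir" && grep -rc '^P [0-9]' . | LC_ALL=C sed 's/\(.*\):/\1,/' | sort)
rm -rf "$dir"
printf 'naive:\n%s\nfixed:\n%s\n' "$naive" "$fixed"
```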

If you still need to use find, for instance because you have more selection criteria, that approach can also be used with:

find ... -type f -exec grep -cHP '^P \d' {} + | ...
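A sketch of the find variant with the sed post-processing filled in. The `-name '*.txt'` test is a hypothetical stand-in for the elided extra selection criteria, the file names are made up, and `^P [0-9]` replaces the `-P` pattern. `-H` makes grep print the path even if `{} +` happens to pass it a single file:

```shell
dir=$(mktemp -d)
printf 'P 1\n' > "$dir/x:y (2).txt"
printf 'P 1\nP 2\n' > "$dir/z.txt"

# -name '*.txt' is a placeholder for whatever criteria the real find uses.
out=$(cd "$dir" && find . -type f -name '*.txt' -exec grep -cH '^P [0-9]' {} + |
  LC_ALL=C sed 's/\(.*\):/\1,/' | sort)
rm -rf "$dir"
printf '%s\n' "$out"
```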

Stéphane Chazelas
  • Thanks for the suggestions. I think the issue with this approach is that my real find command is a bit involved. (It finds files recursively, keeps only those with numerical extensions, and rejects those with spaces and certain special characters, but allows a few special characters like parentheses. That's not something I know how to do with grep's built-in filename filtering.) It just didn't have much bearing on the question I was asking, so I shortened it to what's shown above. – SSilk Apr 19 '23 at 12:57
  • @SSilk, you can still use `find` with that approach, see edit. – Stéphane Chazelas Apr 19 '23 at 13:00

ilkkachu's answer looks like a fine improvement and is probably what you should do.

For information purposes, here is a lighter-touch fix that shows where your problem lies:

find -type f | xargs -I % sh -c '{ echo "%"; grep -P "^P \d+" "%" | wc -l; }  | tr "\n" ","; echo ""; '

Basically, wrap each % that will be replaced in double quotes.
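A sketch of why quoting the placeholder is only a partial fix, in a throwaway `mktemp -d` directory assuming GNU find/xargs. The file names, including the harmless `$(touch INJECTED)` payload, are made up for illustration. Quoting stops `(` from being a syntax error, but sh still expands `$(...)` inside double quotes, whereas passing the name as `$1` never runs it:

```shell
dir=$(mktemp -d)
cd "$dir"
: > 'report (1).txt'
: > '$(touch INJECTED)'

# Quoted placeholder: parentheses alone are now fine.
out_parens=$(find . -type f -name 'report*' | xargs -I % sh -c 'echo "%"')

# Quoted placeholder: sh still performs the $(...) substitution in the name.
find . -type f -name '$*' | xargs -I % sh -c 'echo "%" > /dev/null'
[ -e INJECTED ] && injected_embedded=yes || injected_embedded=no

rm -f INJECTED
# Name passed as an argument: sh only ever sees the value of "$1".
find . -type f -name '$*' | xargs -I % sh -c 'echo "$1" > /dev/null' sh %
[ -e INJECTED ] && injected_arg=yes || injected_arg=no

echo "parens: $out_parens; embedded: $injected_embedded; as argument: $injected_arg"
cd / && rm -rf "$dir"
```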

bxm
  • Even with quotes, that's still a command injection vulnerability (like with a `$(reboot)` file). The place holder should **never** be embedded in the code argument. – Stéphane Chazelas Apr 19 '23 at 12:45
  • Thanks for the tip. – bxm Apr 19 '23 at 12:47
  • Thanks for the info. These finer details of embedding placeholders vs. passing them as arguments are new to me; I'll have to study up on them. The method I was using was demonstrated in the following post: https://linuxize.com/post/linux-xargs-command/#:~:text=To%20run%20multiple%20commands%20with,the%20argument%20passed%20to%20xargs. – SSilk Apr 19 '23 at 12:52
  • Also: there's no need for `xargs` (and find's output is not compatible with xargs's expected input format unless you use `-print0`/`-0`), as you can use `find`'s `-exec`. There's no need to run one `sh` per file, as `sh` can loop over its arguments. `echo` can't be used for arbitrary data. See `paste -sd , -` to join lines with `,`. – Stéphane Chazelas Apr 19 '23 at 12:53
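A sketch combining the suggestions in that last comment: `find -exec ... {} +` instead of xargs, one `sh` looping over many files, `printf '%s\n'` instead of `echo`, and `paste -sd , -` to join each file's two lines with `,`. The file names are made up, `^P [0-9]` stands in for the `-P` pattern, and `sort` only fixes the output order:

```shell
dir=$(mktemp -d)
printf 'P 1\nP 2\nQ 3\n' > "$dir/data (1).txt"
printf 'Q 1\n' > "$dir/other.txt"

# "for f do" loops over the positional parameters that find passes after "sh".
out=$(cd "$dir" && find . -type f -exec sh -c '
  for f do
    { printf "%s\n" "$f"; grep -c "^P [0-9]" "$f"; } | paste -sd , -
  done' sh {} + | sort)
rm -rf "$dir"
printf '%s\n' "$out"
```

Unlike `tr "\n" ","`, `paste -s` does not leave a trailing comma at the end of each line.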