
I have a set of txt files whose names may contain spaces or special characters like `#`.

I have a grep solution, `grep -L "cannot have" $(grep -l "must have" *.txt)`, to list all the files that contain "must have" but not "cannot have".

For instance, there is a file `abc defg.txt` which contains only one line: `must have`.

So the grep solution should normally find `abc defg.txt`, but instead it returns:

grep: abc: No such file or directory
grep: defg.txt: No such file or directory

I think the grep solution also fails for filenames containing `#`.

Could anyone help me amend the grep solution?

SoftTimur

3 Answers


Since you're already using GNU-specific options (`-L`), you could do:

grep -lZ -- "must have" *.txt | xargs -r0 grep -L -- "cannot have"

The idea is to use `-Z` to print the list of file names NUL-delimited, and `xargs -r0` to pass that list as arguments to the second `grep`.
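To see it cope with the file name from the question, here is a throwaway check in a scratch directory (assuming GNU `grep` and `xargs`; the file names are just examples):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'must have\n' > 'abc defg.txt'               # should be listed
printf 'must have\ncannot have\n' > 'no#2.txt'      # should be excluded

# NUL-delimited pipeline: file names with spaces or # survive intact
grep -lZ -- "must have" *.txt | xargs -r0 grep -L -- "cannot have"
# prints: abc defg.txt
```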

Command substitution, by default, splits on space, tab and newline (and NUL in zsh). Bourne-like shells other than zsh also perform globbing on each word resulting from that splitting.
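That splitting is exactly what produces the errors in the question; a minimal demonstration (in a scratch directory, with the default `IFS`):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'must have\n' > 'abc defg.txt'

# Unquoted command substitution: the single file name is split on the space
set -- $(grep -l -- "must have" *.txt)
echo "$#"   # 2: two words, "abc" and "defg.txt", instead of one file name
```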

You could do:

IFS='
' # split on newline only
set -f # disable globbing
grep -L -- "cannot have" $(
    set +f # we need globbing for *.txt in this subshell though
    grep -l -- "must have" *.txt
  )

But that would still break on filenames containing newline characters.
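A quick check that this variant now handles the space in the name (same example file as above, in a scratch directory):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'must have\n' > 'abc defg.txt'

IFS='
' # split on newline only
set -f # disable globbing
grep -L -- "cannot have" $(
    set +f # re-enable globbing for *.txt in this subshell
    grep -l -- "must have" *.txt
  )
# prints: abc defg.txt
```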

In zsh (and zsh only), you can do:

IFS=$'\0'
grep -L -- "cannot have" $(grep -lZ -- "must have" *.txt)

Or:

grep -L -- "cannot have" ${(ps:\0:)"$(grep -lZ -- "must have" *.txt)"}
Stéphane Chazelas

If you're willing to go further afield, `awk` can do it in one pass:

awk 'function s(){if(a&&!b){print f}} FNR==1{s();f=FILENAME;a=b=0} 
  /must have/{a=1} /cannot have/{b=1} END{s()}' filepattern

For recent-ish `gawk` you can simplify with `BEGINFILE` and `ENDFILE`. (Like all `awk` answers, you can put the `awk` commands in a file with `-f`, and like most, you can easily convert it to `perl` if you prefer.)

dave_thompson_085
    Note however that `grep -l/L` stops reading at the first match so is likely to be more efficient (also because of the general `awk` code interpretation overhead). With GNU `awk`, you could use `nextfile` to avoid reading the whole file when it can be avoided (when `cannot have` is found). – Stéphane Chazelas May 08 '14 at 11:24

Consider using `find` instead, and running `grep` via a shell command:

find . -name '*.txt' -print0 | xargs -0 -I{} sh -c 'grep -q "must have" -- "{}" && grep -L "cannot have" -- "{}"'
devnull
    never embed `{}` in the shell code! – Stéphane Chazelas May 08 '14 at 08:31
  • @StéphaneChazelas I trust that is a meaningful rule, but could you give a hint to find an explanation? – Volker Siegel Aug 18 '16 at 16:53
  • @VolkerSiegel, that's a classic code injection vulnerability, it's on the level of passing unsanitized data to `eval`, as here, you're passing arbitrary file names as **code** to `sh` (think of a `$(reboot).txt` lurking in the directory for instance). You'll find it discussed in many Q&As here. See for instance [Do I need to encapsulate awk variables in quotes in order to sanitize them?](http://unix.stackexchange.com/a/113799) – Stéphane Chazelas Aug 19 '16 at 06:30
  • @StéphaneChazelas Ah, makes sense - I was just thinking about something directly related to `xargs`, because the `xargs` option `-i` without an argument defaults to `-I{}` (the `{}` will not be part of the shell code) – Volker Siegel Aug 19 '14 at 12:41
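Following the advice in the comments, a safer variant of this answer passes the file names as arguments to `sh` instead of embedding `{}` in the code (a sketch, assuming GNU `find` and `xargs`):

```shell
# File names arrive as "$@", never interpreted as shell code
find . -name '*.txt' -print0 | xargs -r0 sh -c '
  for f in "$@"; do
    grep -q -- "must have" "$f" && grep -L -- "cannot have" "$f"
  done' sh
# prints the .txt files that contain "must have" but not "cannot have"
```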