
I have a set of txt files whose names may contain spaces or special characters like `#`.

I have a grep solution, `grep -L "cannot have" $(grep -l "must have" *.txt)`, to list all the files that contain "must have" but not "cannot have".

For instance, there is a file `abc defg.txt` which contains only one line: `must have`.

So the grep solution should normally find `abc defg.txt`, but instead it returns:

grep: abc: No such file or directory
grep: defg.txt: No such file or directory

I think the grep solution also fails for filenames containing `#`.

Could anyone help me amend the grep solution?

SoftTimur

3 Answers


Since you're already using GNU-specific options (`-L`), you could do:

grep -lZ -- "must have" *.txt | xargs -r0 grep -L -- "cannot have"

The idea is to use `-Z` to print the list of file names NUL-delimited, and `xargs -r0` to pass that list as arguments to the second `grep`.
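To see it cope with the file name from the question, here is a throwaway check in a scratch directory (assuming GNU `grep` and `xargs`; the file names are just examples):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'must have\n' > 'abc defg.txt'               # should be listed
printf 'must have\ncannot have\n' > 'no#2.txt'      # should be excluded

# NUL-delimited pipeline: file names with spaces or # survive intact
grep -lZ -- "must have" *.txt | xargs -r0 grep -L -- "cannot have"
# prints: abc defg.txt
```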

Command substitution, by default, splits on space, tab and newline (and NUL in zsh). Bourne-like shells other than zsh also perform globbing on each word resulting from that splitting.
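That splitting is exactly what produces the errors in the question; a minimal demonstration (in a scratch directory, with the default `IFS`):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'must have\n' > 'abc defg.txt'

# Unquoted command substitution: the single file name is split on the space
set -- $(grep -l -- "must have" *.txt)
echo "$#"   # 2: two words, "abc" and "defg.txt", instead of one file name
```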

You could do:

IFS='
' # split on newline only
set -f # disable globbing
grep -L -- "cannot have" $(
    set +f # we need globbing for *.txt in this subshell though
    grep -l -- "must have" *.txt
  )

But that would still break on filenames containing newline characters.
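A quick check that this variant now handles the space in the name (same example file as above, in a scratch directory):

```shell
dir=$(mktemp -d) && cd "$dir"
printf 'must have\n' > 'abc defg.txt'

IFS='
' # split on newline only
set -f # disable globbing
grep -L -- "cannot have" $(
    set +f # re-enable globbing for *.txt in this subshell
    grep -l -- "must have" *.txt
  )
# prints: abc defg.txt
```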

In zsh (and zsh only), you can do:

IFS=$'\0'
grep -L -- "cannot have" $(grep -lZ -- "must have" *.txt)

Or:

grep -L -- "cannot have" ${(ps:\0:)"$(grep -lZ -- "must have" *.txt)"}
Stéphane Chazelas

If you're willing to go further afield, `awk` can do it in one pass:

awk 'function s(){if(a&&!b){print f}} FNR==1{s();f=FILENAME;a=b=0} 
  /must have/{a=1} /cannot have/{b=1} END{s()}' filepattern

For recent-ish `gawk` you can simplify with `BEGINFILE` and `ENDFILE`. (Like all `awk` answers, you can put the `awk` commands in a file with `-f`, and like most, you can easily convert it to `perl` if you prefer.)

dave_thompson_085
    Note however that `grep -l/L` stops reading at the first match so is likely to be more efficient (also because of the general `awk` code interpretation overhead). With GNU `awk`, you could use `nextfile` to avoid reading the whole file when it can be avoided (when `cannot have` is found). – Stéphane Chazelas May 08 '14 at 11:24

Consider using `find` instead, and running `grep` via a shell command:

find . -name '*.txt' -print0 | xargs -0 -I{} sh -c 'grep -q "must have" -- "{}" && grep -L "cannot have" -- "{}"'
devnull
    never embed `{}` in the shell code! – Stéphane Chazelas May 08 '14 at 08:31
  • @StéphaneChazelas I trust that is a meaningful rule, but could you give a hint to find an explanation? – Volker Siegel Aug 18 '16 at 16:53
  • @VolkerSiegel, that's a classic code injection vulnerability, it's on the level of passing unsanitized data to `eval`, as here, you're passing arbitrary file names as **code** to `sh` (think of a `$(reboot).txt` lurking in the directory for instance). You'll find it discussed in many Q&As here. See for instance [Do I need to encapsulate awk variables in quotes in order to sanitize them?](http://unix.stackexchange.com/a/113799) – Stéphane Chazelas Aug 19 '16 at 06:30
  • @StéphaneChazelas Ah, makes sense - I was just thinking about something directly related to `xargs`, because the `xargs` option `-i` without an argument defaults to `-I{}` (the `{}` will not be part of the shell code) – Volker Siegel Aug 19 '14 at 12:41
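Following the advice in the comments, a safer variant of this answer passes the file names as arguments to `sh` instead of embedding `{}` in the code (a sketch, assuming GNU `find` and `xargs`):

```shell
# File names arrive as "$@", never interpreted as shell code
find . -name '*.txt' -print0 | xargs -r0 sh -c '
  for f in "$@"; do
    grep -q -- "must have" "$f" && grep -L -- "cannot have" "$f"
  done' sh
# prints the .txt files that contain "must have" but not "cannot have"
```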