
I am trying to search a directory using pcregrep with a long, multi-line string. Basically, I am trying to look through multiple code bases for plagiarism, so I want to be able to copy/paste a block from some code and then search a directory for any exact matches.

The problem I'm having is that when I use pcregrep with the -M option, it appears to treat each line of my string as a separate pattern.

So, when I take a code block that I know is unique to one file, I may still get multiple responses because some individual lines may be used elsewhere.

Here is what I am using: `pcregrep -FlMr "long, multi-line string" /directory/to/search/`

What can I do to make sure that it will only return exact matches?

Z0OM
Lee Morgan
  • It's the `-F` option that causes this behavior, I think (*"Interpret each data-matching pattern as a list of fixed strings, separated by newlines, instead of as a regular expression."*). Whether you can safely omit it will depend on whether `long, multi-line string` may contain regex metacharacters. – steeldriver Jun 01 '23 at 17:21
  • I see now. I think I misunderstood that option, which is why I used it. The string will definitely contain regex metacharacters. I can see there is an option for setting the newline with -F, so I'll look into that now and see how it works. Thanks a lot. – Lee Morgan Jun 01 '23 at 17:42
  • Try `pcregrep -M "\Q$multiline_string\E"` (assuming the multiline string doesn't contain `\E`; otherwise use `perl` in slurp mode) – Stéphane Chazelas Jun 01 '23 at 20:51
  • Related: [How to know if a text file is a subset of another](https://unix.stackexchange.com/q/114877) – Stéphane Chazelas Jun 01 '23 at 20:54
