
This will be easier to explain via example. Here are my input files:

file1:
x
x
a
b
c
x
x

file2:
x
x
c
b
a
x
x

file3:
x
x
x x a b c x x
x
x

file4:
x
x
x x c b a x x
x
x

file5:
x
x
a b
x
x

file6:
x
x
x x b c x x
x
x

file7:
x
x
x x b b x x
x
x

I want to search for files that have regexp a AND c. This will return files 1-4.
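For reference, the AND requirement on its own (ignoring highlighting and context lines) can be sketched with plain grep. This is only a minimal sketch: it assumes POSIX grep, find and xargs rather than ag, the function name files_with_all is hypothetical, and file names containing whitespace or quotes would break the xargs step.

```shell
# files_with_all DIR REGEXP...
# Print the files under DIR that contain a match for every REGEXP.
# The body runs in a subshell so its variables do not leak into the caller.
files_with_all() (
  dir=$1; shift
  # Start with every regular file, then narrow the list once per regexp.
  list=$(find "$dir" -type f | sort)
  for re in "$@"; do
    [ -n "$list" ] || break
    # grep -l prints only the names of files with at least one match.
    list=$(printf '%s\n' "$list" | xargs grep -lE -e "$re" || true)
  done
  if [ -n "$list" ]; then printf '%s\n' "$list"; fi
)
```

Run in a directory holding only the seven sample files, `files_with_all . a c` should list ./file1 through ./file4.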

I want the output to look as close as possible to ag -C <number> --pager="less -R" regexp's output. Here's what that would look like (I am surrounding results with angle brackets to represent color highlighting):

file1:
2: x
3: <a>
4: b
5: <c>
6: x

file2:
2: x
3: <c>
4: b
5: <a>
6: x

file3:
2: x
3: x x <a> b <c> x x
4: x

file4:
2: x
3: x x <c> b <a> x x
4: x

Or maybe ag would print it like this:

file1:
2: x
3: <a>
4: b
--
4: b
5: <c>
6: x

I'm not sure, but this detail doesn't matter to me. Here's what does:

  1. The highlighting
  2. The relative path of the file above the results
  3. The features less provides for navigation and search
  4. It can find multiple regexps ANDed together
  5. That -C option still exists on the command line

The line numbers are a nice to have, but not necessary.


I've tried many many things, and this is the closest I've gotten:

Step 1: Precompile a file list of each individual regexp

for x in $array_of_regexp_file_names; do
  r=${x%.txt}                  # remove .txt from the end
  ag -il "$r" | sort > "$x" &  # sorted list of FILES matching this single regexp
done
wait  # let every background search finish before the lists are read

This gives a list of files for each regexp, sorted by filename.

Step 2: Use ag to search the intersection of 2+ file lists

I will break down the following:

ag -C 1 --pager="less -R" "regexp1|regexp2" $(comm -12 regexp1.txt regexp2.txt)

$(comm -12 regexp1.txt regexp2.txt)

This command finds the intersection of two file lists. That is, the red in this picture:

[image: diagram with the intersection of the two lists shown in red]
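A small illustration of what comm -12 does, using made-up file names. Both inputs have to be sorted, which is why Step 1 pipes each list through sort.

```shell
# Two sorted lists of (hypothetical) file names, one list per regexp.
tmp=$(mktemp -d)
printf '%s\n' docs/a.md src/main.c src/util.c > "$tmp/regexp1.txt"
printf '%s\n' src/main.c src/util.c test/t1.c > "$tmp/regexp2.txt"

# -1 drops lines unique to the first list, -2 drops lines unique to the
# second, so only the lines common to both lists remain.
common=$(comm -12 "$tmp/regexp1.txt" "$tmp/regexp2.txt")
printf '%s\n' "$common"
rm -rf "$tmp"
```

Here only src/main.c and src/util.c are printed, since they are the names present in both lists.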

ag -C 1 --pager="less -R" "regexp1|regexp2" ...

Here, I am giving ag a regexp that I know exists in every file in that intersection. That may seem redundant, but I'm doing it because I want those words highlighted in the output. It makes my life 1000x easier.

Here's the problem: that intersection contains so many files that running the command gives me this output:

zsh: argument list too long: ag

Other than that, my workaround works. I've tested this by running a command like this:

ag -C 1 --pager="less -R" "regexp1|regexp2" $(comm -12 regexp1.txt regexp2.txt | head -10)

The problem is the intersecting list is so long, it doesn't fit on the command line. If ag provided an option to pass a list of files to search, I could get past this, but it doesn't have that functionality.
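One generic way around the argument-length limit is to let xargs split the list into batches. The following is a hedged sketch, not something I have verified with ag: the helper name search_listed and the file both.txt are hypothetical, xargs -0 is a GNU/BSD extension, and the ag flags in the commented usage line (--color and --group, to keep highlighting and per-file headings when output goes to a pipe) are assumptions to check against your ag version.

```shell
# search_listed LIST_FILE COMMAND [ARGS...]
# Run COMMAND on the file names in LIST_FILE (one name per line), letting
# xargs split the list into batches that fit the argument-length limit.
search_listed() {
  list=$1; shift
  # tr turns newlines into NULs so that xargs -0 copes with spaces in names.
  tr '\n' '\0' < "$list" | xargs -0 "$@"
}

# Possible usage (untested with ag): page the combined output once, instead
# of letting every batch start its own pager.
#   comm -12 regexp1.txt regexp2.txt > both.txt
#   search_listed both.txt ag -C 1 --color --group "regexp1|regexp2" | less -R
```

Because each batch is a separate invocation of the search command, the pager is applied to the combined output rather than passed with --pager.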

Regardless, I'm hoping I don't need to: I'm assuming there's a much easier solution to this problem and I just don't know what it is.


Edit: To solidify some highlighting rules, here are some other examples:

Example 1

regexps:

regex1 = a.
regex2 = .b

input

file8:
x
x
abc
x
x

Output:

2: x
3: <ab>c
4: x

Example 2

regexps:

regex1 = foo
regex2 = oba

input

file9:
x
x
foobar
x
x

Output:

2: x
3: <foo>bar
4: x

I picked these outputs because that's what grep and ag already do, but I'm pretty ambivalent about both of these scenarios, so if these examples are challenging to implement, I don't mind if highlighting works differently in these edge cases; in general, my regexps won't overlap.
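The grep behaviour behind both examples can be reproduced with -o (a GNU/BSD grep option), which prints only the text an alternation actually matches:

```shell
# -o prints only the matched text, one match per line, which shows exactly
# what an alternation of the two regexps would highlight.
ex1=$(printf 'abc\n' | grep -Eo 'a.|.b')       # Example 1: both regexps match "ab"
ex2=$(printf 'foobar\n' | grep -Eo 'foo|oba')  # Example 2: "foo" matches first, so "oba" is never matched
printf '%s\n' "$ex1" "$ex2"
```

This prints ab and then foo, matching the <ab>c and <foo>bar highlighting shown above.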

Daniel Kaplan
  • Don't use the word "pattern" for matching text as it's ambiguous, see [how-do-i-find-the-text-that-matches-a-pattern](https://stackoverflow.com/questions/65621325/how-do-i-find-the-text-that-matches-a-pattern). Please [edit] your question to replace the word `pattern` everywhere it's used with `string` or `regexp`, whichever you actually want to match with. – Ed Morton Mar 10 '22 at 14:19
  • I'm not familiar with `ag` so idk how it behaves in the non-trivial cases - please [edit] your example to include such cases, e.g. where the "patterns" match a) the same strings (e.g. matching `a.` and `.b` against `ab` with `echo 'ab' | grep -Eo 'a.|.b'` would output `ab`) and b) overlapping strings (e.g. matching `foo` and `oba` against `foobar` with `echo 'foobar' | grep -Eo 'foo|oba'` would output `foo` ) so we can see what the highlighting should look like and whether or not the same input string can match multiple "patterns", what counts as all "patterns" matching, etc. – Ed Morton Mar 10 '22 at 14:31
  • @EdMorton re: your first comment, it defaults to regexp, but an option can change them to strings. re: your second question, I honestly don't have a preference. If you want something concrete, I guess however `ag` currently works? I'm assuming they've thought of more edge cases than I could ever come up with, so it could be used as a specification. But I'd be fine with undefined behavior given these scenarios or assuming it can't happen (because it would error if it did). – Daniel Kaplan Mar 10 '22 at 17:02
  • So are you saying you need a solution that has options to match using either regexps or strings? If not, pick 1 for your question. Telling someone who's told you they're not familiar with `ag` to do it "however ag works" isn't very useful. I gave you a couple of examples of non-trivial cases, please just think about what you'd like the output to be for those cases at least and include that in your question rather than asking us to figure out what the output should be and then you possibly being unhappy with the result and so us having wasted our time. – Ed Morton Mar 10 '22 at 17:42
  • If you leave it up to us to decide a) if the matching should be string or regexp and b) what various inputs might look like and c) what the output should look like for various inputs then the resulting answers will be an inconsistent mess and probably a waste of time - just decide exactly what the question is about in terms of input and output and update it to reflect that with sample input/output that tests those requirements. – Ed Morton Mar 10 '22 at 17:44
  • @EdMorton Sorry for not mentioning it in my comment, but I already edited the original post to replace all "pattern" with "regexp". re: highlighting, I didn't say "do it exactly how `ag` does it," it was more like, "*shrug*, whatever's the easiest implementation for the person answering"; if you want a requirement of sorts, I guess the highlighting could look like `` and `r`, respectively. I have a feeling I'm missing something, as I don't understand why you are putting the `-o` option in your second comment's examples. *to be continued* – Daniel Kaplan Mar 11 '22 at 02:50
  • @EdMorton I updated the original post. Let me know if I missed anything, thanks – Daniel Kaplan Mar 11 '22 at 03:22
  • The `-o` is so we can see which parts of the regexp `grep` matches since you want to display the file contents only if all parts match. So if ag behaves like grep, and grep only matches some of the regexp parts when multiple are provided, and you want this new tool to behave the same way - is that considered all parts matching or not? – Ed Morton Mar 11 '22 at 13:59
  • I understand you don't care about a lot of the details but then that means WE have to decide what those details should be so we can write code to implement those decisions and so YMMV with who's willing to take on that extra work in order to help you. By contrast if you decided what you want and wrote clear, simple requirements and sample input/output to test your requirements then it becomes much less effort for people to help you. I personally have to move on as I've spent far too much time on this question already but good luck! – Ed Morton Mar 11 '22 at 14:48
  • FWIW I think you're asking for too much in this 1 question. You're asking to a) find files, b) match and highlight strings in those files, and c) print lines before/after the matched lines in those files. IMHO you should pick 1 thing and just ask about that, e.g. "b" - how to match and highlight strings in a file. Once you get an answer for that, if you can't figure out how to print lines around it then ask about that. I assume you won't really have trouble finding files. – Ed Morton Mar 11 '22 at 15:32
