
I have a command like

rga --files-with-matches --count-matches --sort path -i -e "use cases?" -e "user stor(y|ies)" -e "Technical debt" -e "Code Quality" -e "software development" -e "Agile Manifesto"

The output is like

a1.pdf:18
a2.pdf:10
a3.pdf:14
....

Here, :NUM is the number of matches.

I want all files that have more than 10 matches, printed without the colon and number so that I can pipe the output to another command. For example:

a1.pdf
a3.pdf
....

I tried .. | cut -d':' -f2, but it only gives the number, and .. | cut -d':' -f1 only gives the file name (for every file, without filtering on the count).

What might be the solution here?

Freddy
Ahmad Ismail
  • Edit the question and add the expected output that you want piped to another command. It isn't clear what you are trying to do. You stated that you want `a1.pdf` and `a3.pdf` which is only the file name but then you state that the command that only gives the file name isn't what is wanted. – Nasir Riley Mar 03 '23 at 01:07
  • The code block after "I want all files who have more tha..." is the expected output. If the number after `:` is more than 10, print the file name (without colon). If you look closely, current output is three files with `:NUM`, the expected output is two files without `:NUM`. – Ahmad Ismail Mar 03 '23 at 01:15
  • According to your statement: `.. | cut -d':' -f1 only gives the file name.`, you have that unless there is a misunderstanding. Either way, it's better to edit the question and add what you are actually getting so that it's clear. – Nasir Riley Mar 03 '23 at 01:30

2 Answers


With awk:

... | awk -F: '$NF>10{ sub(/:[0-9]+$/, ""); print }'

Split records on : and test whether the last field is greater than 10. If it is, remove the : followed by one or more digits at the end of the record, then print the (modified) record.
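
To try the awk one-liner above without rga, here is a minimal sketch that wraps it in a shell function and feeds it simulated output. The function name keep_over_10 and the sample lines are hypothetical; one sample name contains a colon to show that $NF still picks the trailing count and sub() strips only the final :NUM.

```shell
# Hypothetical wrapper around the awk filter from this answer.
keep_over_10() {
    # -F: splits each line on ':'; $NF is the last field (the match count).
    # sub() removes the trailing ':NUM', so only the file name is printed,
    # even when the name itself contains ':' characters.
    awk -F: '$NF > 10 { sub(/:[0-9]+$/, ""); print }'
}

# Simulated rga output (sample data only, not real results).
# Prints a1.pdf, a3.pdf and notes:v2.pdf, one per line.
printf 'a1.pdf:18\na2.pdf:10\na3.pdf:14\nnotes:v2.pdf:11\n' | keep_over_10
```

Note that a2.pdf (exactly 10 matches) is dropped because the test is strictly greater than 10.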

Freddy

This will work even if the filename contains a : character. It uses two capture groups: the first captures everything up to (but not including) the last : character in a line (the filename), and the second captures all digits after the last : (the count). Input lines not matching that pattern are ignored.

$ rga ... | perl -n -E 'say $1 if m/^(.*):(\d+)$/ && $2 > 10'
a1.pdf
a3.pdf

It won't work on filenames containing newline characters. If you need that, and rga can be made to produce NUL-separated output (e.g. with a -0 or -z or similar option), you could use that in combination with perl's -0 option for reading NUL-separated input.

cas