
I have a text file of this type, and I want to find every line containing the string Validating Classification and then extract the unique reported errors. I do not know in advance which errors may appear.

Input file:

201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3

Output file:

201600415 10:40 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3

Can I achieve this using grep, pipes and other commands?

user3065205
    using `grep .... | sort --unique` – Ahmed Nabil Aug 03 '22 at 07:53
  • I vote for reopening this question. The one marked as duplicate is different because it is not about grep. In case you are using git, the command `git grep -h | sort --unique` will give unique occurrences of grep matches. – Paul Rougieux Nov 29 '22 at 15:58

3 Answers


You will need to discard the timestamps, but `grep` and `sort --unique` together can do it for you:

grep --only-matching 'Validating Classification.*' file | sort --unique

So grep -o shows only the parts of each line that match your regex (which is why you need the trailing .*, to capture everything after the "Validating Classification" match). Once you have just the list of errors, sort -u reduces it to the unique ones.
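A self-contained sketch of this pipeline on the sample data from the question (the filename `file` is illustrative, not from the original post):

```shell
# Recreate the sample log from the question.
cat > file <<'EOF'
201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3
EOF

# Keep only the matching part of each line, then deduplicate.
grep --only-matching 'Validating Classification.*' file | sort --unique
# Validating Classification: error1
# Validating Classification: error2
# Validating Classification: error3
```

Note that the timestamps are gone: grep -o prints only the matched substring, so every duplicate error collapses to a single line regardless of when it occurred.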


You can use this command, assuming your data is in the file test. Note that uniq only collapses adjacent duplicate lines, which is fine here because identical errors appear consecutively in a time-ordered log.

uniq -f 2 <test
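A runnable sketch on the question's sample data: `-f 2` makes uniq skip the first two whitespace-separated fields (the date and the time) when comparing lines, so the first line of each run of identical errors is kept, timestamp included.

```shell
# Recreate the sample log from the question in the file "test".
cat > test <<'EOF'
201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3
EOF

# Compare lines ignoring the first two fields; keep the first of each run.
uniq -f 2 <test
# 201600415 10:40 Error Validating Classification: error1
# 201600415 10:42 Error Validating Classification: error2
# 201600415 10:43 Error Validating Classification: error3
```

Unlike the grep approach, this matches the desired output exactly, since the whole line (timestamp included) of the first occurrence survives.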
Marko

I would go with awk:

awk -F: '{ if (!a[$3]++ ) print ;}' file
  • `-F:` uses `:` as the field separator
  • `$3` is the text after the second `:`, i.e. the error message (the timestamp also contains a `:`, so the error lands in the third field, not the second)
  • `!a[$3]++` is true only on the first occurrence of each error
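A sketch of this approach on the question's sample data; like the uniq answer, it prints the whole first line for each error, timestamps included:

```shell
# Recreate the sample log from the question.
cat > file <<'EOF'
201600415 10:40 Error Validating Classification: error1
201600415 10:41 Error Validating Classification: error1
201600415 10:42 Error Validating Classification: error2
201600415 10:43 Error Validating Classification: error3
201600415 10:44 Error Validating Classification: error3
EOF

# Print a line only the first time its third :-separated field is seen.
awk -F: '{ if (!a[$3]++ ) print ;}' file
# 201600415 10:40 Error Validating Classification: error1
# 201600415 10:42 Error Validating Classification: error2
# 201600415 10:43 Error Validating Classification: error3
```

Because the associative array `a` remembers every error seen so far, this also deduplicates non-adjacent repeats, which uniq would miss.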
Archemar