How to grep/awk/sed for text in a log and display the chunk that has the text?

Question

I'm looking for something pretty similar to this.

The logs look like this:

[09:44:22] [main] ERROR [url/location] - A ONE LINE ERROR
[09:44:22] [main] ERROR [url/location] - Another ERROR 
[09:44:22] [main] SOMETHING DIFFERENT
[09:44:22] [main] SOMETHING DIFFERENT AGAIN
[09:44:22] [main] WARN [url/location] - ANOTHER ONE LINE WARN

Line after line with no empty lines between them, though occasionally there are indents when further info is available for a specific piece.

I want to be able to pull every line that includes ERROR (ideally with a script that can also pull FAIL, WARN, etc., alone or in combination) and choose what gets displayed with a parameter. It would make sifting through logs for failures much easier.

You'll need to define what a "chunk" is - beyond the fact that it *isn't* delimited by blank lines — steeldriver, Jul 14 '16 at 15:39
I couldn't find a way to make it so that what's shown in the block above is line after line instead of line, space, line. Each message has its own line with a timestamp at the beginning, if it's possible to sort by that. — Patremagne, Jul 14 '16 at 15:44
You should be able to preserve text formatting by using code markdown (basically select it then press Ctrl-K) - see [How do I format my posts using Markdown or HTML?](http://unix.stackexchange.com/help/formatting) in the [Help Center](http://unix.stackexchange.com/help) — steeldriver, Jul 14 '16 at 16:04
I've got something working with [awk -v FS='' '/ERROR/' file.txt] that seems to work, though only in my test file where I copied a few lines from the log into a new file. Is there a size limit where the command stops working? Minus the brackets, as I'm new to SE and don't know much of the forum syntax. — Patremagne, Jul 14 '16 at 16:11
I'm not aware of any size limit. However I still don't understand what your input and desired output are. — steeldriver, Jul 14 '16 at 16:32
I don't know how to make it more clear than I have already. I want to have every line in a log (starting with a timestamp and ending at the next timestamp) that has ERROR in it to be printed. — Patremagne, Jul 14 '16 at 17:06
@Patremagne I don't see why the `awk` command you quoted wouldn't work if all you want is the matching lines. Setting `FS` is unnecessary, but shouldn't matter. Just printing lines containing a pattern is simple with any of the tools you mention, and perhaps the simplicity of it is what causes confusion. (And the word "chunk" and mentioning indentations). `awk '/ERROR/' in.txt`, `grep ERROR in.txt` and `sed -n '/ERROR/p' in.txt` should all print the lines containing "ERROR" anywhere on the line, though grep is made just for this. — ilkkachu, Jul 14 '16 at 21:10

Timothy Martin · Answer 1 · 2016-07-14T17:38:32.640

2

GNU grep is able to do this quite simply. From man grep:

Two regular expressions may be joined by the infix operator |; the resulting regular expression matches any string matching either subexpression.

grep "ERROR\|FAIL\|WARN" /path/to/example.log

egrep (the same as grep -E, which is the spelling current GNU grep releases recommend) eliminates the need for escaping the | symbols.

egrep "ERROR|FAIL|WARN" /path/to/example.log
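Since the question asks for something that takes the levels as a parameter, the alternation can be built from the arguments. This is only a sketch under assumptions: the function name siftlog and its argument order (file first, then one or more levels) are made up for illustration and are not part of grep.

```shell
# siftlog FILE [LEVEL...]  (hypothetical helper, not a standard tool)
# Prints every line of FILE that contains at least one of the LEVEL words.
siftlog() {
  file=$1
  shift
  # Assumption: fall back to ERROR when no level is given.
  [ "$#" -gt 0 ] || set -- ERROR
  # "$*" joins the remaining arguments with the first character of IFS,
  # so ERROR FAIL WARN becomes the extended regex ERROR|FAIL|WARN.
  pattern=$(IFS='|'; printf '%s' "$*")
  grep -E -- "$pattern" "$file"
}
```

It could then be called as siftlog /path/to/example.log ERROR FAIL WARN. The levels are used as regular expressions, so characters that are special in extended regexes would need escaping.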

edited Jul 14 '16 at 17:38

answered Jul 14 '16 at 17:20

Timothy Martin


Why is `|` called "infix operator", when it means "OR" here? Also, your answer will not output following lines, if they start with whitespace. – Alex Stragies Jul 14 '16 at 17:24
That is a good question for which I do not know the answer. I believe the pertinent information is contained in the second half of that sentence: `the resulting regular expression matches any string matching either subexpression`. As for a `leading whitespace`, it works on any line containing `ERROR, FAIL, or WARN` on my system. I have edited the answer to include `GNU`. – Timothy Martin Jul 14 '16 at 17:38
OK, thanks for looking that up. 1) I often use the word "infix" in the context of "reverse incremental infix search" (CTRL-R), but I had never seen it used in the context you quoted. 2) I meant lines that start with whitespace and follow a line containing the keyword, while not containing the keyword themselves. – Alex Stragies Jul 14 '16 at 17:44
This is the correct answer. +1. (I had misunderstood OP's desired input format.) – Alex Stragies Jul 14 '16 at 18:25
@AlexStragies I've sort of found the issue that has me saying some things aren't working. The actual log file that I want to pull errors from won't work with any of the commands in this thread, but the same exact text (copied/pasted word for word from the original) pasted into a fresh text file works. I have both read/write permissions, so I'm not sure why the original log or even duplicates of it won't obey. – Patremagne Jul 14 '16 at 22:24

murphy · Answer 2 · 2016-07-14T16:43:37.683

1

I suppose your log file looks like this?

example.log:

[09:44:22] [main] ERROR [url/location] - A ONE LINE ERROR
[09:44:22] [main] ERROR [url/location] - A MULTI LINE ERROR 
    with whitepace indention
[09:44:22] [main] ERROR [url/location] - A MULTI LINE ERROR 
       with tab indention
[09:44:22] [main] SOMETHING DIFFERENT
[09:44:22] [main] SOMETHING DIFFERENT
       with tab indention
[09:44:22] [main] WARN [url/location] - ANOTHER ONE LINE WARN

Admittedly not a one-liner and in perl, but it should do the job:

logsifter.pl:

#!/usr/bin/perl
use warnings;
use strict;

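my $buffer = "";   # the log entry currently being collected, if any

while (my $line = <>) {
  chomp $line;
  if ($line =~ /ERROR|INFO|WARN/) {
    # A new matching entry starts: print the previous one first.
    print "$buffer\n" if $buffer;
    $buffer = $line;
  }
  elsif ($line =~ /^\s+(.*)$/) {
    # Indented continuation line: append its text to the current entry.
    $buffer .= $1 if $buffer;
  }
  else {
    # Non-matching, non-indented line: print what we have and stop collecting.
    if ($buffer) {
      print "$buffer\n";
      $buffer = "";
    }
  }
}

# Print the last entry, but only if there is one.
print "$buffer\n" if $buffer;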

Call it like this:

perl logsifter.pl < example.log
[09:44:22] [main] ERROR [url/location] - A ONE LINE ERROR
[09:44:22] [main] ERROR [url/location] - A MULTI LINE ERROR with whitepace indention
[09:44:22] [main] ERROR [url/location] - A MULTI LINE ERROR with tab indention
[09:44:22] [main] WARN [url/location] - ANOTHER ONE LINE WARN
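For anyone who would rather stay with awk, roughly the same idea (collect a matching line, glue its indented follow-up lines onto it, print the result as one line) might be sketched like this. The function name joinsift and the default pattern are assumptions for illustration. Unlike the Perl script, this sketch treats every indented line as a continuation even if it contains a keyword, and it always joins with exactly one space.

```shell
# joinsift [PATTERN] < logfile   (hypothetical helper)
# Prints each matching entry on one line, with indented continuation
# lines appended to the line they follow.
joinsift() {
  awk -v pat="${1:-ERROR|WARN}" '
    /^[ \t]/ {                      # indented line: continuation
      if (buf != "") {
        sub(/^[ \t]+/, "")          # drop the indentation
        sub(/[ \t]+$/, "", buf)     # drop trailing blanks of the entry
        buf = buf " " $0
      }
      next
    }
    { if (buf != "") print buf; buf = "" }   # new entry: flush the old one
    $0 ~ pat { buf = $0 }                    # start collecting if it matches
    END { if (buf != "") print buf }         # flush the last entry
  '
}
```

Usage would be joinsift 'ERROR|WARN' < example.log; the one-line-per-entry output can then be piped into grep for further filtering.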

edited Jul 14 '16 at 16:43

answered Jul 14 '16 at 16:36

murphy


I tried this and the output was just every line in the text file, regardless of whether or not it included ERROR. It's entirely likely I did something wrong, as I mentioned I'm new to all this. – Patremagne Jul 14 '16 at 17:14
@Patremagne: Murphy and I both use basically the same algorithm. I haven't run his version, but at first glance it looks correct. You may have missed that his version filters for ERROR|INFO|WARN, plus trailing lines. – Alex Stragies Jul 14 '16 at 17:18
@AlexStragies The only small difference is that I remove the line breaks in case of an indent. This way everything is on one line and easier to process with grep if necessary. – murphy Jul 14 '16 at 17:21
@AlexStragies Ahh I missed that INFO was in the perl script, so I removed it and it worked on my test text file that only includes 10 or so lines from the actual log. Unfortunately when I try it on the log itself, there's no output. – Patremagne Jul 14 '16 at 17:26
@Patremagne I tested my solution with my provided example.log. It puts out the data as shown above. – murphy Jul 14 '16 at 17:31
@murphy Yes, it works for me as well with my test log of 10 lines, but the log I actually want to probe through has 2500 lines and has no output when I run it through your solution. – Patremagne Jul 14 '16 at 17:38
@Patremagne if you upload your file here https://gist.github.com/ and post the url, I will look into it. – murphy Jul 14 '16 at 17:39
@murphy I'd rather not post it publicly, but I can send it to you on some other platform. – Patremagne Jul 14 '16 at 20:30

Alex Stragies · Answer 3 · 2016-07-14T18:24:25.323

Now that your data format has been established, the answer becomes a lot simpler: grep was built for this.

Use it as grep '<PATTERN>' <dataFile>

where <PATTERN> is either a single search word (SearchWORD1) or several alternatives joined with \| (SearchW1\|SearchW2).

The answer below was written when @murphy and I still had wrong assumptions about the data format:

Here is a one-line awk program that only searches for ERROR:

awk '/ERROR/{a=1;print;next} /^ / || /^\t/ {if (a) print;next} !/ERROR/ {a=0}'

You could turn this into a flexible shell function that takes the pattern as a parameter (it needs a shell with <( ) process substitution, such as bash):

searchlog(){ awk -f <( echo "
/$1/{a=1;print;next}
/^ /||/^\t/{if (a) print;next}
! /$1/{a=0}
"); }

Run it either as LogData_generated_by_program | searchlog <PATTERN>, or searchlog <PATTERN> < File_containing_Log_Data.

For the example data format the other answerer "guessed", this results in:

$ searchlog ERROR < /tmp/exampleData
[09:44:22] [main] ERROR [url/location] - A ONE LINE ERROR
[09:44:22] [main] ERROR [url/location] - A MULTI LINE ERROR 
    with whitepace indention
[09:44:22] [main] ERROR [url/location] - A MULTI LINE ERROR 
       with tab indention
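The function above pastes the pattern into the awk program text, so a pattern containing a / or a double quote would break it. A possible variant, assuming the same input format, hands the pattern to awk through a variable instead. The name searchlog_v is hypothetical, and this is a sketch rather than a drop-in replacement.

```shell
# searchlog_v PATTERN < logfile   (hypothetical variant of searchlog)
# Prints lines matching PATTERN plus the indented lines that follow them.
searchlog_v() {
  awk -v pat="$1" '
    $0 ~ pat { keep = 1; print; next }    # matching line: print, remember it
    /^[ \t]/ { if (keep) print; next }    # indented line: print if it follows a match
             { keep = 0 }                 # any other line ends the chunk
  '
}
```

It is called the same way, for example searchlog_v ERROR < /tmp/exampleData. One caveat: awk processes backslash escapes in values given with -v, so a pattern containing backslashes may need them doubled.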

Sorry for my Linux ignorance, but what do you mean by "push your data into STDIN" ? — Patremagne, Jul 14 '16 at 17:08
