I often need to tail -f Apache access logs to troubleshoot issues on websites. One thing that makes this annoying is that a single page load can write 12+ lines to the log, and since the lines are long, each one wraps across multiple lines in my terminal.
tail -f seems to play nicely with piping to grep and awk, and I came up with a pretty simple command to filter out duplicates when one IP address makes many requests in the same second (and to trim the output down to the fields I usually need):
tail -f log.file | awk '{ print $1 " " $4 " " $9 }' | uniq
The problem is, this doesn't work. I just get no output at all, even when I know there should be tons of lines printed.
I've tried some troubleshooting, but haven't been able to get it working:
tail -f log.file | awk '{ print $1 " " $4 " " $9 }'
This works exactly as I think it should, and prints the lines as they happen (but with many duplicates) like so:
12.34.56.78 [10/May/2016:18:42:01 200
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
12.34.56.78 [10/May/2016:18:42:02 304
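For context, here's how those awk fields map onto a single log line, assuming the default combined log format (the IP, path, and user-agent below are made-up placeholders):

```shell
# A sample combined-format access log line; awk splits on whitespace,
# so $1 is the client IP, $4 is the timestamp (with its opening
# bracket), and $9 is the HTTP status code.
echo '12.34.56.78 - - [10/May/2016:18:42:02 +0000] "GET /index.html HTTP/1.1" 304 0 "-" "Mozilla/5.0"' \
  | awk '{ print $1 " " $4 " " $9 }'
# → 12.34.56.78 [10/May/2016:18:42:02 304
```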
tail log.file | awk '{ print $1 " " $4 " " $9 }' | uniq
This also works exactly as I think it should, filtering out the duplicate lines. But for my troubleshooting I really need the real-time updates of tail -f.
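As I understand it, uniq only collapses adjacent duplicate lines, which should be fine here since the repeated requests from one second arrive back to back. A quick check of that behavior:

```shell
# uniq removes only consecutive duplicates; a repeated line that
# reappears after a different line is kept again.
printf 'a\na\nb\na\n' | uniq
# → a
# → b
# → a
```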
How can I make tail -f filter out duplicate lines?