
I have a process that generates output mostly in lexicographically sorted order according to a (timestamp) field, but occasionally the lines will be output in the wrong order:

2014-08-14 15:42:02.019220203 ok
2014-08-14 15:42:03.523164367 ok
2014-08-14 15:42:04.525655832 ok
2014-08-14 15:42:06.523324269 ok
2014-08-14 15:42:05.930966407 oops
2014-08-14 15:42:07.643347946 ok
2014-08-14 15:42:07.567283110 oops

How can I identify each location where the data are "unsorted"?

Expected output (or similar):

2014-08-14 15:42:05.930966407 oops
2014-08-14 15:42:07.567283110 oops

I need a solution that works as the data are generated (e.g. in a pipeline); it's less useful if it only operates on complete files. `sort --check` would be ideal, but it only reports the first point of disorder; I need a full listing.

ecatmur

2 Answers

awk 'NR>1 && $0"" < last; {last=$0}'

Prints the lines that sort before the preceding line. The `$0""` is to force lexical comparison (on the output of `seq 10` it would spot `10` as sorting before `9`).
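As a quick sanity check (a sketch, using the question's sample lines and `seq` as stand-in input):

```shell
# Feed the question's out-of-order sample through the one-liner;
# each line that compares lexically less than its predecessor is printed.
printf '%s\n' \
  '2014-08-14 15:42:06.523324269 ok' \
  '2014-08-14 15:42:05.930966407 oops' \
  '2014-08-14 15:42:07.643347946 ok' \
  '2014-08-14 15:42:07.567283110 oops' |
awk 'NR>1 && $0"" < last; {last=$0}'
# prints the two "oops" lines

# Why the "": on seq output both operands look numeric, so awk compares
# numerically and 10 < 9 is false; appending "" forces a string comparison.
seq 10 | awk 'NR>1 && $0 < last; {last=$0}'     # prints nothing
seq 10 | awk 'NR>1 && $0"" < last; {last=$0}'   # prints 10
```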

Stéphane Chazelas

I think that shell string comparisons should respect lexicographical order (according to the current locale, of course), so perhaps you could do something like

#!/bin/bash

last=""
while IFS= read -r line; do
  [[ "$line" < "$last" ]] && printf '%s\n' "$line"
  last="$line"
done < <(your process)
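For a quick test of the comparison logic (a sketch; this needs bash for `[[ ]]` and process substitution, and `seq 10` stands in for your process):

```shell
# Sketch: the same loop driven by `seq 10` instead of the real process.
last=""
while IFS= read -r line; do
  # `<` inside [[ ]] compares strings lexicographically
  [[ "$line" < "$last" ]] && printf '%s\n' "$line"
  last="$line"
done < <(seq 10)
# prints 10 ("10" sorts before "9" as a string)
```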
steeldriver
    On the output of `seq 1000000`, I find it's about 50 times as slow as the `gawk` equivalent (12x for `ksh93`, 20x for `zsh` and `mksh`, and `mawk` is twice as fast as `gawk`). IMO at least, using `while read` loops to process text is bad practice. At least here, you didn't fall into the usual pitfalls though. – Stéphane Chazelas Aug 14 '14 at 15:22