On our server a cron job has been logging the number of files in a shared directory. Each log line has the form:

2003-07-03T16:05 279
2003-07-03T16:10 283
2003-07-03T16:15 282

By now this file has well over a million entries. I am interested in finding the biggest changes we have ever had (both negative and positive). I could write a program to find them, but is there a tool that can give me a list of deltas?

The original is on Solaris, but I have a copy of the file on my Linux Mint system.

user90256

3 Answers

If you have the package num-utils installed, you can do:

cut -d ' ' -f 2 inputfile | numinterval | sort -n -u

The first and last numbers in the output are the minimum and maximum change, respectively (-n makes sort order the values numerically, including negative ones).
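If num-utils is not available (for example on the Solaris box), a rough stand-in for `numinterval` can be written in awk. This is a minimal sketch, assuming `numinterval` simply prints the difference between each pair of consecutive numbers; `deltas` is a hypothetical function name, not part of any package.

```shell
# Hypothetical helper (not part of num-utils): for each number on stdin
# after the first, print its difference from the previous number.
deltas() {
    awk 'NR > 1 { print $1 - prev } { prev = $1 }'
}

# Usage sketch with the log from the question:
# cut -d ' ' -f 2 inputfile | deltas | sort -n -u
```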

If that list is too long and you also have moreutils installed, you can print just the two extremes:

cut -d ' ' -f 2 inputfile | numinterval | sort -n -u | pee "tail -1" "head -1"

On Mint you should be able to install those packages; on Solaris you will probably have to compile them from source.
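Alternatively, both extremes can be found in a single awk pass, without sorting and without `pee`. A sketch, assuming one count per line on standard input as produced by the `cut` above; `delta_range` is a hypothetical name.

```shell
# Hypothetical helper: read one number per line on stdin and print the
# smallest and largest difference between consecutive numbers, in one pass.
delta_range() {
    awk '
        NR > 1 {
            d = $1 - prev
            # Line 2 gives the first real difference, so it seeds min and max.
            if (NR == 2 || d < min) min = d
            if (NR == 2 || d > max) max = d
        }
        { prev = $1 }
        END { if (NR > 1) print min, max }
    '
}

# Usage sketch: prints "min max" separated by a space.
# cut -d ' ' -f 2 inputfile | delta_range
```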

terdon
Anthon

$ awk 'NR>1{delta[NR]=$2-last; print $0" "delta[NR]} {last=$2}' file

will give you

2003-07-03T16:10 283 4
2003-07-03T16:15 282 -1

with the delta in the last column (the first line is skipped because it has no previous value to compare against). To find the biggest changes, just pipe it to sort:

$ awk 'NR>1{delta[NR]=$2-last; print $0" "delta[NR]} {last=$2}' file | sort -k3,3n
2003-07-03T16:15 282 -1
2003-07-03T16:10 283 4

but with a million entries this could be quite slow. I would probably use MySQL or another database instead.
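If you only want the N biggest changes in either direction rather than the whole sorted list, the same idea can rank lines by absolute delta. A sketch, assuming the space-separated `timestamp count` lines from the question; `top_changes` is a hypothetical helper, and it still sorts every delta, so it does not address the speed concern above.

```shell
# Hypothetical helper: print the N log lines with the largest absolute
# change, as "timestamp count delta". N defaults to 10.
top_changes() {
    n=${1:-10}
    # Any further arguments are input files; with none, awk reads stdin.
    [ "$#" -gt 0 ] && shift
    awk '
        NR > 1 {
            d = $2 - last
            a = (d < 0) ? -d : d
            # Prefix the absolute change and a tab so sort can rank by it.
            print a "\t" $0 " " d
        }
        { last = $2 }
    ' "$@" | sort -k1,1nr | head -n "$n" | cut -f 2-
}

# Usage sketch: top_changes 20 file
```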

jimmij
  • +1. You could also just calculate the greatest in the awk directly: `awk -v last=0 '{k=($2-last);if(NR==2){max=k}if(k > max){max=k} last=$2}END{print max}' file` – terdon Nov 04 '14 at 17:29
  • @terdon True, but I guess the OP wants not only the number but also the whole line it corresponds to. One thing is for sure: I unnecessarily used an array instead of a plain variable as in your version. – jimmij Nov 04 '14 at 17:41
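Combining both comments, a single awk pass can keep the whole line for the largest increase and the largest decrease without sorting anything. A sketch, assuming the `timestamp count` format from the question; `extreme_changes` is a hypothetical name. If the count never drops, the "decrease" line is simply the smallest change.

```shell
# Hypothetical helper: report the line where the largest increase ended
# and the line where the largest decrease ended, with the delta appended.
extreme_changes() {
    awk '
        NR > 1 {
            d = $2 - last
            # Line 2 holds the first real delta, so it seeds both extremes.
            if (NR == 2 || d > up)   { up = d;   upline = $0 }
            if (NR == 2 || d < down) { down = d; downline = $0 }
        }
        { last = $2 }
        END {
            if (NR > 1) {
                print "increase: " upline " " up
                print "decrease: " downline " " down
            }
        }
    ' "$@"
}

# Usage sketch: extreme_changes file
```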

This shows the two consecutive lines with the biggest difference (in either direction) between them:

awk '{c=$2-a[2];
      if(c<0)c=-c;
      if(NR>1&&c>b){b=c;d=a[1]" "a[2]"\n"$0};
      split($0,a," ")}
  END{print "Difference is",b,"between:\n"d}' file
Costas