
I want to compute the sum of the values in one field for every distinct value in another field. For example, for the following input I want the sum of the values in the 3rd column for every value in the 1st column:

a x 3
b y 4
a y 2
b x 5

The output should be:

a 5
b 9

My data is in TSV format. I want something like this:

awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3 }END{print SUM}' 

but for every value of column 1. I found a related question here, but as I am very new to awk scripting I am not able to modify the given awk script for my purpose.

I don't have datamash installed, so I need a solution with awk and a for loop.

Thank you

schrodingerscatcuriosity
Arsala

1 Answer


Use an associative array indexed by $1, so that each distinct value in column 1 accumulates its own sum of column 3:

awk -F'\t' '{ SUM[$1] += $3 } END { for (j in SUM) print j, SUM[j] }'
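
To make the one-liner easier to try, here is a minimal sketch that runs the same idea on the sample data from the question. The function name sum_by_key and the lowercase variable names sum and key are my own choices for illustration, not part of the original command. The output is piped through sort because the order in which awk's for-in loop visits array indices is unspecified.

```shell
# sum_by_key is a hypothetical helper name used only for this example.
# It sums column 3 for each distinct value in column 1 of tab-separated
# input, reading from the named files or from standard input.
sum_by_key() {
    awk -F'\t' '
        # One array slot per distinct column-1 value; unset slots start at 0.
        { sum[$1] += $3 }
        # After all input is read, print each key with its total.
        END { for (key in sum) print key, sum[key] }
    ' "$@" | sort
}

# Sample rows from the question, written with real tab characters.
printf 'a\tx\t3\nb\ty\t4\na\ty\t2\nb\tx\t5\n' | sum_by_key
```

With the sample rows this should print "a 5" and "b 9" on separate lines. For a real file, pass its name as an argument, for example sum_by_key data.tsv, where data.tsv is a placeholder name.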
Paul_Pedant
  • If sequence matters, add ` | sort` to the awk command. – Paul_Pedant Feb 19 '20 at 16:30
  • I know the OP did, but don't use all-uppercase names for user-defined variables: that avoids clashing with builtin variable names and makes it more obvious that you're using user-defined rather than builtin variables. – Ed Morton Feb 19 '20 at 19:04
  • @edmorton. Agreed not good practice, but decided to show the parallel with the one-variable solution. – Paul_Pedant Feb 19 '20 at 22:43