Getting sum of values in a field based on variables in other field

Question

I want to compute the sum of the values in one field for each distinct value in another field. For example, for the following input I want the sum of the values in the 3rd column for each value in the 1st column:

a x 3
b y 4
a y 2
b x 5

The output should be:

a 5
b 9

My data is in TSV format. I want something like this:

awk -F'\t' 'BEGIN{SUM=0}{ SUM+=$3 }END{print SUM}'

but for every value of column 1. I found a related question here, but as I am very new to awk scripting, I am not able to modify the given awk script for my purpose.

I don't have datamash installed, so I need a solution using awk and a for loop.

Thank you

score 2 · Accepted Answer · answered Feb 19 '20 at 16:27


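Make an array indexed on $1 and add $3 to the matching element for every line. In the END block, loop over the array's keys and print each key with its sum.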

awk -F'\t' '{ SUM[$1] += $3 } END { for (j in SUM) print j, SUM[j] }'
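As a quick check, here is a minimal sketch that runs the same technique on the sample input from the question. The names `sum`, `key`, and `result` are illustrative choices, not part of the original command, and the input is generated with `printf` on the assumption that the real file uses single tabs between fields. The order in which `for (key in sum)` visits keys is unspecified in awk, so the output is piped through `sort` to make it predictable.

```shell
# Feed the question's sample rows (tab-separated) to awk.
# sum[$1] accumulates column 3 per distinct value of column 1.
result=$(printf 'a\tx\t3\nb\ty\t4\na\ty\t2\nb\tx\t5\n' |
  awk -F'\t' '{ sum[$1] += $3 } END { for (key in sum) print key, sum[key] }' |
  sort)

# Print one "key total" pair per line.
printf '%s\n' "$result"
```

For the four sample rows this prints `a 5` and `b 9`. To read from a file instead, drop the `printf` and pass the file name as the last argument to awk.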

Paul_Pedant


If sequence matters, add ` | sort` to the awk command. – Paul_Pedant Feb 19 '20 at 16:30
I know the OP did, but don't use all-uppercase names for user-defined variables: it risks clashing with builtin variable names, and lowercase makes it more obvious that you're using user-defined rather than builtin variables. – Ed Morton Feb 19 '20 at 19:04
@edmorton Agreed, not good practice, but I decided to show the parallel with the one-variable solution. – Paul_Pedant Feb 19 '20 at 22:43
