Concatenate multiple fields separately based on one (key) column

Question

I have a tab-separated test data like this:

    A     1     2
    B     3     4
    B     5     6
    C     7     8

I want to concatenate (aggregate) both column#2 and #3 based on duplicate values in column 1, like this:

    A     1     2
    B     3;5   4;6
    C     7     8

Right now the working code I developed based on solutions here (Merging multiple lines based on column 1) is not merging values of multiple columns separately:

   <test.txt awk -F"\t" '{OFS="\t"} {a[$1]=($1 in a)?a[$1]";"$2FS$3:$1FS$2FS$3} END{for(i in a){print a[i]}}'

    A     1     2
    B     3     4;5 6
    C     7     8

Would really appreciate if someone can help to point out the error in my code. I tried several variations but could not format it properly.

score 4 · Accepted Answer · answered Jun 19 '15 at 23:12

4

Use separate variables to contain each column's concatenated values.

awk -F'\t' -v OFS='\t' '{
    if ($1 in a) {
        a[$1] = a[$1]";"$2;
        b[$1] = b[$1]";"$3;
    } else {
        a[$1] = $2;
        b[$1] = $3;
    }
}
END { for (i in a) print i, a[i], b[i] }' < test.txt

answered Jun 19 '15 at 23:12

Barmar

9,648
1
19
28

This works great with my exiftool script. – Sridhar Sarnobat Jun 18 '20 at 21:18

score 0 · Answer 2 · answered Jun 19 '15 at 23:25

Bit late, but here's an alternative using a synthesised sub array:

awk -v OFS='\t' '{
  if ($1 in arr) {
    split(arr[$1], subArr);
    arr[$1] = subArr[1] ";" $2 OFS subArr[2] ";" $3;
  }
  else {
    arr[$1] = $2 OFS $3
  } }
  END {
    for (i in arr) print i,arr[i];
  }'

Concatenate multiple fields separately based on one (key) column

2 Answers2