1

I have a tab-separated test data like this:

    A     1     2
    B     3     4
    B     5     6
    C     7     8

I want to concatenate (aggregate) both column#2 and #3 based on duplicate values in column 1, like this:

    A     1     2
    B     3;5   4;6
    C     7     8

Right now the working code I developed based on solutions here (Merging multiple lines based on column 1) is not merging values of multiple columns separately:

   <test.txt awk -F"\t" '{OFS="\t"} {a[$1]=($1 in a)?a[$1]";"$2FS$3:$1FS$2FS$3} END{for(i in a){print a[i]}}'
    A     1     2
    B     3     4;5 6
    C     7     8

Would really appreciate if someone can help to point out the error in my code. I tried several variations but could not format it properly.

psaima
  • 13
  • 3

2 Answers2

4

Use separate variables to contain each column's concatenated values.

awk -F'\t' -v OFS='\t' '{
    if ($1 in a) {
        a[$1] = a[$1]";"$2;
        b[$1] = b[$1]";"$3;
    } else {
        a[$1] = $2;
        b[$1] = $3;
    }
}
END { for (i in a) print i, a[i], b[i] }' < test.txt
Barmar
  • 9,648
  • 1
  • 19
  • 28
0

Bit late, but here's an alternative using a synthesised sub array:

awk -v OFS='\t' '{
  if ($1 in arr) {
    split(arr[$1], subArr);
    arr[$1] = subArr[1] ";" $2 OFS subArr[2] ";" $3;
  }
  else {
    arr[$1] = $2 OFS $3
  } }
  END {
    for (i in arr) print i,arr[i];
  }'
Graeme
  • 33,607
  • 8
  • 85
  • 110