join multiple lines based on column1

Question

I have a file like the one below:

abc, 12345
def, text and nos    
ghi, something else   
jkl, words and numbers

abc, 56345   
def, text and nos   
ghi, something else 
jkl, words and numbers

abc, 15475  
def, text and nos 
ghi, something else
jkl, words and numbers

abc, 123345
def, text and nos
ghi, something else  
jkl, words and numbers

I want to convert (join) it as:

abc, 12345, 56345, 15475, 123345
def, text and nos, text and nos,text and nos,text and nos
ghi, something else, something else, something else, something else   
jkl, words and numbers, words and numbers, words and numbers, words and numbers

Do you actually have the extra blank lines in your input file? If not, please [edit] and remove them, you should show the file _exactly_ as it is. — terdon, Apr 11 '14 at 14:23

score 11 · Answer 1 · edited Jun 20 '17 at 08:37

If you don't mind the order of output:

$ awk -F',' 'NF>1{a[$1] = a[$1]","$2};END{for(i in a)print i""a[i]}' file 
jkl, words and numbers, words and numbers, words and numbers, words and numbers
abc, 12345, 56345, 15475, 123345
ghi, something else, something else, something else, something else
def, text and nos, text and nos, text and nos, text and nos

Explanation

NF>1 means we only process lines that are not blank (a blank line has no second field).
We store each line in the associative array a, using the first field as the key and the second field (the rest of the line) as the value. If the key already has a value, we append the new value to it, separated by a comma.
In the END block, we loop through the associative array a and print each key followed by its value.
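The steps above can be sketched as a commented, multi-line version of the same one-liner. This is only an illustration: the sample data and the shell variable names `input` and `result` are made up here, and the output is piped through sort only because awk does not guarantee the order of `for (i in a)`.

```shell
# Sample input in the same shape as the question (assumed data, shortened).
input='abc, 12345
def, text and nos

abc, 56345
def, text and nos'

result=$(printf '%s\n' "$input" | awk -F',' '
    NF > 1 {                      # skip blank lines: they have no second field
        a[$1] = a[$1] "," $2      # append ",<field 2>" to the value stored under key $1
    }
    END {
        for (i in a)              # iteration order is unspecified in awk
            print i "" a[i]       # key followed directly by the joined values
    }' | sort)                    # sort only to make this demo deterministic

printf '%s\n' "$result"
# prints:
# abc, 12345, 56345
# def, text and nos, text and nos
```

The leading space of each value comes from the input itself, because the field separator is the bare comma.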

Or using perl, which prints the keys in sorted order (here that happens to match the order in the file):

$ perl -F',' -anle 'next if /^$/; $h{$F[0]} .= ",".$F[1];
    END{print $_,$h{$_},"\n" for sort keys %h}' file
abc, 12345, 56345, 15475, 123345

def, text and nos, text and nos, text and nos, text and nos

ghi, something else, something else, something else, something else

jkl, words and numbers, words and numbers, words and numbers, words and numbers

edited Jun 20 '17 at 08:37 by αғsнιη
answered Apr 11 '14 at 04:01 by cuonglm

your perl solution from my question http://unix.stackexchange.com/questions/124181/merge-2-rows-based-on-the-same-column-values should also work right? – Ramesh Apr 11 '14 at 04:08
No. The OP wants to concatenate strings based on column 1, whether or not the values are duplicated. Your question doesn't want duplicates. – cuonglm Apr 11 '14 at 04:16
Oh, OK. At first glance, it seemed almost the same as my question. :) – Ramesh Apr 11 '14 at 04:19
Neat, +1! That doesn't _keep_ the order though, it only recreates it in this particular example where the fields are in alphabetical order. – terdon Apr 11 '14 at 14:29
Just for laughs, I'd written almost exactly the same approach before reading your answer: `perl -F, -lane 'next unless /./;push @{$k{$F[0]}}, ",@F[1..$#F]"; END{print "$_@{$k{$_}}" foreach keys(%k)}' file` :) Great minds think alike! – terdon Apr 11 '14 at 14:43
@terdon: lol, at first I also thought of using push instead of string concatenation :). Yep, great minds think alike! – cuonglm Apr 11 '14 at 16:52
@terdon you don't need `..$#F` as there are no commas in the second column. It's not clear whether the OP would want to keep them if there were. You'll also need to add `\n` to your print string, except for the final line, to match the desired output. – Apr 12 '14 at 05:06
@Gnouc please explain your awk command a little bit. – Avinash Raj Apr 12 '14 at 05:07
@AvinashRaj: Updated my answer! – cuonglm Apr 12 '14 at 05:18
@cuonglm Great answer, but could you please explain the `""` in the `END` block? – Sparhawk Oct 25 '18 at 03:42
@Sparhawk it keeps the key and its value as is, like `k, v`; using `print i, a[i]` would give you `k , v` – cuonglm Oct 25 '18 at 14:40
@cuonglm Ah okay, I was comparing to `print i a[i]` instead, which has the same output as `print i""a[i]`. – Sparhawk Oct 26 '18 at 03:02
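
The difference discussed in the last few comments can be checked with a small sketch. The key `k` and value `, v` are placeholder strings, and the shell variable names are chosen only for this illustration.

```shell
# "print i "" a" and "print i a" both concatenate;
# "print i, a" inserts OFS between the two (a single space by default).
concat_empty=$(awk 'BEGIN { i = "k"; a = ", v"; print i "" a }')
concat_plain=$(awk 'BEGIN { i = "k"; a = ", v"; print i a }')
with_comma=$(awk 'BEGIN { i = "k"; a = ", v"; print i, a }')

printf '[%s]\n' "$concat_empty" "$concat_plain" "$with_comma"
# prints:
# [k, v]
# [k, v]
# [k , v]
```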

score 2 · Answer 2 · 2014-04-12T08:25:15.503

Oh, that's an easy one. Here's a simple version that keeps the order of the keys as they appear in the file:

$ awk -F, '
    /.+/{
        if (!($1 in Val)) { Key[++i] = $1; }
        Val[$1] = Val[$1] "," $2; 
    }
    END{
        for (j = 1; j <= i; j++) {
            printf("%s%s\n%s", Key[j], Val[Key[j]], (j == i) ? "" : "\n");
        }
    }' file.txt

Output should look like this:

abc, 12345, 56345, 15475, 123345

def, text and nos, text and nos, text and nos, text and nos

ghi, something else, something else, something else, something else

jkl, words and numbers, words and numbers, words and numbers, words and numbers

If you don't mind having an extra blank line at the end, just replace the printf line with printf("%s%s\n\n", Key[j], Val[Key[j]]);
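
To see why tracking the keys in a separate array matters, here is a small sketch with keys that are deliberately not in alphabetical order. The input is invented for the illustration, the counter is named `n` rather than `i`, and it prints one record per line without the blank separator lines.

```shell
# Assumed sample data: "zzz" appears before "aaa" in the file.
input='zzz, 1
aaa, x

zzz, 2
aaa, y'

ordered=$(printf '%s\n' "$input" | awk -F, '
    /./ {                                   # non-empty lines only
        if (!($1 in Val)) Key[++n] = $1     # remember each key the first time it is seen
        Val[$1] = Val[$1] "," $2            # append ",<field 2>" to the value for this key
    }
    END {
        for (j = 1; j <= n; j++)            # walk the keys in first-seen order
            print Key[j] Val[Key[j]]
    }')

printf '%s\n' "$ordered"
# prints:
# zzz, 1, 2
# aaa, x, y
```

A version that sorts the keys, or that relies on `for (k in Val)`, would not be guaranteed to print `zzz` first here.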
