Concatenate matching ids from two files, with other columns from both files

Question

I want to match ids from file A with file B, to save it in third file with columns that belongs to both files. I have tried almost all awks that I found, but somehow doesnt work properly. I would appreciate you help!

fileA:

id;name
1;"sam"
4;"jon"

fileB:

id;surname
5;"smith"
1;"khon"

file3:

id;name;surname
1;"sam";"khon"

What have you tried? Can you sort your files and use `join` instead of `awk`? — icarus, May 01 '20 at 15:20
Does this answer your question? [Join two files with matching columns](https://unix.stackexchange.com/questions/43417/join-two-files-with-matching-columns) — icarus, May 01 '20 at 15:24

score 0 · Accepted Answer · answered May 01 '20 at 16:48

Normally, this would be a job for join. However, join requires sorted input and sort will put numbers before letters in all locales I am aware of (please correct me if that's not true). So join can only get you part of the way there:

$ join -t';' <(sort fileA) <(sort fileB)
1;"sam";"khon"
id;name;surname

You could get around this by joining the the first line of each file first and then the rest:

$ join -t';' <(head -n1 fileA) <(head -n1 fileB)
id;name;surname
$ join -t';' <(tail -n+2 fileA | sort) <(tail -n+2 fileB | sort)
1;"sam";"khon"

Then, grouping these in a subshell and redirecting the output gives you your desired output:

 $ ( join -t';' <(head -n1 fileA) <(head -n1 fileB); join -t';' <(tail -n+2 fileA | sort) <(tail -n+2 fileB | sort))
id;name;surname
1;"sam";"khon"

So you could do this:

( 
    join -t';' <(head -n1 fileA) <(head -n1 fileB); 
    join -t';' <(tail -n+2 fileA | sort) <(tail -n+2 fileB | sort)
) > file3

Alternatively, you can indeed do it in awk:

$ awk -F';' '(NR==FNR){data[$1]=$1";"$2; next} 
             ($1 in data){print data[$1]";"$2}' fileA fileB 
id;name;surname
1;"sam";"khon"

Thank you so, so much!!!! This with awk works great. with join i have to sort the other columns, will take a lot on that later. Thanks again! — Rosa Andrew, May 01 '20 at 17:21

score 0 · Answer 2 · answered May 01 '20 at 17:11

0

Using Miller (https://github.com/johnkerl/miller) and running

mlr --csv --fs ";" join -j id -f fileA.csv fileB.csv >file3.csv

you have

+----+------+---------+
| id | name | surname |
+----+------+---------+
| 1  | sam  | khon    |
+----+------+---------+

Some notes:

--fs to set field separator;
-j to set join field.

answered May 01 '20 at 17:11

aborruso

2,618
10
26

Does it actually add all those extra `-`, `|` and `+`? Is there an option to avoid that? – terdon May 01 '20 at 17:23
This also works! Thank you very much! – Rosa Andrew May 01 '20 at 17:24

Concatenate matching ids from two files, with other columns from both files

2 Answers2