1

I want to match ids from file A with file B, to save it in third file with columns that belongs to both files. I have tried almost all awks that I found, but somehow doesnt work properly. I would appreciate you help!

fileA:

id;name
1;"sam"
4;"jon"

fileB:

id;surname
5;"smith"
1;"khon"

file3:

id;name;surname
1;"sam";"khon"
aborruso
  • 2,618
  • 10
  • 26
  • What have you tried? Can you sort your files and use `join` instead of `awk`? – icarus May 01 '20 at 15:20
  • https://unix.stackexchange.com/a/43420/194382 – icarus May 01 '20 at 15:23
  • 2
    Does this answer your question? [Join two files with matching columns](https://unix.stackexchange.com/questions/43417/join-two-files-with-matching-columns) – icarus May 01 '20 at 15:24

2 Answers2

0

Normally, this would be a job for join. However, join requires sorted input and sort will put numbers before letters in all locales I am aware of (please correct me if that's not true). So join can only get you part of the way there:

$ join -t';' <(sort fileA) <(sort fileB)
1;"sam";"khon"
id;name;surname

You could get around this by joining the the first line of each file first and then the rest:

$ join -t';' <(head -n1 fileA) <(head -n1 fileB)
id;name;surname
$ join -t';' <(tail -n+2 fileA | sort) <(tail -n+2 fileB | sort)
1;"sam";"khon"

Then, grouping these in a subshell and redirecting the output gives you your desired output:

 $ ( join -t';' <(head -n1 fileA) <(head -n1 fileB); join -t';' <(tail -n+2 fileA | sort) <(tail -n+2 fileB | sort))
id;name;surname
1;"sam";"khon"

So you could do this:

( 
    join -t';' <(head -n1 fileA) <(head -n1 fileB); 
    join -t';' <(tail -n+2 fileA | sort) <(tail -n+2 fileB | sort)
) > file3

Alternatively, you can indeed do it in awk:

$ awk -F';' '(NR==FNR){data[$1]=$1";"$2; next} 
             ($1 in data){print data[$1]";"$2}' fileA fileB 
id;name;surname
1;"sam";"khon"
terdon
  • 234,489
  • 66
  • 447
  • 667
  • Thank you so, so much!!!! This with awk works great. with join i have to sort the other columns, will take a lot on that later. Thanks again! – Rosa Andrew May 01 '20 at 17:21
0

Using Miller (https://github.com/johnkerl/miller) and running

mlr --csv --fs ";" join -j id -f fileA.csv fileB.csv >file3.csv

you have

+----+------+---------+
| id | name | surname |
+----+------+---------+
| 1  | sam  | khon    |
+----+------+---------+

Some notes:

  • --fs to set field separator;
  • -j to set join field.
aborruso
  • 2,618
  • 10
  • 26