Joining two space-separated files based on matching values in two different columns

Question

I have tried various solutions given in this site for this kind of problem and none of these are helping me out.

I have two files (space-separated, contains columns). Both the files contain the same number of records as given in my examples below.

Let me call the columns as A, B, C ... (These are the column headers.)

File 1:

A  B  C  D
-----------
a1 b1 c1 d1
a2 b2 c2 d2
a3 b3 c3 d3
a4 b4 c4 d4
a5 b5 c5 d5

File 2:

E B  A  F
---------
1 b5 a5 f
2 b2 a2 f
1 a1 b1 f
1 a3 b3 f
2 a4 b4 f

I want to produce the merged file:

A  B  C  D  E
-------------
a1 b1 c1 d1 1
a2 b2 c2 d2 2
a3 b3 c3 d3 1
a4 b4 c4 d4 2
a5 b5 c5 d5 1

Just to clarify, these are the _headers_ of columns that you are showing. There is actual data beneath each column? — Kusalananda, Nov 01 '18 at 19:15
@Kusalananda I have clarified it in my edit. These are column headers. I am not providing the data. I can if you want. — Coder, Nov 01 '18 at 19:15
@JeffSchaller Yes, Jeff. I did it now. Sorry for the inconveniences. — Coder, Nov 01 '18 at 19:30
does the data in file2's matching columns always exist in file1, or would we have to fill in something like "n/a" for missing pairs/triplets? — Jeff Schaller, Nov 01 '18 at 19:35
@JeffSchaller Yes. both the files contain same records. So just need to have an extra 'E' column from File2 to file1. — Coder, Nov 01 '18 at 19:37
OK, this may overlap with (or duplicate) Jeff’s comment, but: are you guaranteeing that every (`A`,`B`) combination that appears in `file1` (1) appears *exactly once* in ``file1``, ***and*** (2) appears exactly once in ``file2``? Is the same true for the `A` and `B` values individually, or is it possible that line 42 might contain `a4` and `b2`? — Scott - Слава Україні, Nov 01 '18 at 19:40
@Scott. (1) Yes. (2) Yes. Exactly once. No, line 42, will not contain a4, b2. — Coder, Nov 01 '18 at 19:42
@JeffSchaller Yes, the column F is not required. We may include if we want, but, not necessary. — Coder, Nov 01 '18 at 19:49
Do A and B column values may need to be permuted (per line) to get at the right matches? — RudiC, Nov 01 '18 at 20:23

score 2 · Accepted Answer · answered Nov 01 '18 at 20:25

Does this come close to what you need (some formatting still to be done...)?

awk 'NR == FNR {T[$2,$3] = T[$3,$2] = $1; next} {print $0, T[$1,$2]}' file2 file1
A  B  C  D E
----------- 
a1 b1 c1 d1 1
a2 b2 c2 d2 2
a3 b3 c3 d3 1
a4 b4 c4 d4 2
a5 b5 c5 d5 1

score 0 · Answer 2 · answered Nov 02 '18 at 01:44

a tedious join solution: first join on file1 col 1 and file2 col2, then join on file1 col 1 and file2 col 3:

{ 
    join -11 -22 -o 0,1.2,1.3,1.4,2.1 <(sed '1,2d' file1 | sort -k1,1) <(sed '1,2d' file2 | sort -k2,2) 
    join -11 -23 -o 0,1.2,1.3,1.4,2.1 <(sed '1,2d' file1 | sort -k1,1) <(sed '1,2d' file2 | sort -k3,3) 
} | sort -k 1,1

outputs

a1 b1 c1 d1 1
a2 b2 c2 d2 2
a3 b3 c3 d3 1
a4 b4 c4 d4 2
a5 b5 c5 d5 1

Joining two space-separated files based on matching values in two different columns

2 Answers2