0

I have tried various solutions given in this site for this kind of problem and none of these are helping me out.

I have two files (space-separated, contains columns). Both the files contain the same number of records as given in my examples below.

Let me call the columns as A, B, C ... (These are the column headers.)

File 1:

A  B  C  D
-----------
a1 b1 c1 d1
a2 b2 c2 d2
a3 b3 c3 d3
a4 b4 c4 d4
a5 b5 c5 d5

File 2:

E B  A  F
---------
1 b5 a5 f
2 b2 a2 f
1 a1 b1 f
1 a3 b3 f
2 a4 b4 f 

I want to produce the merged file:

A  B  C  D  E
-------------
a1 b1 c1 d1 1
a2 b2 c2 d2 2
a3 b3 c3 d3 1
a4 b4 c4 d4 2
a5 b5 c5 d5 1
Coder
  • 169
  • 1
  • 8
  • Just to clarify, these are the _headers_ of columns that you are showing. There is actual data beneath each column? – Kusalananda Nov 01 '18 at 19:15
  • @Kusalananda I have clarified it in my edit. These are column headers. I am not providing the data. I can if you want. – Coder Nov 01 '18 at 19:15
  • @JeffSchaller Yes, Jeff. I did it now. Sorry for the inconveniences. – Coder Nov 01 '18 at 19:30
  • does the data in file2's matching columns always exist in file1, or would we have to fill in something like "n/a" for missing pairs/triplets? – Jeff Schaller Nov 01 '18 at 19:35
  • @JeffSchaller Yes. both the files contain same records. So just need to have an extra 'E' column from File2 to file1. – Coder Nov 01 '18 at 19:37
  • OK, this may overlap with (or duplicate) Jeff’s comment, but: are you guaranteeing that every (`A`,`B`) combination that appears in `file1` (1) appears *exactly once* in ``file1``, ***and*** (2) appears exactly once in ``file2``?  Is the same true for the `A` and `B` values individually, or is it possible that line 42 might contain `a4` and `b2`? – Scott - Слава Україні Nov 01 '18 at 19:40
  • @Scott. (1) Yes. (2) Yes. Exactly once. No, line 42, will not contain a4, b2. – Coder Nov 01 '18 at 19:42
  • @JeffSchaller Yes, the column F is not required. We may include if we want, but, not necessary. – Coder Nov 01 '18 at 19:49
  • Do A and B column values may need to be permuted (per line) to get at the right matches? – RudiC Nov 01 '18 at 20:23

2 Answers2

2

Does this come close to what you need (some formatting still to be done...)?

awk 'NR == FNR {T[$2,$3] = T[$3,$2] = $1; next} {print $0, T[$1,$2]}' file2 file1
A  B  C  D E
----------- 
a1 b1 c1 d1 1
a2 b2 c2 d2 2
a3 b3 c3 d3 1
a4 b4 c4 d4 2
a5 b5 c5 d5 1
RudiC
  • 8,889
  • 2
  • 10
  • 22
0

a tedious join solution: first join on file1 col 1 and file2 col2, then join on file1 col 1 and file2 col 3:

{ 
    join -11 -22 -o 0,1.2,1.3,1.4,2.1 <(sed '1,2d' file1 | sort -k1,1) <(sed '1,2d' file2 | sort -k2,2) 
    join -11 -23 -o 0,1.2,1.3,1.4,2.1 <(sed '1,2d' file1 | sort -k1,1) <(sed '1,2d' file2 | sort -k3,3) 
} | sort -k 1,1 

outputs

a1 b1 c1 d1 1
a2 b2 c2 d2 2
a3 b3 c3 d3 1
a4 b4 c4 d4 2
a5 b5 c5 d5 1
glenn jackman
  • 84,176
  • 15
  • 116
  • 168