9

I have two files having a primary key value as a first field, and a corresponding value(s) as the remaining fields, and some primary key values are missing in one of them but are present in another, and vice versa:

$ cat jointest1.txt jointest2.txt  
a 1  
b 2  
d 4  
e 5    

a 10  
b 11  
c 12  
d 13  

I'd expect an output that merges those files according to a primary key, either substituting the missing values or not, like:

$ joinmerge  jointest1.txt jointest2.txt   
a 1 10  
b 2 11  
c - 12  
d 4 13  
e 5 -

Ability to replace missing values with dashes or something is optional.

I tried join, but it says my files are not properly sorted:

$ join jointest1.txt jointest2.txt   
a 1 10  
b 2 11  
join: file 2 is not in sorted order  
d 4 13  

What command should I use instead?

don_crissti
  • 79,330
  • 30
  • 216
  • 245
mbaitoff
  • 4,911
  • 8
  • 28
  • 32

3 Answers3

12

Try the following:

> join -e- -a1 -a2 jointest1 -o 0 1.1 1.2 2.1 2.2 jointest2
a a 1 a 10
b b 2 b 11
c - - c 12
d d 4 d 13
e e 5 - -

or

> join -e- -a1 -a2 jointest1 -o 0 1.2 2.2 jointest2
a 1 10
b 2 11
c - 12
d 4 13
e 5 -

I'm not sure if/how it is possible to achieve the same without the -o option. The -o option says: first print the join field, then field no. 2 from file 1, then field 2 from file 2. Kind of sad you have to know the format of the files to get the empty fields to work.

ghostwheel
  • 121
  • 2
8

What implementation of join are you using? With join (GNU coreutils) 5.97, I can use

[0 1021] ~/temp/jointest % join -a1 -a2 jointest1.txt jointest2.txt
a 1 10
b 2 11
c 12
d 4 13
e 5 

and the "plain" join works, too (but omits c and e). There is an -e option which supposedly lets you choose the marker for empty fields, but it appears to be broken in my version and only fills case e, not case c.

Ulrich Schwarz
  • 15,669
  • 4
  • 47
  • 58
  • My bad, just missed the manual for option `-a`. Option `-e` is broken even worse for me, neither filling nor "e" nor "c" records. My version is `join (GNU coreutils) 7.4`. – mbaitoff Mar 21 '12 at 09:31
  • @mbaitoff: I can imagine the commit log: "Fixed _inconsistent behaviour_ for -e option"... – Ulrich Schwarz Mar 21 '12 at 10:09
  • By the way, where to report that "bug" if it is a bug indeed? – mbaitoff Mar 21 '12 at 10:30
  • @mbaitoff: the manpage says: Report bugs to ; but don't expect a rapid response, I guess. – Ulrich Schwarz Mar 21 '12 at 12:26
  • `-e` is not broken, it's just not documented well that you need to use `-o auto` for `-e` to work. See https://unix.stackexchange.com/questions/34655/join-filling-in-missing-key-values/37057#37057 – wisbucky Jul 10 '20 at 07:03
1

I wrote a perl tool for exactly that key,value problem:

Pairing up the right rows: any number of files. It's also available via GitHub.

To execute it, you type:

merge -k -e "-" jointest1.txt jointest2.txt
slm
  • 363,520
  • 117
  • 767
  • 871
guest
  • 11
  • 1