I'm trying to match and merge two sets of sorted data, one set per file. Each file contains two columns: the key field and the associated value. The resulting output should contain three columns: the key field, the value (if any) from the first file, and the value (if any) from the second file. I need to include lines of data that are not matched.
First file "john"
apple,green
cherry,red
orange,orange
Second file "jane"
apple,red
banana,yellow
cherry,yellow
kiwi,green
Desired result
apple,green,red
banana,,yellow
cherry,red,yellow
kiwi,,green
orange,orange,
I thought initially that this was a trivial job for join
LC_ALL=C join -j1 -a1 -a2 -t',' john jane
But the result of the -a1 -a2 puts the unmatched value always in the second column:
apple,green,red
banana,yellow
cherry,red,yellow
kiwi,green
orange,orange
I need to be able to see from which source file the unmatched value originates, ideally by having those values in the appropriate second or third column of the result file, but I cannot work out a simple way of achieving this without descending into awk ... getline() type constructs.
Any suggestions, please?