4

I have two files with tab-separated values that look like this:

file1:

A    1
B    3
C    1
D    4

file2:

E    1
B    3
C    2
A    9

I would like to find rows between files 1 and 2 where the string in column 1 is the same, then get the corresponding values. The desired output is a single file that looks like this:

B    3    3
C    1    2
A    1    9

Can this be done with a Unix one-liner?

Jeff Schaller
  • 66,199
  • 35
  • 114
  • 250
turtle
  • 2,607
  • 5
  • 19
  • 16

3 Answers3

11

GNU coreutils includes the command join that does exactly what you want if line sorting in the result is irrelevant:

join <(sort file1) <(sort file2)

A 1 9
B 3 3
C 1 2

If you want the tabs back, do:

join <(sort file1) <(sort file2) | tr ' ' '\t'

A   1   9
B   3   3
C   1   2

Or use the t option to join.

(<() aka process substitution, requires ksh93 (where the feature originated in), bash or zsh)

jaume
  • 1,258
  • 13
  • 9
  • Now the output is sorted alphabetically, instead of according to file2, but it was not clear whether that would be a problem. – Bernhard Oct 25 '12 at 10:52
2
ire@localhost: sort -k1 file2 | join file1 -
A 1 9
B 3 3
C 1 2

If you want tabs in the output, then do

sort -k1 file2 | join -t "        " file1 -

where you can construct the tab inside the quotes by pressing CTRL-V <tab>.

ire_and_curses
  • 12,232
  • 3
  • 38
  • 33
1

If you don't want the output sorted but instead have the same order as file2, awk is a good tool to use:

awk '
    NR == FNR {val[$1]=$2; next} 
    $1 in val {print $1, val[$1], $2}
' file1 file2
glenn jackman
  • 84,176
  • 15
  • 116
  • 168