
I have two files.

File 1:

A0001  C001
B0003  C896
A0024  C234
.
B1542  C231
.
up to 28,412 such lines

File 2:

A0001
A0024
B1542
.
.
and 12,000 such lines.

I want to compare File 2 against File 1 and store the matching lines from File 1. I tried Perl and Bash, but neither seems to work.

The latest thing I tried was something like this:

for (@q)    # after storing the contents of the second file in an array
{
        # note: each element of @q still ends in "\n", which breaks the grep
        $line = `grep $_ File1`;    # shelling out to grep for each ID
        print $line;
}

but it fails.

Braiam
user3543389

3 Answers


This should do the job:

grep -Ff File2 File1

The `-f File2` option reads the patterns from File2, and `-F` treats those patterns as fixed strings (i.e., no regexes are used).
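One caveat worth noting: `grep -F` matches a pattern anywhere in the line, not just in the first column, so partial or second-field matches are possible in principle. A small sketch on the sample data from the question, adding `-w` so only whole-word matches count (the file names and data lines follow the question's example):

```shell
# Recreate the sample data from the question.
printf 'A0001  C001\nB0003  C896\nA0024  C234\nB1542  C231\n' > File1
printf 'A0001\nA0024\nB1542\n' > File2

# -f File2: read patterns from File2; -F: treat them as fixed strings;
# -w: require whole-word matches, so e.g. A002 cannot match inside A0024.
grep -wFf File2 File1
```

This prints the three File1 lines whose IDs appear in File2.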

Graeme

You can use awk:

$ awk 'FNR==NR{a[$1];next}($1 in a){print}' file2 file1
A0001   C001
A0024   C234
B1542   C231
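For anyone unsure how the one-liner works, here is a commented sketch on the question's sample data (file names taken from the question):

```shell
# Recreate the sample data from the question.
printf 'A0001  C001\nB0003  C896\nA0024  C234\nB1542  C231\n' > File1
printf 'A0001\nA0024\nB1542\n' > File2

# FNR==NR is true only while awk reads the first file argument (File2):
# each ID becomes a key of array a, and `next` skips the rest of the program.
# While reading File1, any line whose first field is a key of a is printed.
awk 'FNR==NR{a[$1];next} ($1 in a){print}' File2 File1
```

The key idea is that `FNR` (per-file line number) equals `NR` (global line number) only for the first file listed, which is what lets one program treat the two files differently.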
cuonglm

It looks to me like both files are already sorted on the first field. If so:

join file1 file2

is the fastest option, and its advantage grows with the size of your files.
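If the files turn out not to be sorted, they can be sorted first; a sketch on the question's sample data (which, as given, is not fully sorted; note that `join` reprints matches with a single-space separator, and in bash the two `sort` steps could also be written inline as `join <(sort File1) <(sort File2)`):

```shell
# Recreate the sample data from the question.
printf 'A0001  C001\nB0003  C896\nA0024  C234\nB1542  C231\n' > File1
printf 'A0001\nA0024\nB1542\n' > File2

# join requires both inputs to be sorted on the join field (field 1 by
# default), so sort each file first, then join the sorted copies.
sort File1 > File1.sorted
sort File2 > File2.sorted
join File1.sorted File2.sorted
```

Unlike the `grep` approach, `join` matches only on the join field, so an ID can never accidentally match inside the second column.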

jthill
  • Tried this; each file must be sorted for this to work. With the `grep` solution that is not needed. – Matthew Turner Sep 13 '18 at 22:20
  • @MatthewTurner yes, this is true; keeping large files sorted saves much time (once they're big enough that they no longer fit in RAM), and keeping small files sorted is too cheap to meter anyway. – jthill Sep 24 '18 at 21:07