
I have 4 files which look like this:

       file a
       >TCONS_00000867
       >TCONS_00001442
       >TCONS_00001447
       >TCONS_00001528
       >TCONS_00001529
       >TCONS_00001668
       >TCONS_00001921

       file b
       >TCONS_00001528
       >TCONS_00001529
       >TCONS_00001668
       >TCONS_00001921
       >TCONS_00001922
       >TCONS_00001924

       file c
       >TCONS_00001529
       >TCONS_00001668
       >TCONS_00001921
       >TCONS_00001922
       >TCONS_00001924
       >TCONS_00001956
       >TCONS_00002048

       file d
       >TCONS_00001922
       >TCONS_00001924
       >TCONS_00001956
       >TCONS_00002048

All files contain more than 2,000 lines and are sorted by the first column.

I want to find the lines common to all four files. I tried awk, grep, and comm, but couldn't get them to work.

thanasisp
user106326

2 Answers


Since the files are already sorted:

comm -12 a b |
  comm -12 - c |
  comm -12 - d

comm finds common lines between files. By default comm prints 3 TAB-separated columns:

  1. The lines unique to the first file,
  2. The lines unique to the second file,
  3. The lines common to both files.

With the -1, -2, and -3 options, we suppress the corresponding columns. So comm -12 a b reports only the lines common to a and b. A - can be used in place of a file name to mean standard input, which is what lets the invocations be chained.
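As a sanity check, here is the pipeline run on abbreviated versions of the four files (the file names match the question; the contents are shortened for the demo):

```shell
# Build four small sorted sample files (abbreviated from the question's data).
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' '>TCONS_00001528' '>TCONS_00001529' '>TCONS_00001668' '>TCONS_00001921' > a
printf '%s\n' '>TCONS_00001528' '>TCONS_00001668' '>TCONS_00001921' '>TCONS_00001922' > b
printf '%s\n' '>TCONS_00001668' '>TCONS_00001921' '>TCONS_00001922' > c
printf '%s\n' '>TCONS_00001668' '>TCONS_00001921' '>TCONS_00001956' > d

# Chain comm: each stage keeps only the lines common to both of its inputs.
comm -12 a b | comm -12 - c | comm -12 - d
# prints:
# >TCONS_00001668
# >TCONS_00001921
```

Because comm requires sorted input, which the question already guarantees, this scales to any number of files by adding one more `comm -12 -` stage per extra file.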

Stéphane Chazelas
cat a b c d | sort | uniq -c | sed -n 's/^ *4 \(.*\)/\1/p'
Stephen Kitt
Piotr
  • Actually, save the `sed`, this is quite good for finding duplicate lines across many files: `cat` to `sort` to `uniq -c`. Somehow I didn't quite think of this, good answer! – smaslennikov May 21 '19 at 21:35
  • 1
    You can also use uniq command to only print duplicated lines: `uniq -cd` – mems Sep 30 '19 at 14:44
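One caveat worth noting: the count-based approach (including the `uniq -cd` variant) assumes each line appears at most once per file; a duplicate within a single file would inflate the count and produce a false positive. A minimal sketch with hypothetical IDs showing how the count of 4 selects lines present in all four files:

```shell
# Hypothetical sample files: >X is in all four, >Y in three, >Z in one.
dir=$(mktemp -d)
cd "$dir"
printf '%s\n' '>X' '>Y' > a
printf '%s\n' '>X' '>Y' > b
printf '%s\n' '>X' '>Z' > c
printf '%s\n' '>X' '>Y' > d

# Concatenate, sort, count repeats, and keep only lines counted exactly 4 times.
cat a b c d | sort | uniq -c | sed -n 's/^ *4 \(.*\)/\1/p'
# prints:
# >X
```

Unlike the comm chain, this does not require the inputs to be pre-sorted, since the pipeline sorts the concatenation itself.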