Remove adjacent duplicate lines while keeping the order

Question

I have a file with one column with names that repeat a number of times each. I want to condense each repeat into one, while keeping any other repeats of the same name that are not adjacent to other repeats of the same name.

E.g. I want to turn the left side to the right side:

Golgb1    Golgb1    
Golgb1    Akna
Golgb1    Spata20
Golgb1    Golgb1
Golgb1    Akna
Akna
Akna
Akna
Spata20
Spata20
Spata20
Golgb1
Golgb1
Golgb1
Akna
Akna
Akna

This is what I've been using: `perl -ne 'print if ++$k{$_}==1' file.txt > file2.txt`. However, this method keeps only the first occurrence of each name from the left column (i.e. Golgb1 and Akna are not repeated).

Is there a way to keep unique names for each block, while keeping names that repeat in multiple, non-adjacent blocks?

score 25 · Accepted Answer · DopeGhoti · 2018-04-23T15:57:37.433

uniq will do this for you:

$ uniq inputfile
Golgb1
Akna
Spata20
Golgb1
Akna

answered Apr 23 '18 at 15:40


wow that was embarrassingly easy! thanks! – Age87 Apr 23 '18 at 15:45
@Age87 Unix is great! This only works because you expect duplicates to be adjacent, already (or, don't wish to remove non-adjacent ones). Normally, the recommendation is to use `sort | uniq` – jpaugh Apr 23 '18 at 21:42
Or more succinctly, `sort -u` (: – DopeGhoti Apr 23 '18 at 21:53
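The difference between the two approaches mentioned in these comments can be seen on a shortened version of the question's input. This is only a sketch: the `sample` helper and the variable names are made up for illustration, and the input is a condensed stand-in for the real file.

```shell
#!/bin/sh
# Hypothetical helper: a condensed version of the question's input,
# with runs of repeated names and two names that recur in later runs.
sample() {
  printf '%s\n' Golgb1 Golgb1 Akna Akna Spata20 Golgb1 Golgb1 Akna
}

# uniq collapses each run of adjacent identical lines into one line.
# Order is preserved, and names recurring in a later run are kept.
adjacent=$(sample | uniq)

# sort -u removes every duplicate, adjacent or not, and reorders the
# lines, so the block structure of the input is lost.
global=$(sample | sort -u)

printf 'uniq:\n%s\n\nsort -u:\n%s\n' "$adjacent" "$global"
```

Here `uniq` keeps five lines (one per block), while `sort -u` keeps only the three distinct names, which is why `sort` would be the wrong tool for the question as asked.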

score 10 · Answer 2 · answered Apr 23 '18 at 15:39

Awk solution:

awk '$1 != name{ print }{ name = $1 }' file.txt

The output:

Golgb1
Akna
Spata20
Golgb1
Akna

RomanPerekhrest


score 6 · Answer 3 · answered Apr 23 '18 at 15:37

Try this: save the previous line and compare it against the current line.

$ perl -ne 'print if $p ne $_; $p=$_' ip.txt
Golgb1
Akna
Spata20
Golgb1
Akna

You've tagged uniq as well - did you try it?

$ uniq ip.txt
Golgb1
Akna
Spata20
Golgb1
Akna

Sundeep


score 1 · Answer 4 · answered Apr 26 '18 at 12:46

With sed it can be done as follows:

sed -e '$!N;/^\(.*\)\n\1$/!P;D' input_file

Here the pattern space holds two lines at any time. When the two lines differ, we print the first one; in either case we then delete it from the front, and the next cycle appends the following line to the pattern space. Rinse and repeat.
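The same sed command can be split into one `-e` expression per step, which may make the cycle easier to follow. This is a sketch only: the function name `dedupe_adjacent` is invented for illustration, and the behaviour is meant to be identical to the one-liner above.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed one-liner, one step per -e:
#   $!N               unless on the last line, append the next input line,
#                     so the pattern space holds "line1\nline2"
#   /^\(.*\)\n\1$/!P  if the two lines are NOT identical, print the first
#   D                 delete up to the first newline and restart the cycle
#                     with what is left, without reading new input first
dedupe_adjacent() {
  sed -e '$!N' -e '/^\(.*\)\n\1$/!P' -e 'D' "$@"
}

# Example run on a small stream with adjacent and non-adjacent repeats.
result=$(printf '%s\n' a a b b b a c c | dedupe_adjacent)
printf '%s\n' "$result"
```

Because `D` restarts the cycle without discarding the remaining line, a run of any length is reduced to a single line, not just pairs.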

Using Perl in slurp mode, we treat the whole file as one long string and apply a regex that does the comparison for us:

perl -0777pe 's//$1/ while /^(.*\n)\1+/gm' input_file

score 0 · Answer 5 · answered Jun 27 '18 at 23:55

Question about Rakesh Sharma's sed solution.

What if you have a input file such as:

-126.1 48.206
-126.106 48.21
-126.11 48.212
-126.114 48.214
-126.116 48.216
-126.118 48.216
-126.128 48.222
-126.136 48.226

And you want an output file to be:

-126.1 48.206
-126.106 48.21
-126.11 48.212
-126.114 48.214
-126.116 48.216
-126.128 48.222
-126.136 48.226

Note the missing:

-126.118 48.216

I know the command I want is similar to your solution:

sed -e '$!N;/^\(.*\)\n\1$/!P;D' input_file

I cannot work out how to alter it so that both columns are printed while duplicates are detected using only the column 2 values. Any tips?

`sed -e '$!N' -e '/.*\.\([0-9]*\)\n.*\.\1$/!{P;D;}' -e 's/\n.*//;s/^/\n/;D'` will delete the subsequent repeating elements. Note: This requires `GNU sed`. For `POSIX` behavior, it needs slight alteration. – Rakesh Sharma, Jun 28 '18 at 08:02
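As an alternative to the sed approach, the previous-value comparison from the awk answer above can be applied to the second column only. This is a sketch, not the commenter's method: the function name `dedupe_by_column2` is made up, fields are assumed to be whitespace-separated, and the values are compared as strings (so `48.216` and `48.2160` would count as different).

```shell
#!/bin/sh
# Hypothetical helper: print a line only when its second field differs
# from the second field of the previous line. Appending "" forces a
# string comparison instead of a numeric one.
dedupe_by_column2() {
  awk 'NR == 1 || ($2 "") != prev { print } { prev = $2 "" }' "$@"
}

# Example run on the data from this post.
result=$(printf '%s\n' \
  '-126.1 48.206' '-126.106 48.21' '-126.11 48.212' '-126.114 48.214' \
  '-126.116 48.216' '-126.118 48.216' '-126.128 48.222' '-126.136 48.226' |
  dedupe_by_column2)
printf '%s\n' "$result"
```

The whole line is printed, so both columns survive, and only the first line of each run of equal column 2 values is kept.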
