
How can I use ripgrep to find adjacent duplicated words? For example:

one hello hello world

How can I locate "hello hello" using ripgrep?

Solved

rg --pcre2 '(hello)[[:blank:]]+\1' <<<'one hello hello world'
Kusalananda
jian

2 Answers


You can use GNU grep too (it supports back-references in extended regular expressions as an extension):

grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'

For portability, you could use a basic regular expression instead:

grep '\(hello\)[[:blank:]][[:blank:]]*\1'

Add -w if you want to match on word boundaries instead.


From man grep:

Back-references and Subexpressions
The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
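In other words, \1 matches exactly the text that the first group captured, so the pattern succeeds only when the same word appears twice. As a rough sketch, the same idea can be generalized from the literal hello to any alphabetic word. The function name find_adjacent_dupes and the sample lines are made up for illustration, and the sketch assumes a grep that supports -w:

```shell
#!/bin/sh
# Hypothetical helper: print lines (with line numbers) that contain the same
# alphabetic word twice in a row, separated by spaces or tabs.
find_adjacent_dupes() {
    # \( ... \)  captures one word
    # \1         must repeat exactly what the group captured
    # -w         requires the whole match to sit on word boundaries
    grep -n -w '\([[:alpha:]][[:alpha:]]*\)[[:blank:]][[:blank:]]*\1' "$@"
}

printf '%s\n' 'one hello hello world' 'this is fine' 'the the end' |
    find_adjacent_dupes
```

Without -w, the line "this is fine" would also match, because the "is" at the end of "this" is followed by another "is".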

αғsнιη
  • What's the \1 for? I see it prevents matching a single hello, but how does it work? Can you give more details? – golder3 Feb 02 '22 at 17:39
  • Back-references are not standard in extended regular expressions (believe it or not), although GNU `grep` may support them (`-E` is not standard either, after all). – Kusalananda Feb 02 '22 at 20:53

Here's the solution with awk:

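# Print each non-overlapping pair of identical adjacent fields.
{
    for (i = 1; i < NF; i++) {
        if ($i == $(i+1)) {
            printf("%s %s\n", $i, $(i+1));
            i++;    # skip the second word of the pair
        }
    }
}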

This only reports non-overlapping pairs of identical words. For example, "word word word" yields one pair ("word word"), and "word word word word" yields two pairs ("word word" printed twice).
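To see the pairing behaviour concretely, here is a small sketch that runs the same kind of loop inline on those two sample lines. The function name print_dup_pairs is only for illustration:

```shell
#!/bin/sh
# Hypothetical wrapper around the pair-printing awk loop.
print_dup_pairs() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == $(i+1)) {
                printf("%s %s\n", $i, $(i+1))
                i++   # skip past the second word so pairs do not overlap
            }
        }
    }' "$@"
}

printf '%s\n' 'word word word' 'word word word word' | print_dup_pairs
```

The first input line should produce one "word word" line and the second should produce two.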

If you want to count how many times each adjacent repeated word occurs in a row on each line:

# For each run of identical adjacent fields, print: line number, word, run length.
{
    for (i = 1; i <= NF; i++) {
        counter = 1;
        while (i < NF && $i == $(i+1)) {
            counter++;
            i++;
        }
        if (counter > 1) {
            printf("%d %s %d\n", NR, $i, counter);
        }
    }
}

Usage:

awk -f awk_script your_file
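For a quick check without a separate script file, the counting loop can also be passed inline. This is only a sketch: the function name count_runs and the sample input are invented for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: for every run of identical adjacent words, print
# the line number, the word, and the length of the run.
count_runs() {
    awk '{
        for (i = 1; i <= NF; i++) {
            counter = 1
            # extend the run while the next field holds the same word
            while (i < NF && $i == $(i+1)) {
                counter++
                i++
            }
            if (counter > 1) {
                printf("%d %s %d\n", NR, $i, counter)
            }
        }
    }' "$@"
}

printf '%s\n' 'one hello hello world' 'a b b b c c' | count_runs
```

Each output line has three columns: the input line number, the repeated word, and how many times it appeared in a row.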
golder3