Using ripgrep to find adjacent duplicated words

Question

How can I use ripgrep to find adjacent duplicated words? For example:

one hello hello world

How can I locate hello hello using ripgrep?

Solved

rg --pcre2 '(hello)[[:blank:]]+\1' <<<'one hello hello world'

What do you mean by adjacent? Two identical words next to each other? Do you have to use ripgrep? – golder3, Feb 02 '22 at 16:42

αғsнιη · Accepted Answer · 2022-02-02T17:49:41.467

4

You can also use GNU grep, which supports back-references in extended regular expressions as an extension:

grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'

For portability, you could use a basic regular expression instead, where back-references are standard:

grep '\(hello\)[[:blank:]][[:blank:]]*\1'

Add -w if you also want the match to fall on word boundaries.

From man grep:

Back-references and Subexpressions
The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
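To illustrate the back-reference described above: \1 re-matches the exact text captured by the first group, so the pattern only succeeds when the same string appears twice in a row. The following is a minimal sketch that generalizes the literal hello to any alphabetic word. It assumes a grep that supports -o and -w (GNU grep does), and the function name find_dup_words is made up for this example.

```shell
# find_dup_words is a hypothetical helper name, not part of grep.
# It reads stdin and prints each adjacent duplicated word pair it finds.
find_dup_words() {
    # \([[:alpha:]][[:alpha:]]*\)  captures one word as group 1
    # [[:blank:]][[:blank:]]*      one or more spaces or tabs
    # \1                           must repeat exactly what group 1 captured
    # -w stops "this is" from matching as "is is"; -o prints only the match
    grep -ow '\([[:alpha:]][[:alpha:]]*\)[[:blank:]][[:blank:]]*\1'
}

printf '%s\n' 'one hello hello world' | find_dup_words
```

On the sample line this should print hello hello. Without -w, the same pattern would also report "is is" inside "this is", because the group is free to capture the tail of a longer word.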

edited Feb 02 '22 at 17:49

answered Feb 02 '22 at 17:28

αғsнιη


What's the \1 for? I see it prevents matching a single hello, but how does it work? Can you give more details? – golder3 Feb 02 '22 at 17:39
Back-references are not standard in extended regular expressions (believe it or not), although GNU `grep` may support them (`-E` is not standard either, after all). – Kusalananda Feb 02 '22 at 20:53

golder3 · Answer 2 · 2022-02-02T17:32:11.043

3

Here's the solution with awk:

{
    # Compare each field with the one after it.
    for (i = 1; i < NF; i++) {
        if ($i == $(i+1)) {
            printf("%s %s\n", $i, $(i+1));
            i++;    # skip the second word so pairs do not overlap
        }
    }
}

This only finds non-overlapping pairs of identical words. For example:

word word word -> word word (one pair)
word word word word -> word word word word (two pairs)
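The pairing behaviour can be checked by running the pair-finding script inline on sample input. This is only a sketch: the function name print_pairs and the sample lines are made up for illustration, and any POSIX awk is assumed.

```shell
# print_pairs is a hypothetical wrapper name; sample input is made up.
print_pairs() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == $(i+1)) {
                printf("%s %s\n", $i, $(i+1))
                i++    # skip the second word so pairs do not overlap
            }
        }
    }'
}

# Three identical words: one pair is printed.
printf '%s\n' 'word word word' | print_pairs

# Four identical words: two pairs are printed.
printf '%s\n' 'word word word word' | print_pairs
```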

If you want to count how many times a word repeats consecutively on each line:

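{
    for (i = 1; i <= NF; i++) {
        counter = 1;
        # Extend the run while the next field repeats the current one.
        while (i < NF && $i == $(i+1)) {
            counter++;
            i++;
        }
        if (counter > 1) {
            # Output: line number, repeated word, run length.
            printf("%d %s %d\n", NR, $i, counter);
        }
    }
}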

Usage:

awk -f awk_script your_file
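As a concrete run of the usage line, the sketch below saves the counting script to a temporary file and applies it to a small input. The temporary files and the sample text are made up for illustration, and mktemp is assumed to be available.

```shell
# Temporary paths and sample text are made up for illustration.
script=$(mktemp)
input=$(mktemp)

# Save the counting script to a file so it can be passed with -f.
cat > "$script" <<'EOF'
{
    for (i = 1; i <= NF; i++) {
        counter = 1
        while (i < NF && $i == $(i+1)) {
            counter++
            i++
        }
        if (counter > 1)
            printf("%d %s %d\n", NR, $i, counter)
    }
}
EOF

printf '%s\n' 'a b b b c' 'no repeats here' 'x x y y' > "$input"

# Each output line holds: line number, repeated word, run length.
result=$(awk -f "$script" "$input")
printf '%s\n' "$result"

rm -f "$script" "$input"
```

For this input the script reports a run of three b on line 1 and runs of two x and two y on line 3. Line 2 produces no output.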

edited Feb 02 '22 at 17:32

answered Feb 02 '22 at 17:19

golder3


yes, but then it would match two pairs in a sequence of 3 words – golder3 Feb 02 '22 at 17:59

No, because that way you only check if odd words are followed by the same word. It wouldn't match blah word word blah :) – golder3 Feb 02 '22 at 18:05
