How can I use ripgrep to find adjacent duplicated words? For example:
one hello hello world
How can I locate "hello hello" using ripgrep?
Solved
rg '(hello)[[:blank:]]+\1' --pcre2 <<<'one hello hello world'
GNU grep works too (back-references in extended regular expressions, -E, are a GNU extension):
grep -E '(hello)[[:blank:]]+\1' <<<'one hello hello world'
For portability, you could use a basic regular expression, where back-references are standard:
grep '\(hello\)[[:blank:]][[:blank:]]*\1'
Add -w if you want the match to fall on word boundaries.
From man grep:
Back-references and Subexpressions
The back-reference \n, where n is a single digit, matches the substring previously matched by the nth parenthesized subexpression of the regular expression.
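The commands above hard-code the word hello. As a rough sketch, the same back-reference idea can be generalized to any repeated word by capturing the word itself. This assumes GNU grep (back-references with -E, the \b word-boundary escape, and -o are not guaranteed elsewhere), and find_repeats is a hypothetical helper name used only for illustration:

```shell
# find_repeats: hypothetical helper, assumes GNU grep.
# \b(...)      captures a whole word starting at a word boundary
# [[:blank:]]+ one or more spaces or tabs between the two copies
# \1\b         the same word again, ending at a word boundary
find_repeats() {
  grep -Eo '\b([[:alnum:]_]+)[[:blank:]]+\1\b' "$@"
}

printf '%s\n' 'one hello hello world' 'this is is it' | find_repeats
# prints:
# hello hello
# is is
```

The leading \b keeps the "is" inside "this" from pairing with the following "is", and the trailing \b keeps "the then" from matching. With ripgrep, the equivalent should be rg --pcre2 -o '\b(\w+)[[:blank:]]+\1\b'.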
Here's the solution with awk:
{
    # Stop at NF-1 so $(i+1) always refers to an existing field.
    for (i = 1; i < NF; i++) {
        if ($i == $(i+1)) {
            printf("%s %s\n", $i, $(i+1));
            i++;    # skip the second word of the pair
        }
    }
}
This only reports pairs of identical adjacent words, and each word belongs to at most one pair. For example:
word word word -> word word (one pair)
word word word word -> word word, word word (two pairs)
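To check the pairing behaviour described above, here is a minimal sketch that runs the same pair-finding logic inline on both sample inputs. print_pairs is a hypothetical wrapper name, not part of the original answer:

```shell
# print_pairs: hypothetical wrapper around the pair-finding awk logic.
print_pairs() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == $(i+1)) {
        printf("%s %s\n", $i, $(i+1))
        i++   # skip the second word so it cannot start another pair
      }
    }
  }' "$@"
}

printf '%s\n' 'word word word' | print_pairs
# prints:
# word word

printf '%s\n' 'word word word word' | print_pairs
# prints:
# word word
# word word
```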
If you want to count the number of adjacent same words in each line:
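{
    for (i = 1; i <= NF; i++) {
        counter = 1;
        # Extend the run while the next field exists and equals this one.
        while (i < NF && $i == $(i+1)) {
            counter++;
            i++;
        }
        if (counter > 1) {
            # Output: line number, repeated word, length of the run.
            printf("%d %s %d\n", NR, $i, counter);
        }
    }
}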
Usage:
awk -f awk_script your_file
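For a quick test without creating a script file, the counting logic can also be run inline. This is only a sketch: count_runs is a hypothetical wrapper name, and the sample lines are made up for illustration:

```shell
# count_runs: hypothetical wrapper around the counting awk logic.
count_runs() {
  awk '{
    for (i = 1; i <= NF; i++) {
      counter = 1
      # Extend the run while the next field exists and equals the current one.
      while (i < NF && $i == $(i+1)) {
        counter++
        i++
      }
      if (counter > 1) {
        printf("%d %s %d\n", NR, $i, counter)
      }
    }
  }' "$@"
}

printf '%s\n' 'one hello hello world' 'a a a b b c' | count_runs
# prints:
# 1 hello 2
# 2 a 3
# 2 b 2
```

Each output line gives the input line number, the repeated word, and how many times it appears in a row.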