Replace (one or) two different patterns in a file with regexp

Question

Suppose a file input.txt contains several strings like the following:

[[foo>a|a]]
[[foo>b|b]]
[[foo>c|c]]

that I'd like to replace with:

:foo:`a`
:foo:`b`
:foo:`c`

I guess I could manage this with sed or rg (I have never used awk).

But the file also contains other strings like the following:

[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

that I'd like to replace with:

:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`

All my attempts failed because I don't see how to handle two different patterns at once.

Do you know of a way to achieve the latter result (and, BTW, the former)?

Why do you feel that you need to handle both at once? Why not handle them separately? That would make it easier to understand and to maintain. — Kusalananda, Dec 03 '20 at 15:39
Indeed, I badly explained myself. I wanted to express I didn't know how to handle two patterns. The answers below gave me the hint (`$1`, `$2`, etc. with Perl, `\1`, `\2`, with sed). — Denis Bitouzé, Dec 03 '20 at 15:59

score 3 · Answer 1 · answered Dec 03 '20 at 16:50

With standard sed syntax:

sed '
  s/^\[\[\(.*\)>\(.*\)|\2\]\]$/:\1:`\2`/; t
  s/^\[\[\(.*\)>\(.*\)|\(.*\)\]\]$/:\1:`\3 <\2>`/'
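
For what it's worth, the script above relies on two things: the back-reference `\2` inside the first pattern, which only matches when the text on both sides of | is identical, and `t`, which skips the second substitution when the first one succeeded. Below is a minimal sketch of the same idea for lines that may hold several links. It assumes a sed that accepts -E together with back-references (GNU sed does, and I believe BSD sed does too, but this goes beyond plain POSIX). The function name convert_all_links is made up for illustration.

```shell
# Hypothetical helper: converts every [[role>target|label]] on each line.
# No `t` here: once a link has been rewritten it no longer starts with
# "[[", so the second substitution cannot touch it again.
convert_all_links() {
  sed -E '
    s/\[\[([^]>|]+)>([^]>|]+)\|\2\]\]/:\1:`\2`/g
    s/\[\[([^]>|]+)>([^]>|]+)\|([^]>|]+)\]\]/:\1:`\3 <\2>`/g'
}

printf '%s\n' 'see [[foo>a|a]] and [[foo>b|e]]' | convert_all_links
# prints: see :foo:`a` and :foo:`e <b>`
```

The negated bracket expressions replace the greedy `.*` so that a match cannot run across two links on the same line.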

Stéphane Chazelas

Sundeep · Accepted Answer · 2020-12-03T15:32:46.857

With lookarounds, you can check whether the strings around | are the same or not. For example:

$ cat ip.txt 
[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

# same as: rg -NP '\[\[([^>]+)>([^|]+)\|(?!\2])([^|]+)]]' -r ':$1:`$3 <$2>`'
$ perl -pe 's/\[\[([^>]+)>([^|]+)\|(?!\2])([^|]+)]]/:$1:`$3 <$2>`/' ip.txt 
:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`

(?!\2]) is a negative lookahead assertion to ensure that the strings around | are different.

To handle both cases, you can use Perl code in the replacement section with the e flag.

$ cat ip.txt
[[foo>a|a]]
[[foo>b|b]]
[[foo>c|c]]

[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

$ perl -pe 's/\[\[([^>]+)>([^|]+)\|([^|]+)]]/":$1:`$3" . ($2 eq $3 ? "`" : " <$2>`")/e' ip.txt 
:foo:`a`
:foo:`b`
:foo:`c`

:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`

Here, ($2 eq $3 ? "`" : " <$2>`") chooses the closing part of the replacement depending on whether the strings around | are the same or not.

score 1 · Answer 3 · answered Dec 03 '20 at 15:47

It seems that we could split this task into two or three separate parts. First, tr translates some characters and squeezes the repeats (-s) to create the "outline" of the output; then sed makes two separate replacements, one for when the two characters match and one for when they differ. Note that (.) matches a single character, so this handles only one-character strings around |, as in the question's samples.

< file tr -s '[<>|]' ':::``' | sed -E 's/(.)`\1`/`\1`/; s/([^:])`(.)`/`\2 <\1>`/'

Testing:

$ cat file
[[foo>a|a]]
[[foo>b|b]]
[[foo>c|c]]
[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

$ <file tr -s '[<>|]' ':::``' | sed -E 's/(.)`\1`/`\1`/;s/([^:])`(.)`/`\2 <\1>`/' 
:foo:`a`
:foo:`b`
:foo:`c`
:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`
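
A possible extension of the same tr + sed outline to strings longer than one character, as a sketch only. It assumes one link per line and that the text itself contains none of the characters `:`, backtick, `<`, `>`, `|`, `[` or `]`. The function name convert_outline is made up.

```shell
# Hypothetical helper: tr builds the outline (":foo:a`d`"), then sed
# rearranges the two backtick-delimited parts at the end of the line.
convert_outline() {
  tr -s '[<>|]' ':::``' |
    sed -E 's/:([^:`]+)`\1`$/:`\1`/; s/:([^:`]+)`([^`]+)`$/:`\2 <\1>`/'
}

printf '%s\n' '[[foo>bb|bb]]' '[[foo>c|ccc]]' | convert_outline
# prints:
# :foo:`bb`
# :foo:`ccc <c>`
```

After the first substitution the colon is directly followed by a backtick, so the second pattern (which needs at least one character between them) no longer matches that line.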

score 1 · Answer 4 · answered Dec 03 '20 at 16:30

Using awk to handle both of your formats:

awk -F'\\[\\[|\\]\\]|>|\\|' '{
    print $1, $2, "`" ($3==$4? $3 : $4" <"$3">") "`";
}' OFS=':' infile

Test input:

[[foo>a|a]]
[[foo>bb|bb]]
[[foo>c|ccc]]
[[foo>aaaa|d]]
[[foo>b|ddd]]
[[foo>cccc|fff]]

Output:

:foo:`a`
:foo:`bb`
:foo:`ccc <c>`
:foo:`d <aaaa>`
:foo:`ddd <b>`
:foo:`fff <cccc>`
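
To see why $3 and $4 hold the two strings, here is a small debugging sketch that prints how awk splits a line with that field separator. The function name show_fields is made up for illustration.

```shell
# Hypothetical helper: prints each field as index=<value>.
# The separator regex matches "[[", "]]", ">" or "|", so the leading "[["
# and trailing "]]" produce empty first and last fields.
show_fields() {
  awk -F'\\[\\[|\\]\\]|>|\\|' '{
      for (i = 1; i <= NF; i++)
          printf "%d=<%s>%s", i, $i, (i < NF ? " " : "\n")
  }'
}

printf '%s\n' '[[foo>c|ccc]]' | show_fields
# prints: 1=<> 2=<foo> 3=<c> 4=<ccc> 5=<>
```

The empty $1 is what yields the leading colon once the fields are printed with OFS=':'.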
