Suppose a file input.txt contains several strings like the following:

[[foo>a|a]]
[[foo>b|b]]
[[foo>c|c]]

that I'd like to replace with:

:foo:`a`
:foo:`b`
:foo:`c`

I guess I could manage to achieve this result with sed or rg (I have never used awk).

But this file also contains other strings like the following:

[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

that I'd like to replace with:

:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`

All my attempts failed because I don't see how to handle two different patterns at once.

Do you know a way to achieve the latter result (and, BTW, the former)?

αғsнιη
Denis Bitouzé
  • Why do you feel that you need to handle both at once? Why not handle them separately? That would make it easier to understand and to maintain. – Kusalananda Dec 03 '20 at 15:39
  • Indeed, I explained myself badly. I meant that I didn't know how to handle two patterns. The answers below gave me the hint (`$1`, `$2`, etc. with Perl; `\1`, `\2` with sed). – Denis Bitouzé Dec 03 '20 at 15:59

4 Answers

With standard sed syntax:

sed '
  s/^\[\[\(.*\)>\(.*\)|\2\]\]$/:\1:`\2`/; t
  s/^\[\[\(.*\)>\(.*\)|\(.*\)\]\]$/:\1:`\3 <\2>`/'
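
To see how the two commands cooperate: the first `s` only matches when both sides of `|` are identical (the back-reference `\2` inside the pattern requires it), and `t` then branches to the end of the script whenever that substitution succeeded, so the second `s` only ever touches lines where the two sides differ. A minimal sketch of a test run, assuming the sample lines from the question are piped through a hypothetical `convert` wrapper function (the name is only for illustration):

```shell
# convert: hypothetical wrapper around the sed script above; reads stdin.
convert() {
  sed '
    s/^\[\[\(.*\)>\(.*\)|\2\]\]$/:\1:`\2`/; t
    s/^\[\[\(.*\)>\(.*\)|\(.*\)\]\]$/:\1:`\3 <\2>`/'
}

# One line with identical sides, one with different sides.
printf '%s\n' '[[foo>a|a]]' '[[foo>b|e]]' | convert
```

This should print :foo:`a` followed by :foo:`e <b>`.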
Stéphane Chazelas
With lookarounds, you can check whether the strings on either side of | are the same or not. For example:

$ cat ip.txt 
[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

# same as: rg -NP '\[\[([^>]+)>([^|]+)\|(?!\2])([^|]+)]]' -r ':$1:`$3 <$2>`'
$ perl -pe 's/\[\[([^>]+)>([^|]+)\|(?!\2])([^|]+)]]/:$1:`$3 <$2>`/' ip.txt 
:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`

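(?!\2]) is a negative lookahead assertion: the match fails if the text right after | is the same as the second capture group followed by ], which ensures that the strings around | are different.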


To handle both cases, you can use Perl code in the replacement section with the e flag.

$ cat ip.txt
[[foo>a|a]]
[[foo>b|b]]
[[foo>c|c]]

[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

$ perl -pe 's/\[\[([^>]+)>([^|]+)\|([^|]+)]]/":$1:`$3" . ($2 eq $3 ? "`" : " <$2>`")/e' ip.txt 
:foo:`a`
:foo:`b`
:foo:`c`

:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`

Here, ($2 eq $3 ? "`" : " <$2>`") chooses the closing part of the string depending on whether the strings around | are the same or not.

Sundeep
It seems we can split this task into two or three separate steps. First we replace some characters with tr and squeeze (-s) the repeats, to create the "outline" of the output; then sed makes two separate substitutions, one for when the two characters match and one for when they differ.

< file tr -s '[<>|]' ':::``' | sed -E 's/(.)`\1`/`\1`/; s/([^:])`(.)`/`\2 <\1>`/'

Testing:

$ cat file
[[foo>a|a]]
[[foo>b|b]]
[[foo>c|c]]
[[foo>a|d]]
[[foo>b|e]]
[[foo>c|f]]

$ <file tr -s '[<>|]' ':::``' | sed -E 's/(.)`\1`/`\1`/;s/([^:])`(.)`/`\2 <\1>`/' 
:foo:`a`
:foo:`b`
:foo:`c`
:foo:`d <a>`
:foo:`e <b>`
:foo:`f <c>`
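
To make the intermediate "outline" visible, the sketch below runs only the tr step. The `outline` function name is hypothetical. Note that `(.)` in the sed part matches exactly one character, so this approach assumes single-character values as in the question.

```shell
# outline: hypothetical wrapper for the tr step alone; reads stdin.
# [, < and > become ":", | and ] become a backtick, and -s squeezes repeats.
outline() {
  tr -s '[<>|]' ':::``'
}

printf '%s\n' '[[foo>a|a]]' '[[foo>a|d]]' | outline
```

This should print :foo:a`a` and :foo:a`d`, which are the strings the two sed substitutions then rearrange.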
thanasisp
Using awk to handle both of your formats:

awk -F'\\[\\[|\\]\\]|>|\\|' '{
    print $1, $2, "`" ($3==$4? $3 : $4" <"$3">") "`";
}' OFS=':' infile

Test input:

[[foo>a|a]]
[[foo>bb|bb]]
[[foo>c|ccc]]
[[foo>aaaa|d]]
[[foo>b|ddd]]
[[foo>cccc|fff]]

Output:

:foo:`a`
:foo:`bb`
:foo:`ccc <c>`
:foo:`d <aaaa>`
:foo:`ddd <b>`
:foo:`fff <cccc>`
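
To see why the command prints $1 through $4, it may help to look at how the field separator splits a line: `[[`, `]]`, `>` and `|` are all separators, so the first and last fields are empty. A minimal sketch, with a hypothetical `show_fields` helper that just lists the fields:

```shell
# show_fields: hypothetical helper that prints each field awk sees,
# using the same -F value as the command above.
show_fields() {
  awk -F'\\[\\[|\\]\\]|>|\\|' '{
      for (i = 1; i <= NF; i++) printf "field %d: \"%s\"\n", i, $i
  }'
}

printf '%s\n' '[[foo>c|ccc]]' | show_fields
```

Because $1 is empty, printing it with OFS=':' is what produces the leading colon in the output.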
αғsнιη