
I have a script which reads a VCS log, converts it to LaTeX, and then uses awk to replace the keyword @COMMITS@ in a template with that text:

untagged=$(get-commit-messages "$server" "$rev")
IFS=$'\n' untagged=( $untagged )  # Tokenize based on newlines
for commit in "${untagged[@]}"; do
  tex+="\\\nui{"                  # Wrap each commit in a custom command
  tex+=$(echo "$commit" | pandoc -t latex --wrap=none)
  tex+="}
"
done

awk -v r="$tex" '{gsub(/@COMMITS@/,r)}1' template

Since commit messages are really just plain text, I use `pandoc -t latex` to ensure everything is escaped properly for the LaTeX parser.

My problem is that the awk parser un-escapes these sequences. If there is a `_` in a commit message, pandoc will replace it with `\_`, but awk will convert it back and emit a warning:

awk: warning: escape sequence `\_' treated as plain `_'

That causes the LaTeX parser to fail.

Is there a way to prevent awk from un-escaping these sequences? If not, I'll look for a non-awk solution for the text replacement.
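For reference, the behaviour can be reproduced in isolation (a sketch; the warning text is from GNU awk, other implementations may stay silent or behave differently):

```shell
# awk -v runs escape processing on the assigned value: \t becomes
# a real tab, and an undefined sequence such as \_ is collapsed to
# a plain _ (gawk prints the warning shown above).
awk -v r='foo\tbar' 'BEGIN{print r}'        # foo<TAB>bar
awk -v r='foo\_bar' 'BEGIN{print r}' 2>&1   # foo_bar, plus the warning in gawk
```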

Stewart
    If `r` can contain a backreference like `&` then `gsub(/@COMMITS@/,r)` will fail. If I were you I'd use literal string operations with `index()` and `substr()` instead. See https://stackoverflow.com/a/67852969/1745001 for more information on that if you're interested and then ask a new question if you can't figure out how to make it work for you. – Ed Morton Jun 07 '21 at 14:45
    In `IFS=$'\n' untagged=( $untagged )`, you're using the split+glob operator to split `$untagged` but forgot to disable the glob part (also note that it strips empty lines). In `bash`, see also `readarray -t untagged <<< "$untagged"` to get one array element per line. – Stéphane Chazelas Jun 07 '21 at 15:02
    See also [external variable in awk](//unix.stackexchange.com/a/56141) (or [Use a shell variable in awk](//unix.stackexchange.com/a/56190)) and [Why is printf better than echo?](//unix.stackexchange.com/q/65803) – Stéphane Chazelas Jun 07 '21 at 15:04
  • I don't think your `pandoc` line does what you want. See for instance `echo '%foo' | pandoc -t latex --wrap=none` – Stéphane Chazelas Jun 07 '21 at 15:07
    By the way - you won't need as many escapes inside your shell strings if you use the correct quotes. Always use single quotes unless you NEED double quotes and then use double unless you NEED none. If you follow that rule then `tex+="\\\nui{"` becomes `tex+='\\nui{'`. See https://mywiki.wooledge.org/Quotes for more info. – Ed Morton Jun 07 '21 at 15:07
  • Another note on `pandoc`: Is this OK? https://termbin.com/3w8u => out https://termbin.com/3fc8 (limited test, but shows some) – ibuprofen Jun 07 '21 at 15:59
  • Not sure if it is needed, or it helps you, but https://www.mail-archive.com/[email protected]/msg07971.html is a routine used in Template-Toolkit (http://template-toolkit.org) - or perhaps they have updated it, but none the less. – ibuprofen Jun 07 '21 at 21:38
  • @ibuprofen Good summary on things that `pandoc` does. Those outputs actually look really good (better than I expected). Fortunately, I control the input and the first few characters are integers, so I don't expect a `%` as the first character sent to pandoc. – Stewart Jun 08 '21 at 05:48
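As an aside, the safer line-splitting suggested in the comments above could look like this (a sketch; `readarray` is bash-specific):

```shell
# readarray -t gives one array element per line, with no glob
# expansion and no IFS surprises (unlike IFS=$'\n' arr=( $var )).
untagged=$'first commit with * glob chars\nsecond_commit'
readarray -t commits <<< "$untagged"
printf '<%s>\n' "${commits[@]}"
# <first commit with * glob chars>
# <second_commit>
```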

1 Answer


You're asking awk to interpret escape sequences when setting the variable with `-v`, so don't do that: use `ENVIRON[]` or `ARGV[]` instead to set the awk variable to a literal string:

$ shellvar='foo\tbar'

$ awk -v awkvar="$shellvar" 'BEGIN{print awkvar}'
foo     bar

$ shellvar="$shellvar" awk 'BEGIN{awkvar=ENVIRON["shellvar"]; print awkvar}'
foo\tbar

$ awk 'BEGIN{awkvar=ARGV[1]; delete ARGV[1]; print awkvar}' "$shellvar"
foo\tbar

See how-do-i-use-shell-variables-in-an-awk-script for more information.
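Applied to the question's pipeline, and also taking the comment about `&` backreferences into account, the replacement could be done literally with `index()`/`substr()` (a sketch; it assumes the commit text itself never contains `@COMMITS@`, or the `while` loop would not terminate):

```shell
# Pass $tex through the environment (no -v escape processing) and
# splice it in literally, so backslashes and '&' survive intact.
tex='\_escaped \\ & text'
printf '%s\n' 'X @COMMITS@ Y' |
  tex="$tex" awk '
    { while (i = index($0, "@COMMITS@"))
        $0 = substr($0, 1, i - 1) ENVIRON["tex"] substr($0, i + 9) }
    1'
# X \_escaped \\ & text Y
```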

Ed Morton