
I am trying to do some search-and-replace on a variable using the ${VAR//search/replace} parameter expansion. I have a pretty long and evil PS1 whose size I want to work out after expansion. To do so, I have to remove a bunch of escape sequences I stuff into it. However, on trying to remove all the ANSI CSI SGR sequences, I've run into an issue with my syntax.

Given my PS1 of:

PS1=\[\033]0;[\h] \w\007\]\[\033[1m\]\[\033[37m\](\[\033[m\]\[\033[35m\]\u@\[\033[m
\]\[\033[32m\]\h\[\033[1m\]\[\033[37m\]\[\033[1m\])\[\033[m\]-\[\033[1m\](\[\033[m
\]\t\[\033[37m\]\[\033[1m\])\[\033[m\]-\[\033[1m\](\[\033[m\]\[\033[36m\]\w\[\033[1m
\]\[\033[37m\])\[\033[35m\]${git_branch}\[\033[m\]\n$

(yes it's sick I know...)

I'm trying to do:

# readability
search='\\\[\\033\[[0-9]*m\\\]'
# do the magic
sane="${PS1//$search/}"

However, this seems to be greedy at the point of [0-9] (almost as if [0-9] were treated like a .):

echo "${PS1//$search/}"
\[\033]0;[\h] \w\007\]\n$ 

If I remove the *, and change [0-9] to [0-9][0-9] (as that is more illustrative) I get closer to the expected result:

$ search='\\\[\\033\[[0-9][0-9]m\\\]'
$ echo "${PS1//$search/}"
\[\033]0;[\h] \w\007\]\[\033[1m\](\[\033[m\]\u@\[\033[m\]\h\[\033[1m
\]\[\033[1m\])\[\033[m\]-\[\033[1m\](\[\033[m\]\t\[\033[1m\])\[\033[m\]-\[\033[1m
\](\[\033[m\]\w\[\033[1m\])$(git_branch)\[\033[m\]\n$ 

Why is the * (zero or more) doing crazy things? Am I missing something here? If I pass the same regex through sed, I get the expected result:

echo $PS1 | sed "s/$search//g"
\[\033]0;[\h] \w\007\](\u@\h)-(\t)-(\w)$(git_branch)\n$
Drav Sloan
  • It's not regex, it's just pattern matching similar to a file glob. `extglob` does affect the pattern matching behavior. – jordanm Aug 28 '13 at 02:33
  • Bum, that'll be why - I had a hunch it may have been the case :/ I was trying to find clarification of the matching mechanism, without much success. *heads off to read about extglob* (looks like a job for sed!) – Drav Sloan Aug 28 '13 at 02:38
  • `*([0-9])` is the equivalent of `[0-9]*` using `extglob`. – jordanm Aug 28 '13 at 02:44
  • Yeah I figured that out after I read about them in the bash man page, updated the question to reflect that, cheers duder! :D – Drav Sloan Aug 28 '13 at 02:46
  • If you got the correct answer, it's acceptable to answer your own question. I was happy to have provided some guidance. – jordanm Aug 28 '13 at 02:50
  • @DravSloan - this prompt IS sick! 8-) – slm Aug 28 '13 at 02:51
  • My PS1 lives in its own `.bash_ps1` run command file (which gets sourced from `.bashrc` on interactive shells), and does evil things with a function called by the $PROMPT_COMMAND variable, so it is actually sick and evil :P I was actually changing that function when this question arose... – Drav Sloan Aug 28 '13 at 03:01

3 Answers


Sounds to me like you want to remove things between \[ and \]:

$ shopt -s extglob
$ printf '%s\n' "${PS1//\\\[*(\\[^]]|[^\\])\\\]/}"
(\u@\h)-(\t)-(\w)${git_branch}\n$

However, bash substitution is so inefficient that you would probably be better off firing up perl or sed here, or doing it in a loop like:

p=$PS1 np=
while :; do
  case $p in
    (*\\\[*\\\]*) np=$np${p%%\\\[*};p=${p#*\\\]};;
    (*) break;;
  esac
done
np=$np$p
printf '%s\n' "$np"

(that's standard POSIX sh syntax above, BTW).
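For the sed route mentioned above, something like the following might do. It is only a sketch: it assumes a sed that accepts `-E` for extended regular expressions (GNU and BSD sed both do), and `demo_ps1` is a made-up, shortened prompt used purely for illustration. The regex mirrors the extglob pattern: after `\[`, consume either a backslash followed by anything but `]`, or any non-backslash character, until the closing `\]`.

```shell
#!/bin/sh
# demo_ps1 is a hypothetical, shortened prompt used only for this example.
demo_ps1='\[\033]0;[\h] \w\007\]\[\033[1m\](\[\033[m\]\u@\h\[\033[1m\])\n$'

# Remove every \[ ... \] group. Inside a group we accept either
# "backslash + any character except ]" or "any character except backslash",
# so the match cannot run past the first closing \].
np=$(printf '%s\n' "$demo_ps1" | sed -E 's/\\\[(\\[^]]|[^\\])*\\\]//g')

printf '%s\n' "$np"
```

The result is stored in `np` so that it lines up with the variable name used in the loop version.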

And if you want the expanded prompt from that:

ep=$(PS4=$np;exec 2>&1;set -x;:); ep=${ep%:}
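If your bash is 4.4 or newer, the `${parameter@P}` transformation (which expands a string as if it were a prompt string) may be a simpler way to get the expanded text and its length. A minimal sketch under that assumption, with `demo_prompt` being a made-up prompt that contains no escapes depending on the host or user:

```shell
#!/usr/bin/env bash
# Assumes bash >= 4.4 for the ${var@P} prompt-expansion operator.
shopt -s extglob

# demo_prompt is a hypothetical prompt used only for illustration.
demo_prompt='(\[\033[35m\]x\[\033[m\])-'

# Drop the non-printing \[ ... \] groups first (same pattern as above).
visible="${demo_prompt//\\\[*(\\[^]]|[^\\])\\\]/}"

# Expand what is left the way the shell would expand a prompt.
expanded=${visible@P}

printf 'visible width: %d\n' "${#expanded}"
```

Note that for a multi-line prompt `${#expanded}` counts the newline as well, so you would probably want to measure only the last line.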
Stéphane Chazelas
  • Ah my day is complete, another bunch of symbols of greater elder magic from the High Priest of Command Line, Stephane. I swear for half of your posts my eyes are set to the wrong baud rate, and I get a screen of mess :) And yes, the end aim was to remove all escape sequences: it didn't strike me to just remove between `[` and `]`. thanks! – Drav Sloan Aug 28 '13 at 09:18

After some guidance from jordanm (and reading the "Pattern Matching" section of the bash man page), it turns out that the patterns used by parameter expansion are not regexes but shell glob patterns. However, for my specific case, if the extglob shell option is on (shopt -s extglob), I can do:

search='\\\[\\033\[*([0-9])m\\\]'

where *([0-9]) is the same as [0-9]* in regex.
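To see the difference on something smaller than my PS1, here is a quick sketch. The `sample` string and the variable names are made up for the demo. As a glob, `[0-9]*` means "one digit, then any string", so the first match runs all the way to the last `m\]` it can reach; with `*([0-9])` only digits are allowed before the `m`.

```shell
#!/usr/bin/env bash
shopt -s extglob

# sample is a hypothetical, shortened prompt fragment.
sample='\[\033[1m\](\[\033[m\]x'

# Plain glob: [0-9] matches one digit and * then matches ANY string,
# so everything from the first \[\033[ up to the final m\] is removed.
glob='\\\[\\033\[[0-9]*m\\\]'
greedy="${sample//$glob/}"

# Extended glob: *([0-9]) matches zero or more digits and nothing else.
ext='\\\[\\033\[*([0-9])m\\\]'
fixed="${sample//$ext/}"

printf 'glob:    %s\n' "$greedy"
printf 'extglob: %s\n' "$fixed"
```

With the plain glob the `(` disappears along with both sequences; with the extglob version it survives, which is the behaviour I was after.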

It seems that extglob provides some mechanisms similar to regex with (from bash man page):

          ?(pattern-list)
                 Matches zero or one occurrence of the given patterns
          *(pattern-list)
                 Matches zero or more occurrences of the given patterns
          +(pattern-list)
                 Matches one or more occurrences of the given patterns
          @(pattern-list)
                 Matches one of the given patterns
          !(pattern-list)
                 Matches anything except one of the given patterns
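A small sketch of how each of these behaves, using `[[ string == pattern ]]`. The `matches` helper is just a name I made up for the demo; its second argument is left unquoted on purpose so that it is treated as a pattern.

```shell
#!/usr/bin/env bash
shopt -s extglob

# Hypothetical helper: succeeds if string $1 matches pattern $2.
matches() { [[ $1 == $2 ]]; }

# Print whether a string matches a pattern.
check() {
  if matches "$1" "$2"; then
    printf '%-8s matches        %s\n' "$1" "$2"
  else
    printf '%-8s does not match %s\n' "$1" "$2"
  fi
}

check colour  'colo?(u)r'     # ?(...)  zero or one occurrence
check 37m     '*([0-9])m'     # *(...)  zero or more occurrences
check m       '+([0-9])m'     # +(...)  one or more, so a bare m fails
check 1m      '@(1|37)m'      # @(...)  exactly one of the alternatives
check 2m      '!(1m)'         # !(...)  anything except the given pattern
```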
Drav Sloan
  • Yes, `extglob` implements a subset of `ksh` extended globs. `ksh93` actually has a printf operator to convert between patterns and (AT&T) REs (`printf '%P\n' '\\\[[0-9]*\\\]'` gives `*\\\[*([0-9])\\\]*`) – Stéphane Chazelas Aug 28 '13 at 10:03
  • Hmm, it seems that *[0-9] works in other regex queries (without round brackets). – macieksk Jan 22 '19 at 22:27
  • @macieksk `*[0-9]` in shell wildcard patterns means any sequence of 0 or more characters (`*`) followed by a character (or collating element in some shells) in the 0 to 9 range. So it matches `123` but also `foo9` (and possibly `foo` followed by some other character, as there are often many more than 10 characters in the 0 to 9 range). – Stéphane Chazelas Jun 21 '21 at 10:10

A pure Bash version, with the full range of ANSI sequences supported:

# Strips ANSI CSI (ECMA-48, ISO 6429) codes from text
# Param:
# 1: The text
# Return:
# &1: The ANSI stripped text
strip_ansi() {
  shopt -s extglob
  printf %s "${1//$'\e'[@A-Z\[\\\]\^_]*([0-9:;<=>?])*([ \!\"#$%&\'()\^*+,\-.\/])[@A-Z\[\\\]\^_\`a-z\{|\}~]/}"
}
Léa Gris