
I guess it's best to start with an example:

> echo "[20-20:10]Something" | sed -r -e 's/^\[[0-9\:\-]+(.*)$/\1/' 
]Something
> echo "[20-20:10]Something" | sed -r -e 's/^\[[0-9\-\:]+(.*)$/\1/' 
-20:10]Something

The only difference is that I swapped the : and - characters in the regex's character class. So: does the order of characters matter in sed's regex character classes? It doesn't seem to matter in other regex systems, like https://regex101.com/.

I cannot find anything about this behaviour on Google, but I would like to know more, because I want to be sure I know what my scripts do.

Tom
  • No, it doesn't matter. – L. Scott Johnson Feb 25 '20 at 12:39
  • It matters *when the character is a hyphen* - see related [Odd behavior when trying to match hyphens with grep 2.27](https://unix.stackexchange.com/questions/443042/odd-behavior-when-trying-to-match-hyphens-with-grep-2-27) for example – steeldriver Feb 25 '20 at 12:43
  • It also matters when it's a `]` or `^` or if it's part of a character class name, e.g. `[:space:]`. – Ed Morton Feb 25 '20 at 22:19

2 Answers


There are a few rules. The important one in this case is that - is a range operator, so you can write a-f rather than abcdef inside a class. To include a - as a literal character, it is simplest to make it the last character in the class, but it can also be the first, or either end of a range.
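A minimal sketch of the range-versus-literal distinction. It assumes a sed that accepts -E (GNU sed treats it the same as -r), and the inputs are made up for illustration:

```shell
# [a-f] is a range, shorthand for [abcdef], so "c" matches and is replaced.
range=$(printf '%s\n' 'c' | sed -E 's/[a-f]/X/')
# With the hyphen moved to the end, the class is just the three literal
# characters a, f and -, so in "c-" only the hyphen is replaced.
last=$(printf '%s\n' 'c-' | sed -E 's/[af-]/X/g')
printf '%s\n' "$range" "$last"
# X
# cX
```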

If you want to negate a set of characters, the first character must be ^. To include ^ as a literal, it must not be the first.
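A short sketch of both positions of ^, again assuming sed -E and a made-up input string:

```shell
# ^ first: the class is negated, so everything except a and b is replaced.
neg=$(printf '%s\n' 'abc^' | sed -E 's/[^ab]/X/g')
# ^ anywhere else: it is a literal caret, so the class is {a, b, ^}.
lit=$(printf '%s\n' 'abc^' | sed -E 's/[ab^]/X/g')
printf '%s\n' "$neg" "$lit"
# abXX
# XXcX
```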

As ] ends a class, there is a special case that allows it to be the first character (or the second, if the first is ^ negating the class), so []abc] is a set of four characters: a, b, c, or ].
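The ] special case can be checked the same way. This is a sketch assuming sed -E; the input x]a is only an illustration:

```shell
# ] first: it is a member of the set {], a, b, c} rather than the end of the class.
first=$(printf '%s\n' 'x]a' | sed -E 's/[]abc]/Y/g')
# ] right after the negating ^: the set is everything except ], a, b, c.
negated=$(printf '%s\n' 'x]a' | sed -E 's/[^]abc]/Y/g')
printf '%s\n' "$first" "$negated"
# xYY
# Y]a
```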

icarus

Yes, it matters, as [0-9\:\-] matches any single character from the set of digits, backslash, colon, or dash, while [0-9\-\:] does not match a dash. In the second expression, the dash signifies a range from the backslash character to the backslash character (backslashes are literal in character classes), so the expression is equivalent to [0-9\:] (or, for that matter, [\0-9:]).
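A sketch that checks this equivalence using the input from the question, plus a made-up string containing a backslash to show that the backslash really is a member of the class. It assumes sed -E in place of the question's -r:

```shell
input='[20-20:10]Something'
# The question's second pattern: \-\ is a degenerate range from \ to \.
a=$(printf '%s\n' "$input" | sed -E 's/^\[[0-9\-\:]+(.*)$/\1/')
# The same class written without the degenerate range.
b=$(printf '%s\n' "$input" | sed -E 's/^\[[0-9\:]+(.*)$/\1/')
# The backslash is matched literally by the class.
c=$(printf '%s\n' 'a\b' | sed -E 's/[0-9\:]/X/')
printf '%s\n' "$a" "$b" "$c"
# -20:10]Something
# -20:10]Something
# aXb
```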

The dash does not signify a range of characters if it's first (possibly after ^) or last in a character class.
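A small sketch of the first and after-^ positions (assuming sed -E; a-1 is just an example input):

```shell
s='a-1'
# Leading dash: the class is {-, 0..9}.
first=$(printf '%s\n' "$s" | sed -E 's/[-0-9]/X/g')
# Dash right after ^: the class is everything except - and the digits.
negated=$(printf '%s\n' "$s" | sed -E 's/[^-0-9]/X/g')
printf '%s\n' "$first" "$negated"
# aXX
# X-1
```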

Also note that sed deals with POSIX regular expressions, which I don't think the site that you link to explicitly supports (see Why does my regular expression work in X but not in Y?).
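In a POSIX bracket expression no escaping is needed, so one way to write the question's pattern is to drop the backslashes and put the dash last. This is a sketch assuming sed -E; the second command is an illustrative variant, not part of the question, that strips the whole bracketed prefix:

```shell
input='[20-20:10]Something'
# Class {0..9, :, -} with the dash last, so it is literal.
fixed=$(printf '%s\n' "$input" | sed -E 's/^\[[0-9:-]+(.*)$/\1/')
# Variant: also consume the closing ] (a bare ] outside a class is literal).
stripped=$(printf '%s\n' "$input" | sed -E 's/^\[[0-9:-]+]//')
printf '%s\n' "$fixed" "$stripped"
# ]Something
# Something
```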

Kusalananda
  • I see, thanks a lot. So my mistake was escaping the colon and the dash, which https://regex101.com allows for some reason. – Tom Feb 25 '20 at 13:09