
Apparently, a bracket expression in BSD Awk which contains a character class will ignore any further characters after the character class:

```
MacOS $ cat file.txt
_
-
.
a
B
8
:
;
@
~
,
MacOS $ awk '/[@~.[:alnum:]:;-]/' file.txt
.
a
B
8
@
~
MacOS $ awk '/[-;:@~.[:alnum:]]/' file.txt
-
.
a
B
8
:
;
@
~
MacOS $ awk '/[^@~.[:alnum:]:;-]/' file.txt
_
-
:
;
,
MacOS $ awk '/[^-;:@~.[:alnum:]]/' file.txt
_
,
MacOS $
```

On GNU Awk (shown on Ubuntu 16.04), the behavior is different; other characters in the bracket expression are handled the same regardless of whether they come before or after the character class:

```
Linux $ cat file.txt
_
-
.
a
B
8
:
;
@
~
,
Linux $ awk '/[@~.[:alnum:]:;-]/' file.txt
-
.
a
B
8
:
;
@
~
Linux $ awk '/[-;:@~.[:alnum:]]/' file.txt
-
.
a
B
8
:
;
@
~
Linux $ awk '/[^@~.[:alnum:]:;-]/' file.txt
_
,
Linux $ awk '/[^-;:@~.[:alnum:]]/' file.txt
_
,
Linux $
```

Is this documented anywhere? Or, if it is a bug, is it a known bug? (And if it is a known bug, is it fixed in later versions of Awk?)


What should I do with this discovery? Is there somewhere I should open a bug report?

Kusalananda
Wildcard
  • MacOS is a poor choice of reference since many of the userland utilities haven't been updated for more than a decade. I'd use FreeBSD for a reference (in preference to NetBSD and OpenBSD), but one of those is more likely to have an updated awk, etc. – Thomas Dickey Apr 06 '17 at 00:03
  • To echo what @ThomasDickey said in his comment here, https://unix.stackexchange.com/questions/352977/why-does-this-bsd-grep-result-differ-from-gnu-grep is another case of a weird bug (in grep) that apparently only occurs in the OS X version of a utility and not in the current BSD version found on other distros – sideshowbarker Apr 06 '17 at 00:33
  • It would be a bug in that it's not POSIX-compliant, but I also can't reproduce it on a macOS system here... – Michael Homer Apr 06 '17 at 01:11
  • @MichaelHomer, `awk --version` shows 20070501 and the macOS version is 10.11.6. – Wildcard Apr 06 '17 at 01:14
  • Yes, same here, and both 10.12 and 10.11. The first two commands produce identical output, as do the second two. I wonder if there's something environmental. – Michael Homer Apr 06 '17 at 01:16
  • Indeed: try `LANG=C awk ...` versus `LANG=xx.UTF-8`. – Michael Homer Apr 06 '17 at 01:17
  • @MichaelHomer, that does it. Want to put it as an answer? – Wildcard Apr 06 '17 at 01:21
  • It's not really an answer - it's still a bug, it's just one that doesn't show up in the C locale. I don't know about any of the other questions. – Michael Homer Apr 06 '17 at 01:23
  • FWIW: `awk`, `mawk` and `gawk` are consistent in their behaviour on OpenBSD and do the right thing. – Kusalananda Apr 20 '17 at 20:22
  • I think I just stumbled over the same bug: `echo 'foo:bar: info' | awk '/^[:[:alpha:]]+: /'` outputs `foo:bar: info` as expected while `echo 'foo:bar: info' | awk '/^[[:alpha:]:]+: /'` outputs nothing unless I prefix it with `LANG=C`. `awk --version` outputs `awk version 20070501`. GNU awk produces the expected output regardless of the order inside the bracket expression. – Ed Morton Dec 20 '17 at 14:58
  • You should consider this the upstream: https://github.com/onetrueawk/awk – Rafael Kitover Jul 30 '19 at 01:30
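
The comments above point at two workarounds: running awk in the C locale, and keeping the character class as the last item in the bracket expression. A minimal sketch of both follows, using the sample data from the question. The variable names are made up for illustration, and `LC_ALL=C` is used in place of the `LANG=C` from the comment because `LC_ALL` overrides `LANG` and every other `LC_*` variable.

```shell
#!/bin/sh
# Recreate the sample input from the question (one character per line).
sample=$(printf '%s\n' _ - . a B 8 : ';' @ '~' ,)

# Workaround 1: force the C locale for this one command only.
# LC_ALL takes precedence over LANG and the other LC_* variables.
out_c_locale=$(printf '%s\n' "$sample" | LC_ALL=C awk '/[@~.[:alnum:]:;-]/')

# Workaround 2: keep the current locale, but put the character class last.
# This is the ordering that gave the full result in the question's macOS run.
out_class_last=$(printf '%s\n' "$sample" | awk '/[-;:@~.[:alnum:]]/')

printf 'C locale, class in the middle:\n%s\n' "$out_c_locale"
printf 'Current locale, class last:\n%s\n' "$out_class_last"
```

On an awk that handles bracket expressions as POSIX describes, both variables should hold the same nine lines that the GNU Awk run in the question prints. Neither workaround fixes the underlying bug; they only avoid triggering it.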

1 Answer


Testing on FreeBSD 11.2-RELEASE gave the correct results, matching your GNU Awk output, and so did a recent GNU/Linux.

Perhaps the behavior is a more general case of the bug described here: https://github.com/onetrueawk/awk/issues/45
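
To check whether a particular awk is affected, one option is to compare the two orderings from the question and see whether they select the same lines. This is only a sketch: `bracket_order_consistent` is a hypothetical helper name, not part of any awk distribution, and a mismatch shows that something is wrong with bracket expressions in the current locale, not necessarily that it is this exact bug.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if the given awk (default: awk) selects the
# same lines whether the character class is in the middle or at the end.
bracket_order_consistent() {
    awk_bin=${1:-awk}
    # Sample data from the question, one character per line.
    sample=$(printf '%s\n' _ - . a B 8 : ';' @ '~' ,)
    class_middle=$(printf '%s\n' "$sample" | "$awk_bin" '/[@~.[:alnum:]:;-]/')
    class_last=$(printf '%s\n' "$sample" | "$awk_bin" '/[-;:@~.[:alnum:]]/')
    [ "$class_middle" = "$class_last" ]
}

if bracket_order_consistent awk; then
    echo "awk: both orderings match the same lines"
else
    echo "awk: the orderings disagree in this locale"
fi
```

Running it once with the normal environment and once with `LC_ALL=C` set shows whether the difference depends on the locale, as the comments on the question suggest.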

As for filing a bug: the bug itself seems to be fixed already, and the remaining problem is that Apple is so far behind "upstream". You could try this (though I have not gone through with it myself): https://developer.apple.com/bug-reporting/

Note the very last option, which might be a plan B, if we can safely presume that in their current Beta releases, they still haven't updated the UNIX userland from upstream.

Justin
  • Apple's unwillingness to update userland command line tools could possibly stem from not wanting to break bug-for-bug backward compatibility (it usually does, and there is _some_ merit in that), and this is partly why [Homebrew](https://brew.sh/) (and similar projects) exist. – Kusalananda Nov 22 '19 at 16:46
  • @Kusalananda that is a good point, and something to keep in mind. As it is, if many, many customers communicate through multiple channels that they care about it and prefer the bug fixed, maybe, just maybe, it will sort of bubble up in some review queue and eventually outweigh that concern, or perhaps at least get tested for absence of actual (vs suspected/presumed) impact. – Justin Nov 22 '19 at 17:09