
While writing some awk code, I used the `--lint` option of gawk 4.1.0. To my surprise, I got this warning:

warning: range of the form `[o-b]' is locale dependent

but the only `foo-bar` in my code is outside any character class. Simplified example:

{ match($2, /^uid=([^,]+),dc=foo-bar$/, m) }

Also, I think gawk 4.2.1 no longer outputs that warning. So is this a bug in gawk?

U. Windl
  • That's not the only line of code in your `awk` program, right? – Kusalananda Jan 24 '22 at 11:44
  • Can you reproduce the problem with the code shown in the question? I suggest creating a minimal reproducible example by repeatedly removing parts of your original code and checking whether you still get the warning message. See https://stackoverflow.com/help/minimal-reproducible-example – Bodo Jan 24 '22 at 12:01
  • @they What do you want to know specifically? That's the line that caused the error. In the original code some "match(...) > 0" was part of a Boolean expression in an `if` statement, but it does not change the essence of the problem. – U. Windl Jan 24 '22 at 13:53
  • @Bodo I don't understand: That *is* a minimal *reproducible* example! – U. Windl Jan 24 '22 at 13:54
  • Sorry, I misunderstood your question. It was not clear to me that the "simplified example" reproduces the problem. I suggest making that clearer by showing the exact command you run and the resulting message. – Bodo Jan 24 '22 at 14:05
  • @U.Windl a minimal reproducible example would be a script we can run such as `awk --lint 'BEGIN{ match("foo", /^uid=([^,]+),dc=foo-bar$/, m) }'` if indeed that reproduces the problem. With just the match() call you provided we have to guess if you maybe weren't quoting the script properly or calling it from some other tool (e.g. python) or had some other line of code that was the actual problem or anything else. – Ed Morton Jan 24 '22 at 15:18

1 Answer


This is a bug in gawk 4.1.0 specifically. It was introduced by commit a7c502a756732ec9a1773d6169376bb7b25f4308 and fixed by commit d52d17b46e53bb0d4a991cd32f859eb349d3b101. The bug first shipped in 4.1.0, and the fix first shipped in 4.1.1.

This is only a bug in the linter, not in the code that actually matches text against the regular expression.
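To illustrate that the matching itself treats the `-` in `foo-bar` as a literal character, here is a minimal check of the regex from the question. The `check_dn` helper and the sample DNs are hypothetical. It uses the two-argument `match()` instead of gawk's three-argument form so it runs in any POSIX awk, and it does not reproduce the lint warning; it only shows what the regex matches.

```shell
# Hypothetical helper: report whether a DN matches the regex from the question.
# Two-argument match() is POSIX, so this works in gawk, mawk, busybox awk, etc.
check_dn() {
    printf '%s\n' "$1" | awk '{ print (match($0, /^uid=([^,]+),dc=foo-bar$/) ? "match" : "no match") }'
}

check_dn 'uid=alice,dc=foo-bar'   # literal "-": matches
check_dn 'uid=alice,dc=foo_bar'   # "_" is not "-": does not match
```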

The bug causes the linter, while looking for ranges inside sets (bracket expressions), to keep scanning past the closing bracket. So when it sees `[set]more stuff with a-dash`, it reaches the later `-` and emits the warning. A workaround (if you really need one for a linter-only bug in an old version) is to put the dash itself in a bracket expression: `/^uid=([^,]+),dc=foo[-]bar$/` in your case. This may not work in all corner cases.
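To make the mechanism concrete, here is a deliberately simplified model of such a lint pass. It is hypothetical: `lint_range` is not gawk's source code, and it ignores escapes, a `]` placed first in a set, and `[:class:]` names. It only shows how failing to stop at the closing `]` turns `foo-bar` into the reported `[o-b]`.

```shell
# Hypothetical, simplified model of a lint pass that reports ranges such as
# "o-b" inside bracket expressions. NOT gawk's actual implementation.
# Usage: lint_range REGEX STOP_AT_CLOSE   (STOP_AT_CLOSE=1 models the fix)
lint_range() {
    awk -v re="$1" -v stop_at_close="$2" 'BEGIN {
        n = length(re); in_set = 0
        for (i = 1; i <= n; i++) {
            c = substr(re, i, 1)
            if (!in_set) { if (c == "[") in_set = 1; continue }
            # The fixed scan leaves the set at "]"; the buggy scan keeps going.
            if (c == "]" && stop_at_close == 1) { in_set = 0; continue }
            if (c == "-" && i > 1 && i < n) {
                # Report the characters on either side of the dash as a range.
                print "[" substr(re, i - 1, 1) "-" substr(re, i + 1, 1) "]"
                exit
            }
        }
    }'
}

re='^uid=([^,]+),dc=foo-bar$'
lint_range "$re" 0   # buggy scan
lint_range "$re" 1   # fixed scan
```

With the regex from the question, the buggy scan prints `[o-b]`, the same text as the warning, while the fixed scan prints nothing because it leaves the set at the `]` of `[^,]`.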

Gilles 'SO- stop being evil'