
I am writing a script which needs to be portable across Apple macOS and Ubuntu. On the former, 'awk' is (I believe) provided by nawk, while on the latter it is gawk. There are significant differences between the implementations.

Specifically, I'm developing on Ubuntu 22.04 LTS where, unfortunately...

# apt install nawk
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package nawk

(same on Debian Bullseye)

I also tried downloading the nawk source code and compiling it. However, yacc is not available on my distro, and Bison is not sufficiently compatible to run the makefile.

Is there a way to make gawk behave like nawk?

Failing that, is there a Linux distro which has nawk available from repo?

terdon
symcbean
  • While gawk has many differences from barebones awk/nawk, you don't have to use them, what exactly are you trying to avoid? To me, the way to make gawk behave like awk/nawk is to not use any of the GNU features of gawk. Are you seeing some behavior where the same commands give different results between the two? If so, what? probably easiest to just create clean simple awk which should run the same, except for some bugs that either might have. Easy enough to test that using the two, if results are different, why? can you make them the same? – Lizardx May 16 '23 at 22:38
  • take a look at `--lint=fatal` `gawk` option. – DanieleGrassini May 16 '23 at 22:45
  • Debian/Ubuntu have other AWK implementations too, not just gawk. There's at least the [mawk](https://packages.debian.org/bullseye/mawk) and [original-awk](https://packages.debian.org/bullseye/original-awk) packages, and Busybox also has an implementation. I can't remember the AWK lineage, so I'm not sure what their relation to the one on Macs is, but they might give you something better to compare against than gawk. And well, you could always [read the fine print from POSIX](https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html) (assuming the Mac one is compatible with that). – ilkkachu May 17 '23 at 06:33
  • What about just using the --posix option? I used gawk a lot, but I used it for the GNU extensions, so I never really got into the POSIX stuff, but I still don't see why you need to run anything that isn't cross-awk compatible. As Daniele notes, use `--lint=fatal` to test your code. After doing way too much cross-OS support, I think I can honestly say I never even think about POSIX since that doesn't actually solve any cross-OS issues I've ever hit, unless I want to drop the logic to the most trivial level. – Lizardx May 18 '23 at 21:57
  • @Lizardx - that only addresses some of the differences (primarily the regex engine). See the comments below from Ed Morton regarding some of the more subtle differences. OOB file I/O is completely different, for instance. – symcbean May 19 '23 at 08:46
  • I don't know why I didn't think of this, but the actual answer here would have been to use Perl and call it good. Good info on the awk stuff though, thanks symcbean. I know when I was doing too much gawk, I found at least one real bug, with math, that proved to me that gawk was not being used in a serious production anymore because it was so glaring and non subtle (something about the number being in some range, around 1 million if I remember right, would corrupt the math). I only use awk now for the most basic functionality in one liners. – Lizardx May 19 '23 at 22:45

3 Answers


Yes, there is at least one distro that has nawk in its repositories. I am sure there are many, but I am writing this from my Arch system and I can confirm that Arch has nawk:

$ pacman -Ss nawk
community/nawk 20220912-1 [installed]
    The one, true implementation of AWK

That said, a useful trick here is to use busybox awk instead. Busybox is a great tool, very useful and commonly found on embedded systems, which provides pared down versions of various standard tools:

BusyBox combines tiny versions of many common UNIX utilities into a single small executable. It provides replacements for most of the utilities you usually find in GNU fileutils, shellutils, etc. The utilities in BusyBox generally have fewer options than their full-featured GNU cousins; however, the options that are included provide the expected functionality and behave very much like their GNU counterparts. BusyBox provides a fairly complete environment for any small or embedded system.

One of the tools it provides is awk, so if you install busybox on Ubuntu (sudo apt install busybox), you can then run busybox awk to get a minimal awk. This is not nawk, but it is a simple, pared down version of awk, which should provide a far more portable toolset than gawk. If your script works with busybox awk, it is very likely to also work for both gawk and nawk. This is not a perfect solution, though: I did find a comment on another answer on this site that claimed that "Actually, the BusyBox awk is pretty close in behavior to gawk v3; I think it's more full-featured than nawk, but it's a start."
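As a rough sketch of how such a comparison could look, the function below runs one awk program under every implementation that happens to be installed and prints each result side by side. The function name compare_awks and the list of candidate commands are my own assumptions; adjust the list to whatever your system actually provides.

```shell
# Run one awk program under every awk implementation that is installed
# and print each result, so differences are easy to spot.
compare_awks() {
    prog=$1
    input=$2
    for impl in "gawk --posix" mawk "busybox awk" original-awk nawk awk; do
        # The first word is the executable; skip implementations not installed.
        exe=${impl%% *}
        command -v "$exe" >/dev/null 2>&1 || continue
        # $impl is deliberately unquoted so "busybox awk" splits into two words.
        result=$(printf '%s\n' "$input" | $impl "$prog" 2>&1)
        printf '%-14s %s\n' "$impl:" "$result"
    done
}

compare_awks '{ print $2, length($0) }' 'alpha beta'
```

Every line of output should show the same result after the implementation name; any line that differs points at a portability problem worth investigating.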

Finally, and perhaps most relevantly, gawk actually has a --posix option:

-P

--posix

Operate in strict POSIX mode. This disables all gawk extensions (just like --traditional) and disables all extensions not allowed by POSIX. See Common Extensions Summary for a summary of the extensions in gawk that are disabled by this option. Also, the following additional restrictions apply:

Newlines are not allowed after ‘?’ or ‘:’ (see Conditional Expressions). Specifying ‘-Ft’ on the command line does not set the value of FS to be a single TAB character (see Specifying How Fields Are Separated). The locale’s decimal point character is used for parsing input data (see Where You Are Makes a Difference).

If you supply both --traditional and --posix on the command line, --posix takes precedence. gawk issues a warning if both options are supplied.

So your best bet would be to use gawk --posix when testing, to ensure you only use portable features.
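A minimal sketch of wrapping that in a helper, assuming gawk may or may not be installed on the machine in question. The name check_portable is hypothetical, and I have added --lint (mentioned in the comments on the question) so that warnings show up on stderr.

```shell
# Hypothetical helper: run an awk program under gawk in strict POSIX mode
# with lint warnings enabled. If gawk is not installed, say so and do nothing.
check_portable() {
    if ! command -v gawk >/dev/null 2>&1; then
        echo "gawk not found; skipping lint check" >&2
        return 0
    fi
    gawk --posix --lint "$@"
}

# Lint warnings, if any, go to stderr; normal output goes to stdout.
printf 'a b\n' | check_portable '{ print $1 }'
```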


Or it might not be. Ed Morton, one of our resident awk experts, said this in a now-deleted comment:

gawk --posix doesn't ensure you only use portable features. For example with that option set split("foo",arr,"") would populate arr[] with each character from the string "foo" but other awks could populate arr[] with a single entry which is the whole string "foo" or do anything else and be POSIX compliant because field splitting using a null string as the separator is undefined behavior. What --posix does is turn off gawk extensions but you'd still have to manually be aware of writing code that relies on the gawk implementation of any of the several behaviors undefined by POSIX. – Ed Morton

Ed knows far, far more than I do about awk, so I would take his word for it.
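For the specific case Ed mentions, one way to avoid the undefined behavior is to not pass an empty separator to split() at all and to build the array with substr() instead, which POSIX does define. This is only a sketch of that idea; the variable names are arbitrary.

```shell
chars=$(awk 'BEGIN {
    s = "foo"
    n = length(s)
    # Build the array one character at a time with substr().
    for (i = 1; i <= n; i++)
        arr[i] = substr(s, i, 1)
    for (i = 1; i <= n; i++)
        printf "%s%s", arr[i], (i < n ? " " : "\n")
}')
echo "$chars"
```

This prints `f o o` and should behave the same regardless of how a given awk treats a null separator.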

terdon
  • About 5 years ago I actually created a spreadsheet of 13 undefined behaviors and asked the gawk provider if he'd be willing to add lint warnings about them and he did so for 9 of them (too hard to diagnose in awk for 3 of them and one he considered to be a bug in other awks, which I could see his point). So, running `gawk --lint` in a more recent gawk should give you a good (but not perfect) idea of cases where you're relying on undefined behavior. – Ed Morton May 16 '23 at 22:41
  • For example, try `echo '1 2' | gawk --lint '{NF--; split($0,arr,""); print > "foo" 7}'` with or without `--posix` to see reports of 3 different cases of undefined behavior followed by a 4th more general warning about not closing the output file. – Ed Morton May 16 '23 at 22:53
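For comparison, here is a sketch of how that one-liner could be rewritten to sidestep some of those warnings. I am assuming that the warnings concern decrementing NF, the unparenthesized `"foo" 7` after `>`, and the unclosed output file; the temporary directory and variable names are mine.

```shell
outdir=$(mktemp -d)
outfile="$outdir/foo7"

echo '1 2' | awk -v out="$outfile" '{
    # Rebuild the record without its last field instead of decrementing NF.
    rec = (NF > 1 ? $1 : "")
    for (i = 2; i < NF; i++)
        rec = rec OFS $i
    # Redirect to a single variable so the target is unambiguous,
    # and close the file when done.
    print rec > out
    close(out)
}'

cat "$outfile"
```

If the file name really has to be built by concatenation, wrapping it in parentheses, as in `print rec > (dir "/" name)`, removes the ambiguity.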

After more searching, I found a more recent version of nawk (or at least I believe it is nawk) signposted from https://www.cs.princeton.edu/~bwk/btl.mirror/index.html at https://github.com/onetrueawk/awk

However I'd still be interested to see if anyone has better suggestions.

symcbean
  • Yes, that is the official repository for `nawk`, also known as "the one true awk". The page you mention belongs to Brian Kernighan, one of the original authors of `awk` and the `K` in `awk`. – terdon May 16 '23 at 17:40

The answer to this kind of question is that you need a cross-platform project.

You should be able to check out your project on those platforms where it has to work, run whatever preparation is needed, and then execute the test case suite.

Whenever you release a new version of the script, you have to execute that test plan: update to the release baseline on all the supported platforms, and run the test cases, and execute whatever other test plan you have for gaining confidence that things work on each supported platform.

With some care, you should be able to write Awk code that produces the same results in GNU Awk, nawk and others.
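A very small sketch of what one piece of such a test suite might look like. The helper name run_case and the AWK environment variable are assumptions of mine, not part of any existing tool; the idea is to run the same cases on each platform, or with AWK pointed at each interpreter in turn.

```shell
# Hypothetical test helper: run one awk program against one input and compare
# the result with the expected output. AWK selects the interpreter under test.
run_case() {
    name=$1 prog=$2 input=$3 expected=$4
    # ${AWK:-awk} is unquoted so that AWK="gawk --posix" also works.
    actual=$(printf '%s\n' "$input" | ${AWK:-awk} "$prog" 2>&1)
    if [ "$actual" = "$expected" ]; then
        echo "PASS $name"
    else
        echo "FAIL $name: expected '$expected', got '$actual'"
        return 1
    fi
}

run_case "sum fields" '{ print $1 + $2 }' '3 4' '7'
run_case "upper case" '{ print toupper($0) }' 'abc' 'ABC'
```

Running it as, say, `AWK=nawk sh tests.sh` on the Mac and `AWK="gawk --posix" sh tests.sh` on Ubuntu (file name hypothetical) gives a like-for-like comparison.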

the nawk source code and compiling however yacc is not available on my distro and Bison is not sufficiently compatible

I see that the "One True Awk" project has done something very silly. The makefile defines YACC = bison -d. This means that the awkgram.y grammar file is now at the mercy of the default behavior of whatever version of Bison the user has installed. To compound the problem, the project does not ship the generated parser sources that the maintainer actually builds and tests with. The downstream users are thus running different C code for a pretty important part of the program.

If you have difficulties with your Bison installation, try changing that to bison --yacc -d. bison is not really Yacc without the -y or --yacc arguments.

Failing that, generate the parser on some other platform, and use those generated files.

Even if you get nawk running on platform A, that doesn't mean you can assume your code works on platform B without testing.

Anyway, it looks like the One True Awk sources do not include the Yacc-generated parser, which is a mistake. What you can do is just run Yacc on a platform where it works, and then add the resulting y.tab.c and y.tab.h files to your local tree. Make sure you touch the time stamps so these files are newer than awkgram.y so the makefile doesn't try to rebuild them; or else tweak the makefile.
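The time stamp step could be sketched like this. The function name freshen_parser is made up, and the file names are simply the ones mentioned above; check your copy of the makefile, since it may expect different names for the generated files.

```shell
# Hypothetical helper: mark pre-generated parser files as newer than the
# grammar so that make does not try to regenerate them with yacc/bison.
freshen_parser() {
    dir=$1
    # File names as given in the text above; adjust to match your makefile.
    for f in y.tab.c y.tab.h; do
        [ -f "$dir/$f" ] || { echo "missing $dir/$f" >&2; return 1; }
    done
    touch "$dir/y.tab.c" "$dir/y.tab.h"
}

# Example use, after copying the generated files into the source tree:
#   freshen_parser ./awk && (cd ./awk && make)
```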

Yacc programs generate output that is intended to be portable C, so that downstream users can build the program without having Yacc installed. Projects with Yacc grammars should always publish the generated code, so that everyone downstream is compiling the same C. It's already risky enough when people have the same C sources but are building them for different machines and environments.

I'm surprised Bison wouldn't be able to process the awkgram.y file in Brian Kernighan's awk. You have to use bison --yacc or bison -y. On systems where the Yacc implementation is provided by Bison, there is usually a script called yacc which passes its arguments to bison -y or bison --yacc. I just checked out https://github.com/onetrueawk/awk.git on an Ubuntu 18 instance where I have the default Bison 3.0.4 installation, as well as Bison 2.5 in /usr/local/bin. Both of them accept awkgram.y without errors.

Kaz
  • The github repo built without issue - I was previously using a tarball found elsewhere on the internet which was rather old. – symcbean May 22 '23 at 08:31