21

After some googling, I found a way to compile BASH scripts to binary executables (using shc).

I know that shell is an interpreted language, but what does this compiler do? Will it improve the performance of my script in any way?

adazem009
  • 5
    If your shell script does a lot of I/O (checksums, `awk`, `grep` and friends, walking directories), a compiled version isn't going to improve it significantly. – Archemar Oct 20 '20 at 08:29
  • 7
    "Performance" and "bash script" are incompatible concepts. If you need "performance", rewrite your program in a compiled language, C, C++, Fortran, .... – waltinator Oct 21 '20 at 17:10
  • 2
    The OP didn't say they were "interested" in performance. They just wanted to know if a performance boost was the result of such "compilation." – John Carrell Oct 22 '20 at 13:26
  • 1
    @waltinator You're conflating optimal performance with increased performance. Just because a particular method can't achieve the former doesn't mean it isn't worth aiming for the latter. Rewriting a bash script in C++ is unlikely to be a sensible proposition for most use cases; whereas a quick and simple step that can increase performance by a smaller amount might be. – JBentley Oct 23 '20 at 07:32

3 Answers

55

To answer the question in your title, compiled shell scripts could be better for performance — if the result of the compilation represented the result of the interpretation, without having to re-interpret the commands in the script over and over. See for instance ksh93's shcomp or zsh's zcompile.
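
As a rough sketch of the zsh route, assuming zsh is installed: `zcompile` is a zsh builtin that writes a pre-parsed copy of a file next to it with a `.zwc` suffix, which zsh can then load instead of re-parsing the source. The file name `hello.zsh` and its contents are made up for illustration, and the block simply skips the step when zsh is missing.

```shell
#!/bin/sh
# Hypothetical demo file; zsh is an assumption and may not be installed.
dir=$(mktemp -d)
printf 'print "hello from zsh"\n' > "$dir/hello.zsh"

if command -v zsh >/dev/null 2>&1; then
    # zcompile is a zsh builtin; it writes hello.zsh.zwc next to the source.
    # -f skips the user's startup files so they cannot interfere.
    zsh -f -c 'zcompile "$1"' zsh "$dir/hello.zsh"
    if [ -f "$dir/hello.zsh.zwc" ]; then
        compiled=yes
    else
        compiled=no
    fi
else
    compiled=skipped
fi

echo "zcompile result: $compiled"
rm -rf "$dir"
```

This only saves parsing time; the commands themselves still run in the shell as before.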

However, shc doesn’t compile scripts in this way. It’s not really a compiler; it’s a script “encryption” tool with various protection techniques of dubious effectiveness. When you compile a script with shc, the result is a binary whose contents aren’t immediately readable. When it runs, it decrypts its contents and runs the interpreter the script was written for, handing it the decrypted script. That makes the original script easy to retrieve: it’s passed in its entirety on the interpreter’s command line, with extra spacing in an attempt to make it harder to find. So the overall performance will always be worse: on top of the time taken to run the original script, there’s the time taken to set the environment up and decrypt the script.

Stephen Kitt
  • Thanks for your answer. Are there other ways to compile shell scripts without using `shc`? – adazem009 Oct 20 '20 at 08:25
  • 3
    Comeau Computing used to market [CCsh](http://web.archive.org/web/20181204010521/http://www.comeaucomputing.com/faqs/ccshfaq.html), but they’re out of business, and I’m not aware of any other shell compiler. – Stephen Kitt Oct 20 '20 at 08:45
  • 27
    To be honest, if performance is that big an issue you probably shouldn't be looking at shell scripts in the first place. – Shadur Oct 20 '20 at 10:18
  • 12
    @Shadur In addition, many times shell scripts are calling compiled programs that do the heavy lifting, so it's not too much of a boost anyways. E.g., if you have a script that calls awk, sed, and grep, those are all compiled. – Captain Man Oct 20 '20 at 16:50
  • 5
    @CaptainMan: If you were programming in a compiled language, you'd read the data yourself and use a regex library instead of fork+exec of multiple separate processes, each of which has to pay the startup overhead for a new process, and for dynamic linking (which is a significant part of the total cost for running `grep` on a short file). Also, for large amounts of data, you'd avoid piping the data between processes, which costs some overall memory bandwidth and synchronization between cores in the kernel. You wouldn't expect a shell-script compiler to do that, hence Shadur's point: avoid shell entirely. – Peter Cordes Oct 21 '20 at 04:12
  • 1
    ...and if you're calling awk, sed and grep, the details of those calls can be traced by anyone who cares, so there's very little security that something trying to obfuscate the parent shell's actions can achieve. – Charles Duffy Oct 21 '20 at 16:22
  • 4
    So essentially the only valid purpose of `shc` was for people who write insecure code and want to hide their hard coded passwords and vulnerabilities, or script kiddies who are trying to hide obvious malware with the lowest amount of effort possible? – john doe Oct 21 '20 at 21:24
  • @l0b0 it uses Alleged RC4. – Stephen Kitt Oct 22 '20 at 17:35
  • @johndoe That's the short of it. – Shadur Oct 23 '20 at 07:14
34

> After some googling, I found a way to compile BASH scripts to binary executables (using shc).

It's quite unfortunate that the shc contraption is still featured in Google search results, even though it was utterly debunked years ago: shc is not a compiler, and it does not prevent the source code of the script from being looked at and "stolen".

If anything, shc is even stupider than it has to be: after unmangling the script source, it just passes it as an argument to bash -c, which means that it's visible in /proc/<pid>/cmdline to any user, not just the one running the script. That also runs into Linux's length limit for a single command-line argument (128 KiB). But to make things even more ridiculous, the first part of that argument is filled with whitespace so that the script doesn't appear in ps ;-)
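
A minimal sketch of that exposure, assuming a Linux system with /proc mounted. It does not use shc itself: a plain `bash -c` call stands in for what the wrapper ends up running, and `secret_token` and its value are made up for illustration.

```shell
#!/bin/bash
# Stand-in for what an shc-style wrapper ultimately runs: bash -c "<whole script>".
# The trailing ":" keeps bash itself alive rather than letting it exec sleep.
script='secret_token=hunter2; sleep 2; :'

bash -c "$script" &
pid=$!
sleep 1   # give the child time to exec bash

# /proc/<pid>/cmdline holds the arguments separated by NUL bytes;
# turn the NULs into spaces to make them printable.
seen=$(tr '\0' ' ' < "/proc/$pid/cmdline")
printf '%s\n' "$seen"

wait "$pid"
```

The printed line contains the full script text, including the made-up secret, even though nothing ever wrote it to a file.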

> Will it improve the performance of my script in any way?

Yes, your script may not work at all, which means that it will terminate sooner.

  • 4
    Its current maintainer has been engaging in a lot of “marketing”, including creating a Wikipedia page for it which has somehow survived... – Stephen Kitt Oct 21 '20 at 14:53
  • 1
    this is the best answer. Very helpful info to know to steer clear of shc. – java-addict301 Oct 21 '20 at 20:33
  • 2
    The padding with spaces is useless since `ps` can easily be told to show the full command-line length, and will do that automatically when piped *e.g.* to `grep`. – Stephen Kitt Oct 21 '20 at 20:48
  • @StephenKitt, to be fair, at the time `shc` was written, that was not usually the case. On many systems, the argument list of a process could only be read by root (so you needed a setuid root `ps`, which only displayed a few dozen bytes of the arg list), and until relatively recently you couldn't get more than 4KiB from the command line on Linux. – Stéphane Chazelas Oct 22 '20 at 16:51
  • @user431397, yes, the limitations of `shc` have been discussed at length over the years. I'm just arguing about the *`ps` can easily be told to show the full command-line*, which was not true on most systems until very recently. See also [ps: full command is too long](//unix.stackexchange.com/q/91561) – Stéphane Chazelas Oct 22 '20 at 17:34
  • 1
    @StéphaneChazelas fair point, since 4.2 five years ago which is quite recent (and somewhat ironically, in a similar timeframe to `shc`’s revival and all the fuss around it). – Stephen Kitt Oct 22 '20 at 17:42
  • @StephenKitt I had the opposite problem on some older OS (might have been Solaris or AIX): My 20K-char awk scripts used to appear in full on the command line in `ps` and drove the SysAdmins crazy. – Paul_Pedant Oct 24 '20 at 10:30
5

In general, there is no way to compile a shell script, because new source text can be introduced by several methods at run time, bypassing the compilation phase. That new source would be unable to interact with the compiled-in functions or variables.

Two methods of creating runtime source would be:

Source a side file that may have been created or modified since the original script was compiled.

Construct at runtime an arbitrary command in a string, and execute it (for example with eval).
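
Both methods can be sketched in a few lines of POSIX shell. The file contents and the variable names (`greeting`, `var_name`, `answer`) are made up for illustration; the second method uses `eval`, the shell builtin that runs a string as shell code.

```shell
#!/bin/sh
# 1. Source a side file that only comes into existence at run time.
side_file=$(mktemp)
printf 'greeting="hello from a sourced file"\n' > "$side_file"
. "$side_file"        # the variable is defined by text no compiler ever saw
rm -f "$side_file"

# 2. Build a command in a string and evaluate it.
var_name=answer       # could just as well come from user input
eval "$var_name=42"   # defines a variable whose name was chosen at run time

echo "$greeting"
echo "$answer"
```

In both cases the script ends up depending on names that did not exist in its own source text, which is what a static compiler would have to cope with.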

Paul_Pedant