27

If a shell is asked to perform a probably useless (or partially useless) command known to terminate, such as cat hugeregularfile.txt > /dev/null, can it skip that command's execution (or execute a cheaper equivalent, say, touch -a hugeregularfile.txt)?

More generally, is the shell similar to C compilers in that it may perform any transformation on the source code, so long as the externally observable behaviour is as-if the abstract machine evaluated it?

EDIT

Nota Bene: My question as originally posed had a title that asked whether the shell is permitted to do these optimizations, not whether it should or even whether implementations that can do them exist. I'm interested in the theory more than the practice, although both are welcome.

  • No, the shell isn't as _smart_ as modern compilers. In fact, it's rather dumb. It wouldn't optimize any useless code. – devnull Mar 10 '14 at 06:09
  • Guessing the user's intention is not something the shell should do. The user could be trying to do almost anything with that command; optimising it out would be the wrong thing to do, even if it were possible. – Chris Down Mar 10 '14 at 06:11
  • Not to mention that if the file were a device, `cat`ting it makes a big difference. The shell could find out that the file is a device, but that need not be reliable. – yo' Mar 10 '14 at 16:29
  • `cat`ting a file to /dev/null isn't always useless. Perhaps they want a (not a very good) way to benchmark disk reads. – hometoast Mar 10 '14 at 17:04
  • _permitted_ by whom? I can certainly write a shell that outputs `"yes I can do that"` for any command that you enter. I do not need to ask the permission to anybody for doing that. However, if I do write such a shell, I suspect nobody will use it, just like I would avoid a shell that doesn't do what I ask it to do. – Stéphane Chazelas Mar 10 '14 at 20:44
  • @StephaneChazelas C compilers don't need to "ask permission from someone" to optimize their compiled programs; There is an _as-if_ rule in the C standard which permits them to do so. The POSIX standard appears to have standardized at least one shell (http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html), as well as numerous other utilities (http://pubs.opengroup.org/onlinepubs/009604499/utilities/wc.html for `wc`, for instance). But to the best of my knowledge POSIX doesn't take a position on shell optimization; Or does it? – Iwillnotexist Idonotexist Mar 10 '14 at 21:04
  • Optimisation is improving performance with shortcuts without affecting functionality. As long as the functionality is guaranteed, I can't see POSIX objecting. Your proposed optimisation would break [the cat spec](http://pubs.opengroup.org/onlinepubs/9699919799/) though. There are specific wordings in the POSIX spec that are there to accommodate the type of optimisation done by `ksh`. Like they don't say _separate process_ but _subshell environment_ to allow fork-saving optimisations. – Stéphane Chazelas Mar 10 '14 at 21:12
  • I used that exact command recently to test NFS transfer speeds. If the shell had optimized it out, I would've been quite annoyed. – mikebabcock Mar 11 '14 at 13:12

4 Answers

26

No, that would be a bad idea.

`cat hugeregularfile.txt > /dev/null` and `touch -a hugeregularfile.txt` are not the same. `cat` will read the whole file, even if you redirect the output to `/dev/null`. And reading the whole file might be exactly what you want, for example to cache it so that later reads will be significantly faster. The shell can't know your intention.

Similarly, a C compiler will never optimize out reading a file, even if you don't look at the stuff you read.
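To make the difference concrete, here is a small sketch (using assumed temporary paths) of a case where replacing `cat file > /dev/null` with `touch -a file` would visibly change behaviour: with a FIFO, something genuinely has to read the data for a blocked writer to make progress.

```shell
# Sketch with a FIFO: a writer blocks until something actually reads,
# so "cat fifo > /dev/null" and "touch -a fifo" behave very differently.
dir=$(mktemp -d)
mkfifo "$dir/pipe"

# The writer blocks in open() until a reader shows up.
echo "payload" > "$dir/pipe" &
writer=$!

# cat really reads the data, even though the output is discarded,
# and that read is exactly what unblocks the writer.
cat "$dir/pipe" > /dev/null
wait "$writer"
status=$?

echo "writer exit status: $status"   # 0: cat consumed the data
rm -r "$dir"
```

A `touch -a` on the FIFO would never open it for reading, and the background writer would stay blocked forever.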

scai
  • You raise a good point, but C also possesses the `volatile` keyword (to ensure that apparently-useless reads and writes are done regardless, and in the order specified), and in C the standardized library function `fread()` only performs the read part of this work, not the write part, and so understandably cannot be optimized. On the other hand, the shell _can_ see all that is about to happen; It executes a complete command at a time, so I was wondering if in its analysis thereof it is permitted to determine that the command is being evaluated only for its side-effects, and act accordingly. – Iwillnotexist Idonotexist Mar 10 '14 at 13:15
  • @Iwillnotexist: Every useful command (except arguably `true` and `false`) has potential side effects, and the side effects are almost always the point of invoking the command. The shell couldn't know those side effects in advance (for external programs like `cat`) without solving the Halting Problem. So it rightly doesn't try, and assumes you meant what you said. – cHao Mar 10 '14 at 14:00
  • @IwillnotexistIdonotexist No, the shell can't see all that is about to happen. It has no idea about `cat`. In fact, `cat` could do anything from formatting your hard drive to downloading the Internet. – scai Mar 10 '14 at 14:36
  • "Unix was not designed to stop its users from doing stupid things, as that would also stop them from doing clever things." – Doug Gwyn – Agi Hammerthief Mar 10 '14 at 16:44
  • @cHao And even `true` and `false` set `$?`. – Kyle Strand Mar 10 '14 at 19:17
  • As @scai pointed out above, executables are not like language keywords: `cat` and `/dev/null` have typical meanings but they aren't _guaranteed_ to behave that way. To perform optimizations while guaranteeing no change to expected behaviour, the optimization could only be allowed to involve constructs implemented within the shell itself, and not things found in the execution environment... no matter how intuitive their names might seem. – andybuckley Mar 11 '14 at 19:59
20

No. `/dev/null` is just a name, which could refer to some other device, or to a regular file, rather than the data sink it "normally" is.

So a shell (or any other program) has no way of knowing, based on the name alone, whether the file it is writing to does something "for real" with the data. There is also, AFAIK, no system call the shell could make to determine that, e.g., the file descriptor it writes to actually discards the data.

Your comparison with optimising away code in a C program does not hold, because a shell does not have the total overview that a C compiler has over a piece of source code. A shell doesn't know enough about `/dev/null` to optimise your example away, much as a C compiler doesn't know enough about the code in a dynamically linked function to omit the call.
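A short sketch of that point (scratch paths assumed): nothing about the *name* `null` discards data; only the device the kernel happens to bind at `/dev/null` does.

```shell
# Sketch: a regular file that merely *looks* like a data sink keeps the
# bytes; the shell cannot tell that from the pathname alone.
dir=$(mktemp -d)
: > "$dir/null"                      # a plain file named "null"

echo "not discarded" > "$dir/null"
kept=$(wc -c < "$dir/null")          # the bytes are still there

echo "not discarded" > /dev/null     # the real device drops them
echo "bytes kept in the fake null: $kept"
rm -r "$dir"
```

Only by `stat`ing the target and recognising the character device could anything guess the data is dropped, and even that is just a convention of the running system.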

Anthon
  • As it turns out, ksh93 will treat `/dev/null` specially, sometimes. A builtin that has its stdout directed to `/dev/null`, e.g. `echo foo >/dev/null`, will not result in any writes being done to `/dev/null`. It doesn't do anything special if it's invoking a non-builtin command (such as `cat file >/dev/null`). – Mark Plotnick Mar 10 '14 at 19:05
  • Matter of fact, `cat` could also be something else. Anything else in fact. – orion Mar 10 '14 at 19:19
  • Actually `/dev/null` is one of the very few [standardized paths](http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap10.html), along with `/dev/tty`, `/dev/console`, `/tmp`, `/dev/` and `/`. – Gilles 'SO- stop being evil' Mar 10 '14 at 22:08
  • @MarkPlotnick Actually `cat` _is_ a ksh93 builtin (not enabled unless you put `/opt/ast/bin` before `/bin` (or wherever any `cat` is available) in `$PATH`). And yes, though `cat file > /dev/null` with that builtin does `read` the content of `file`, it does _not_ write it to /dev/null (though it opens and fstats it). – Stéphane Chazelas Mar 11 '14 at 17:35
15

It will not optimise out the running of commands (and you've already received a number of fine answers telling you why it should not), but it may optimise away forks, pipes/socketpairs, and reads in some cases. The kinds of optimisations it may do:

  • With some modern shells, the last command in a script can be executed in the shell's own process unless some traps have been set. For instance, with `sh -c ls`, most `sh` implementations (`bash`, `mksh`, `ksh`, `zsh`, `yash`, some versions of `ash`) won't fork a process to run `ls`.
  • In `ksh93`, command substitution will not create a pipe or fork a process until an external command is called (`$(echo foo)`, for instance, will expand to `foo` without a pipe/socketpair or fork).
  • The `read` built-in of some shells (`bash`, AT&T `ksh`) will not do single-byte reads if they detect stdin is seekable (in which case they do large reads and seek back to the end of what they were meant to read).
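The first of those fork-saving optimisations is directly observable. In the sketch below (assuming `bash` and `ps` are available), `$$` expands to the shell's own PID before the last command runs; a shell that execs the last command in place leaves `ps` occupying that PID, while a shell that forks reports the shell's own name instead.

```shell
# Sketch: which process ends up owning the PID that $$ expanded to?
# Seeing "ps" means the shell exec'ed the last command in its own process;
# the exact result depends on the shell implementation and its settings.
comm=$(bash -c 'ps -p "$$" -o comm=')
echo "process at the shell's PID: $comm"
```

Adding a trap (e.g. `trap : EXIT`) before the last command typically defeats the optimisation, since the shell must stay around to run the trap.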
Stéphane Chazelas
  • I like this answer, but it's unclear whether this is original reason, or whether the info is taken from some reference (which I'd like to delve into) – yoniLavi Mar 10 '14 at 14:40
  • @yoniYalovitsky, that's _original reason_. `ksh93` is the shell leading the way on the _optimisation_ front as its aim is/was to be seen as being on par with programming languages like `perl`. So you can have a look at `ksh` documentation, code (good luck) and mailing list for further info. – Stéphane Chazelas Mar 10 '14 at 20:38
  • @stephane You imply that `sh -c` will just exec `ls`? – Henk Langeveld Mar 14 '14 at 07:51
  • @HenkLangeveld, yes, you can verify that with `sh -c 'ps -p "$$"'` which will give you `ps` and not `sh` with those `sh` implementations, or with strace/truss/tusc... – Stéphane Chazelas Mar 14 '14 at 09:37
  • The difference between `ksh -c 'ps; ps'` and `bash -c 'ps;ps'` is interesting. Ksh93 goes further in its optimisation. – Henk Langeveld Mar 14 '14 at 09:52
  • Ha, that's exactly what I'd have done, were I not using android then. – Henk Langeveld Mar 14 '14 at 09:53
  • @HenkLangeveld, depends what implementation of `ksh` we're talking about here. `mksh` behaves like `bash`. That behaviour is mostly meant to optimize things like `system("some command")`. Note that there's a side effect of that optimisation when it comes to the exit status of processes terminated by a signal (in some shells). `ksh93` used to have a _bug_ in that it was doing the optimisation even when traps were set. – Stéphane Chazelas Mar 14 '14 at 10:04
7

When seeing cat hugeregularfile.txt > /dev/null, the shell is not allowed to believe that the action is useless — cat is not part of the shell and could do anything at all in theory, and also in practice.

For example, the user may have renamed the executable rm to cat, and suddenly the line performs externally observable behavior, i.e., removing the file.

The user may have compiled a version of cat that goes into an infinite loop, thus the shell cannot assume that it is 'known to terminate' as you suggest.

Someone may have installed a version of cat that works as intended, but with an extra side effect of installing a rootkit if it's ever run with adequate privileges — again, the shell should duly execute it.
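As a sketch of that point, nothing even requires replacing an external binary: a shell function named `cat` (purely hypothetical here) shadows the real one, and the "useless" command line gains a very observable effect.

```shell
# Sketch: a function named "cat" shadows /bin/cat, so the shell cannot
# assume the usual semantics from the name alone.
dir=$(mktemp -d)
echo data > "$dir/file.txt"

cat() { rm -- "$1"; }               # hypothetical: a "cat" that deletes

cat "$dir/file.txt" > /dev/null     # looks useless, actually removes the file

if [ -e "$dir/file.txt" ]; then gone=no; else gone=yes; fi
echo "file removed by 'cat': $gone"

unset -f cat
rm -r "$dir"
```

Optimising the line away would silently skip the deletion, which is exactly the kind of externally observable behaviour the as-if rule forbids changing.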

Peteris
  • Actually, `mksh` does in fact optimize `V=$(cat file)` by making it a builtin. So the shell may optimize it out, but not *transform* it to just a `touch -a`. – Steve Schnepp Mar 11 '14 at 12:41
  • @SteveSchnepp, `cat` *is* a builtin in `mksh`, but that builtin resorts to the system's `cat` if passed any option, which is why with GNU `cat`, `mksh -c 'cat /dev/null --help'` doesn't yield the same result as `bash -c 'cat /dev/null --help'`, but `mksh -c 'cat --help /dev/null'` does give you the same as `bash -c 'cat --help /dev/null'` (as `mksh` cat builtin parses options the POSIX way, while GNU cat parses them the GNU way). – Stéphane Chazelas Mar 12 '14 at 16:02
  • In bash and ksh93, the `V=$(cat file)` can be optimised with `V=$(< file)`. This speeds up things even without a builtin `cat`. – Henk Langeveld Mar 14 '14 at 11:03