I have been testing the binfmt_misc feature of Linux on Debian 10 and found that setting the flags to "OC", so that the credentials of the binary are used instead of those of the interpreter, causes execution to fail silently.

In the POC below, /tmp/test.sh is the interpreter, while qux.go is the binary. Why is /tmp/test.sh executed successfully when no flags are set, but fails silently with the flags "OC"?

POC:

$ touch qux.go
$ chmod +x qux.go
$ cat <<EOF >/tmp/test.sh
> #!/bin/sh
> echo Golang
> EOF
$ chmod +x /tmp/test.sh
$ echo ':golang:E::go::/tmp/test.sh:' | sudo tee /proc/sys/fs/binfmt_misc/register 
:golang:E::go::/tmp/test.sh:
$ ./qux.go 
Golang
$ echo -1 | sudo tee /proc/sys/fs/binfmt_misc/golang 
-1
$ echo ':golang:E::go::/tmp/test.sh:OC' | sudo tee /proc/sys/fs/binfmt_misc/register 
:golang:E::go::/tmp/test.sh:OC
$ ./qux.go # no output
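
For reference, the strings written to the register file above follow the :name:type:offset:magic:mask:interpreter:flags layout from the kernel's binfmt_misc documentation. A minimal sketch of a helper that assembles such a string is shown here; the function name binfmt_rule is hypothetical, and it only formats text without writing anything to /proc:

```shell
# Hypothetical helper: print a binfmt_misc registration string.
# Arguments: name type offset magic mask interpreter flags
# It only formats text; it does not write to /proc.
binfmt_rule() {
    printf ':%s:%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# The two rules from the POC: extension match on "go", without and with flags.
binfmt_rule golang E '' go '' /tmp/test.sh ''
binfmt_rule golang E '' go '' /tmp/test.sh OC
```

With type E, the magic field holds the file extension, and the offset and mask fields stay empty.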

Also:

$ mount | grep binfmt_misc
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=28,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=658)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,nosuid,nodev,noexec,relatime)

Bonus:

Some resources claim that binfmt_misc could be used for container-to-host escapes. However, as far as I can see, the interpreter path is resolved within the chroot'd filesystem of the container, and the interpreter is executed within the container, i.e. ls -la / shows the container root (not the host root).

Resource:

https://www.kernel.org/doc/html/latest/admin-guide/binfmt-misc.html

Shuzheng

1 Answer

You’re being tripped up by two features.

The first is that, when exec fails with ENOEXEC, the shell looks at the contents of the file you’re attempting to run and, if it looks like a shell script, interprets the file itself. An empty file looks like a shell script. You can see this by running strace -f ./qux.go, which shows the failing exec, and by changing qux.go:

$ echo echo Failed Golang > qux.go
$ ./qux.go
Failed Golang
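
This fallback can be reproduced without binfmt_misc or root access. The sketch below assumes a POSIX shell, a temporary directory that is not mounted noexec, and no binfmt_misc rule matching the test file; the helper name run_without_shebang is made up for illustration. It writes an executable file with no #! line and runs it, so any output comes from the shell interpreting the file after the kernel has refused to execute it:

```shell
# Hypothetical helper: write "$1" to an executable file that has no "#!" line,
# run that file, and clean up. The kernel cannot execute such a file directly,
# so any output comes from the shell falling back to interpreting it.
run_without_shebang() {
    dir=$(mktemp -d) || return 1
    printf '%s' "$1" > "$dir/noshebang"
    chmod +x "$dir/noshebang"
    "$dir/noshebang"
    status=$?
    rm -rf "$dir"
    return "$status"
}

run_without_shebang 'echo Failed Golang'   # the shell runs the echo itself
run_without_shebang ''                     # empty file: runs as an empty script
```

The second call prints nothing and still succeeds, which matches the silent "failure" seen with the empty qux.go.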

The other feature is that the O flag doesn’t work with cascading interpreters. In your case, qux.go needs an interpreter, /tmp/test.sh, but that interpreter is a script which itself needs an interpreter, /bin/sh. There are thus two files to interpret, test.sh and qux.go, but only one final executable file can be handled in O mode. Using a compiled interpreter instead works:

$ cat <<EOF > /tmp/test.c
#include <stdio.h>

int main(int argc, char **argv) {
  puts("Golang");
  return 0;
}
EOF
$ make /tmp/test
cc     /tmp/test.c   -o /tmp/test
$ echo ':golang:E::go::/tmp/test:OC' | sudo tee /proc/sys/fs/binfmt_misc/register
:golang:E::go::/tmp/test:OC
$ ./qux.go
Golang
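
Before registering a rule with O or C, it can help to check whether the intended interpreter is itself a script. The following is a rough sketch rather than a complete test: the helper name starts_with_shebang is hypothetical, and it only looks for a leading #!, so it would miss an interpreter that is handled by another binfmt_misc rule.

```shell
# Hypothetical helper: succeed if the file's first two bytes are "#!",
# i.e. the file needs an interpreter of its own.
starts_with_shebang() {
    [ "$(dd if="$1" bs=2 count=1 2>/dev/null)" = '#!' ]
}

# Recreate the script interpreter from the question in a temporary file.
demo=$(mktemp)
printf '#!/bin/sh\necho Golang\n' > "$demo"

if starts_with_shebang "$demo"; then
    echo "script interpreter: not usable with O or C"
else
    echo "no #! line found"
fi
rm -f "$demo"
```

A compiled program such as /tmp/test above does not start with #!, so the check takes the second branch for it.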
Stephen Kitt
  • Very detailed. Thanks! I don’t see that “O” doesn’t work with cascading interpreters in the docs, do you? Why isn’t `binfmt_misc` just executing `/tmp/test.sh` using `exec()` without caring that it’s a shell script? Finally, do you know anything regarding container escapes? E.g `uevent_helper` is executed in the root namespaces, but that doesn’t seem to be the case for `binfmt_misc`? – Shuzheng Mar 19 '21 at 08:59
  • So, “OC” flags really just work with native ELF binaries? – Shuzheng Mar 19 '21 at 09:00
  • The problem is that `C` implies `O`, and `O` changes the way binaries are supposed to be accessed. I’d have to look up the details, I don’t remember them off-hand. The same goes for container escapes; I do believe I’ve run non-native containers using static QEMU and `binfmt_misc` without a copy of QEMU inside the container, but that might have changed recently, I’d have to check. – Stephen Kitt Mar 19 '21 at 09:13
  • Thanks. I'm not sure what you mean by *and `binfmt_misc` without a copy of QEMU inside the container* - Also, "OC" flags means that only native ELF binaries can be used as the interpreter? – Shuzheng Mar 19 '21 at 09:25
  • `O` implies that only executables which don’t need an interpreter themselves can be used as the interpreter (in practice, native ELF binaries, yes); see [the source of the `ENOEXEC` in this case](https://elixir.bootlin.com/linux/latest/source/fs/exec.c#L1771): there’s a single `bprm` structure, which needs to encapsulate the final executable being run, but there are two executables to interpret (`test.sh` and `qux.go`). – Stephen Kitt Mar 19 '21 at 09:35
  • Regarding QEMU, I’ve run non-native containers with QEMU only on the host, which means that the `binfmt_misc` interpreter was outside the container... – Stephen Kitt Mar 19 '21 at 09:37