
I'm trying to lock down a container using systemd-nspawn so that only the syscalls I explicitly whitelist are allowed. Per the documentation, a fairly lax filter is in place by default: a large whitelist of hundreds of system calls. There's a SystemCallFilter= option, which claims to let you blacklist or whitelist specific calls. I tried it out, listing only three syscalls and expecting complete failure:

[Exec]
...
# We use way more syscalls than this! This whitelist should fail, but it doesn't because it's not a real whitelist.
SystemCallFilter=open write close
...

Instead, the program runs just fine. I can get it to fail if I explicitly disallow a syscall I know is in use:

[Exec]
...
# This actually fails, because open's been explicitly blacklisted.
SystemCallFilter=~open write
...
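
Both results are consistent with the documented rule that systemd-nspawn always starts from its built-in allow list, adds entries listed without "~", and then removes entries listed with "~". A toy model of that rule follows; it is only a sketch, and `DEFAULT_ALLOW` and `effective_allow` are hypothetical names, with the six syscalls standing in for the real list of hundreds:

```python
# Hypothetical stand-in for nspawn's built-in allow list (the real one is far longer).
DEFAULT_ALLOW = frozenset({"open", "read", "write", "close", "mmap", "execve"})


def effective_allow(allow=(), deny=(), default=DEFAULT_ALLOW):
    """Model the documented combination rule.

    Entries without "~" (allow) are added to the default list;
    entries with "~" (deny) are removed afterwards, so deny wins.
    """
    return (set(default) | set(allow)) - set(deny)


# Listing three calls removes nothing: the default list is still in effect.
print(sorted(effective_allow(allow=["open", "write", "close"])))

# Explicitly denying calls is the only thing that shrinks the set.
print(sorted(effective_allow(deny=["open", "write"])))

# A call that is both allowed and denied ends up denied.
print(sorted(effective_allow(allow=["open"], deny=["open"])))
```

In this model the option can only extend or trim the default list; it never replaces it.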

Also, because the blacklist takes precedence over the "whitelist," I can't just disable everything and then turn back on only the ones I need; the whitelist is just ignored:

[Exec]
...
# Doesn't work, as ~@default takes precedence over the allowlist so *nothing* is allowed
SystemCallFilter=~@default
# The full list is much longer and is generated automatically from a Docker seccomp .json
SystemCallFilter=open write close ...

Is there a way to achieve the functionality I want? I really don't want to maintain a blacklist of all of the hundreds of syscalls on the default allowlist, which seems like the only way to do it currently.
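
For context, generating the list from a Docker seccomp profile could look roughly like the sketch below. It assumes the usual profile layout (a top-level "syscalls" array whose rules carry "names", or "name" in older profiles, plus an "action"), and it ignores per-argument conditions and capability/architecture includes. The function names are made up for illustration, and the output is space-separated, which is the form the systemd-nspawn documentation describes:

```python
import json


def allowed_syscalls(profile):
    """Return the sorted syscall names whose rule action is SCMP_ACT_ALLOW."""
    names = set()
    for rule in profile.get("syscalls", []):
        if rule.get("action") != "SCMP_ACT_ALLOW":
            continue
        # Newer profiles use a "names" list; older ones use a single "name".
        names.update(rule.get("names", []))
        if "name" in rule:
            names.add(rule["name"])
    return sorted(names)


def to_filter_line(names):
    """Format the names as one SystemCallFilter= line for a .nspawn file."""
    return "SystemCallFilter=" + " ".join(names)


if __name__ == "__main__":
    # Tiny inline profile so the example runs without any files on disk.
    sample = json.loads(
        '{"defaultAction": "SCMP_ACT_ERRNO",'
        ' "syscalls": [{"names": ["open", "write", "close"],'
        ' "action": "SCMP_ACT_ALLOW"}]}'
    )
    print(to_filter_line(allowed_syscalls(sample)))
```

The resulting line is still only merged into the default allow list, so on its own it does not give the strict behaviour the Docker profile had.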

  • Are you starting a docker container with that? If your unit invokes docker the entire exercise is pointless since docker actually delegates everything to its privileged dockerd process. – Ginnungagap Aug 14 '23 at 20:42
  • No, I'm starting a `systemd-nspawn` container with a `.nspawn` file at `/etc/systemd/nspawn/servicename.nspawn`. I'm trying to replace Docker with `systemd-nspawn` and port over my seccomp.json security profiles. – SwarmOfBees Aug 14 '23 at 21:31
