Q1. Can an attacker gain root on my host OS using only the NET_ADMIN capability?
Yes (in some cases).
CAP_NET_ADMIN lets you use the SIOCETHTOOL ioctl() on any network device inside the namespace. This includes commands like ETHTOOL_FLASHDEV, i.e. ethtool -f.
And that's the game. There is a little more explanation in the quote below.
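To make the mechanism concrete, here is a minimal Python sketch of what a SIOCETHTOOL call looks like. It only sends ETHTOOL_GDRVINFO, a read-only query that I believe needs no capability at all. ETHTOOL_FLASHDEV appears purely as a constant for comparison, because it travels through the same ioctl() with a different command number and payload. The helper names are hypothetical, and the constants and struct layout are as I remember them from <linux/sockios.h> and <linux/ethtool.h>, so check them against your own headers.

```python
import array
import fcntl
import socket
import struct

SIOCETHTOOL = 0x8946       # ioctl request number, <linux/sockios.h>
ETHTOOL_GDRVINFO = 0x03    # read-only "get driver info" command
ETHTOOL_FLASHDEV = 0x33    # firmware flash; shown for comparison, never sent here

# Assumed layout of struct ethtool_drvinfo: cmd, five 32-byte strings,
# 12 reserved bytes, five u32 counters.
DRVINFO_FMT = "=I32s32s32s32s32s12s5I"


def build_drvinfo_request():
    """Return a mutable buffer holding an ETHTOOL_GDRVINFO request."""
    packed = struct.pack(DRVINFO_FMT, ETHTOOL_GDRVINFO,
                         b"", b"", b"", b"", b"", b"", 0, 0, 0, 0, 0)
    return array.array("B", packed)


def build_ifreq(ifname, data_addr):
    """Pack the part of struct ifreq we need: interface name + ifr_data pointer."""
    return struct.pack("16sP", ifname.encode(), data_addr)


def get_driver_name(ifname):
    """Ask the kernel which driver backs `ifname` (hypothetical helper)."""
    req = build_drvinfo_request()
    addr, _ = req.buffer_info()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # The kernel follows the pointer in ifreq and fills `req` in place.
        fcntl.ioctl(s.fileno(), SIOCETHTOOL, build_ifreq(ifname, addr))
    fields = struct.unpack(DRVINFO_FMT, req.tobytes())
    return fields[1].rstrip(b"\0").decode()
```

Calling `get_driver_name("eth0")` (the interface name is a placeholder) should return the driver name for that device. The point is only that any socket in the namespace is enough to reach the device's ethtool handlers; the privileged commands differ in the first 32-bit field of the payload, not in the system call.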
SIOCETHTOOL is allowed inside any network namespace, since commit 5e1fccc0bfac, "net: Allow userns root control of the core of the network stack". Before then, it was only possible for CAP_NET_ADMIN in the "root" network namespace. This is interesting because of the security considerations that were pointed out at the time. I had a look at the code in kernel version 5.0, and I believe the following comments still apply:
> Re: [PATCH net-next 09/17] net: Allow userns root control of the core of the network stack.
>
> > > For the same reason you had better be very selective about which ethtool
> > > commands are allowed based on per-user_ns CAP_NET_ADMIN. Consider for a
> > > start:
> > >
> > > ETHTOOL_SEEPROM => brick the NIC
> > > ETHTOOL_FLASHDEV => brick the NIC; own the system if it's not using an IOMMU
> >
> > These are prevented by not having access to real hardware by default. A
> > physical network interface must be moved into a network namespace for
> > you to have access to it.
>
> Yes, I realise that. The question is whether you would expect
> anything in a container to be able to do those things, even with a
> physical net device assigned to it.
>
> Actually we have the same issue without considering containers -
> should CAP_NET_ADMIN really give you low-level control over hardware
> just because it's networking hardware? I think some of these ethtool
> operations, and access to non-standard MDIO registers, should perhaps
> require an additional capability (CAP_SYS_ADMIN or CAP_SYS_RAWIO?).
I guess the lockdown feature has a similar issue. I didn't notice the lockdown patches in the results while I was searching. I suppose the solution for the lockdown feature would be some kind of digital signature, similar to how lockdown only allows signed kernel modules.
Q2. If they somehow obtain some code execution from my app, will they have unlimited power?
I'm splitting this out as a narrower case, specific to your command -
sudo docker run --restart always --network host --cap-add NET_ADMIN -d -p 53:53/udp my-image
As well as capabilities, the docker command should also impose seccomp restrictions. It might also impose LSM-based restrictions, if they are available on your system (SELinux or AppArmor). However, neither of these seems to apply to SIOCETHTOOL:
I think seccomp-bpf could be used to block SIOCETHTOOL. However, the default seccomp configuration for docker does not try to filter any ioctl() calls.
And I did not notice any LSM hooks in the kernel functions I looked at.
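If you want to check what a process inside the container actually ends up with, the kernel reports both the effective capability mask and the seccomp mode in /proc/<pid>/status. The following is a minimal Python sketch that reads those two fields. The function names are hypothetical, and the capability bit numbers are as I remember them from <linux/capability.h>.

```python
CAP_NET_ADMIN = 12   # bit numbers, assumed from <linux/capability.h>
CAP_SYS_RAWIO = 17
SECCOMP_MODES = {0: "disabled", 1: "strict", 2: "filter"}


def parse_status(text):
    """Extract capability and seccomp facts from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    cap_eff = int(fields["CapEff"], 16)   # effective capabilities, hex bitmask
    mode = int(fields.get("Seccomp", "-1"))
    return {
        "net_admin": bool((cap_eff >> CAP_NET_ADMIN) & 1),
        "sys_rawio": bool((cap_eff >> CAP_SYS_RAWIO) & 1),
        "seccomp": SECCOMP_MODES.get(mode, "unknown"),
    }


def current_process_report():
    """Report on the calling process (run this inside the container)."""
    with open("/proc/self/status") as f:
        return parse_status(f.read())
```

Note that a seccomp mode of "filter" only tells you that some filter is installed. It says nothing about whether that filter looks at ioctl() arguments, so it does not by itself mean SIOCETHTOOL is blocked.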
I think Ben Hutchings made a good point; the ideal solution would be to restrict this to CAP_SYS_RAWIO. But if you change something like that and too many people "notice" - i.e. it breaks their setup - then you get Angry Linus shouting at you :-P. (Especially if you're working on this because of "secure boot"). Then the change gets reverted, and you get to work out what the least ugly hack is.
I.e. the kernel might be forced to maintain backwards compatibility, and keep allowing these commands for processes which have CAP_NET_ADMIN in the root namespace. In that case, you would still need seccomp-bpf to protect your docker command. I am not sure that it would be worth trying to change the kernel in this case, as it would only protect (some) containers. And maybe container runtimes like docker could be fixed to block SIOCETHTOOL by default. That might be a workable default for "OS containers" like LXC / systemd-nspawn as well.