Docker running an app with NET_ADMIN capability: involved risks

Question

I'm trying to run an app in a Docker container. The app requires root privileges to run.

sudo docker run --restart always --network host --cap-add NET_ADMIN -d -p 53:53/udp my-image

My question is: what are the risks of adding the NET_ADMIN capability together with the --network host option?

If an attacker can somehow obtain some code execution from my app, will he have unlimited power since I'm running it as root or will he only have access to the networking part of the kernel? If so, what would be his attack surface (in other words, can he gain root on my host OS with only the NET_ADMIN capability set?)

Additional info that may be of interest: https://www.redhat.com/en/blog/preview-running-containers-without-root-rhel-76 — 0xSheepdog, Mar 26 '19 at 20:24
@0xSheepdog if you started running the container inside a userns which does not own the netns the container runs in, then it will not be able to use the CAP_NET_ADMIN privileges at all. So I don't think that's very helpful :-). — sourcejedi, Mar 26 '19 at 20:31

sourcejedi · Accepted Answer · 2019-11-19T16:29:20.133

Q1. Can an attacker gain root on my host OS using only the NET_ADMIN capability?

Yes (in some cases).

CAP_NET_ADMIN lets you use the SIOCETHTOOL ioctl() on any network device inside the namespace. This includes commands like ETHTOOL_FLASHDEV, i.e. ethtool -f.

And that's the game. There is a little more explanation in the quote below.

SIOCETHTOOL is allowed inside any network namespace, since commit 5e1fccc0bfac, "net: Allow userns root control of the core of the network stack". Before then, it was only possible for CAP_NET_ADMIN in the "root" network namespace. This is interesting because of the security considerations that were pointed out at the time. I had a look at the code in kernel version 5.0, and I believe the following comments still apply:

Re: [PATCH net-next 09/17] net: Allow userns root control of the core of the network stack.

For the same reason you had better be very selective about which ethtool commands are allowed based on per-user_ns CAP_NET_ADMIN. Consider for a start:

ETHTOOL_SEEPROM => brick the NIC
ETHTOOL_FLASHDEV => brick the NIC; own the system if it's not using an IOMMU

These are prevented by not having access to real hardware by default. A physical network interface must be moved into a network namespace for you to have access to it.

Yes, I realise that. The question is whether you would expect anything in a container to be able to do those things, even with a physical net device assigned to it.

Actually we have the same issue without considering containers - should CAP_NET_ADMIN really give you low-level control over hardware just because it's networking hardware? I think some of these ethtool operations, and access to non-standard MDIO registers, should perhaps require an additional capability (CAP_SYS_ADMIN or CAP_SYS_RAWIO?).

I guess the lockdown feature has a similar issue. I didn't notice the lockdown patches in the results while I was searching. I suppose the solution for the lockdown feature would be some kind of digital signature, similar to how lockdown only allows signed kernel modules.

Q2. If they somehow obtain some code execution from my app, will they have unlimited power?

I'm splitting this out as a narrower case, specific to your command -

sudo docker run --restart always --network host --cap-add NET_ADMIN -d -p 53:53/udp my-image
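To see which capabilities a process inside the container actually ends up with, one option is to decode the CapEff line of /proc/self/status. The following is a minimal sketch, not part of the original answer: the helper names `effective_caps` and `has_cap` are made up for illustration, and the bit number 12 for CAP_NET_ADMIN is taken from `<linux/capability.h>`.

```python
CAP_NET_ADMIN = 12  # bit number of CAP_NET_ADMIN in <linux/capability.h>


def effective_caps(status_text: str) -> int:
    """Return the CapEff bitmask parsed from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, one bit per capability.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(status_text: str, cap_bit: int) -> bool:
    """True if the given capability bit is set in the effective set."""
    return bool((effective_caps(status_text) >> cap_bit) & 1)


if __name__ == "__main__":
    # Run this inside the container to inspect the current process.
    with open("/proc/self/status") as f:
        print("CAP_NET_ADMIN held:", has_cap(f.read(), CAP_NET_ADMIN))
```

Running it once as root and once after the app has switched to its unprivileged user shows whether the capability survived the user change.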

As well as capabilities, the docker command should also impose seccomp restrictions. It might also impose LSM-based restrictions, if they are available on your system (SELinux or AppArmor). However, neither of these seems to apply to SIOCETHTOOL:

I think seccomp-bpf could be used to block SIOCETHTOOL. However the default seccomp configuration for docker does not try to filter any ioctl() calls.

And I did not notice any LSM hooks in the kernel functions I looked at.
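As a rough illustration of the seccomp-bpf idea, the sketch below rewrites a seccomp profile in Docker's JSON format so that ioctl() is only allowed when its request argument is not SIOCETHTOOL. It assumes an allow-list profile (a default action that denies, with `SCMP_ACT_ALLOW` rules listing syscall names), which is my understanding of how Docker's default profile is laid out. The function name `block_siocethtool` and the tiny sample profile are hypothetical, and I have not tested the result against a real container. Note also that the filter compares the raw argument value, so treat this as an illustration rather than a hardened filter.

```python
import copy
import json

SIOCETHTOOL = 0x8946  # ioctl request number from <linux/sockios.h>


def block_siocethtool(profile: dict) -> dict:
    """Return a copy of an allow-list profile in which ioctl() is permitted
    only when argument 1 (the request number) differs from SIOCETHTOOL."""
    new = copy.deepcopy(profile)
    for rule in new.get("syscalls", []):
        # Drop the unconditional "allow ioctl" entry, keep everything else.
        if rule.get("action") == "SCMP_ACT_ALLOW" and not rule.get("args"):
            rule["names"] = [n for n in rule.get("names", []) if n != "ioctl"]
    # Re-add ioctl with a condition on its second argument (index 1).
    new.setdefault("syscalls", []).append({
        "names": ["ioctl"],
        "action": "SCMP_ACT_ALLOW",
        "args": [{"index": 1, "value": SIOCETHTOOL, "op": "SCMP_CMP_NE"}],
    })
    return new


if __name__ == "__main__":
    # Hypothetical, heavily shortened profile used only to show the effect.
    sample = {
        "defaultAction": "SCMP_ACT_ERRNO",
        "syscalls": [{"names": ["read", "write", "ioctl"],
                      "action": "SCMP_ACT_ALLOW"}],
    }
    print(json.dumps(block_siocethtool(sample), indent=2))
```

The resulting file could then be passed to the command above with `--security-opt seccomp=<path to the JSON file>`.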

I think Ben Hutchings made a good point; the ideal solution would be to restrict this to CAP_SYS_RAWIO. But if you change something like that and too many people "notice" - i.e. it breaks their setup - then you get Angry Linus shouting at you :-P. (Especially if you're working on this because of "secure boot"). Then the change gets reverted, and you get to work out what the least ugly hack is.

I.e. the kernel might be forced to maintain backwards compatibility, and allow processes which have CAP_NET_ADMIN in the root namespace. In that case, you would still need seccomp-bpf to protect your docker command. I am not sure that it would be worth trying to change the kernel in this case, as it would only protect (some) containers. And maybe container runtimes like docker could be fixed to block SIOCETHTOOL by default. That might be a workable default for "OS containers" like LXC / systemd-nspawn as well.

Thank you very much! I didn't fully understand everything, but I now have a base for further research. (I'm not even sure I'm allowed to thank you like this but I figured it was worth a try :) ) — Doesntmatter, Mar 26 '19 at 20:23
So in any case it is better to drop privileges after I have done what is required by root right? But then again, even the user I dropped to would be able to access ioctl() because of the capabilities set when using docker run right? — Doesntmatter, Mar 27 '19 at 09:55
@Doesntmatter curious what you're doing on container startup (and never after startup), that requires CAP_NET_ADMIN ?? Anyway, the docker command is dropping capabilities *except* for CAP_NET_ADMIN. Capabilities are sub-sets of the traditional root powers. When you drop to non-root, you lose whatever root powers docker ran you with. Just as you would lose *all* root powers if you had been run as full root. Docker does not set the "ambient capabilities" that can be preserved when dropping to a different user. — sourcejedi, Mar 27 '19 at 10:26
See "Effect of user ID changes on capabilities" in [`man capabilities`](http://man7.org/linux/man-pages/man7/capabilities.7.html). I.e. when your program drops to a non-root user it will lose all capabilities, unless your program deliberately sets SECBIT_KEEP_CAPS because it wants to deliberately preserve specific capabilities. Such a program should manually drop all capabilities that it does not want to keep. — sourcejedi, Mar 27 '19 at 10:32
I'm basically experimenting with Iodine to tunnel traffic through the DNS protocol inside a docker container. But I'm beginning to realize that running it inside a docker container actually doesn't provide any additional security benefits... I just wanted to isolate the process just in case, because I'm planning on opening it to the internetz. Iodine requires root privileges first to bind to a low port (53) and to use ioctl(). After everything has been set up, iodine can drop root privileges. But then I suppose there is no use in running it inside a container... — Doesntmatter, Mar 27 '19 at 10:46
That's the only explanation I've found so far about the NET_ADMIN capability. I got here because I was looking for a way to run fail2ban in a docker container, which led me to the problem: fail2ban -> iptables -> NET_ADMIN. I just wish that the docker docs had more details about this subject (https://docs.docker.com/engine/reference/run/#runtime-privilege-and-linux-capabilities) — AndreDurao, Nov 19 '19 at 16:01
@AndreDurao Notice the question has two conditions. This answer is based on "together with the --network host option". If you don't use that, you might be able to run fail2ban safely. — sourcejedi, Nov 19 '19 at 16:32
