Context
In short, I'm working on a feature that adds per-destination-host rate limiting and hard limiting of outbound connections for containers in a container networking solution (see silk-release). Each container gets a dedicated private IP on a VXLAN-based overlay network.
We use CNI as the trigger to create and configure networking artifacts (including iptables chains and rules) whenever a new container is spawned.
Problem
Whenever a container gets started on the VM we configure the iptables rules for its outbound traffic in the following way:
iptables -N netout--container-id
# where the container ip from the overlay network is "1.2.3.4/32"
iptables -A FORWARD -s 1.2.3.4/32 -o underlay-interface -j netout--container-id
iptables -A netout--container-id -m state --state RELATED,ESTABLISHED -j ACCEPT
iptables -A netout--container-id -p tcp -m state --state INVALID -j DROP
# hard limit rule allowing a maximum of 1000 concurrent connections per dest ip
iptables -A netout--container-id -m conntrack --ctstate NEW -m connlimit --connlimit-above 1000 --connlimit-mask 32 --connlimit-daddr -j REJECT
# rate limit rule allowing 100 new connections per sec with a burst of 999 per dest ip
iptables -A netout--container-id -m conntrack --ctstate NEW -m hashlimit --hashlimit-above 100/sec --hashlimit-burst 999 --hashlimit-mode dstip --hashlimit-name container-id --hashlimit-htable-expire 10000 -j REJECT
# A bunch of accept rules...
# And finally a reject all rule
iptables -A netout--container-id -j REJECT --reject-with icmp-port-unreachable
This works brilliantly! The limiting rules are applied correctly. We verified this by SSH-ing into a container that opens too many connections to a single host and running:
# monitors the number of open connections to the destip
watch -n 1 'netstat -anp tcp | grep ESTABLISHED | grep -v <dest-port> | grep <proc-name> | wc -l'
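The netstat check above runs inside the container. A rough host-side equivalent can be sketched by grouping conntrack entries by destination IP. This is only a sketch: count_per_dest is a hypothetical helper name, and it assumes the usual `conntrack -L` line layout in which the first dst= field on a line is the original-direction destination.

```shell
#!/bin/sh
# Hypothetical helper: count tracked ESTABLISHED flows per destination IP.
# Reads `conntrack -L` style lines on stdin. Only the first dst= field of
# each line (the original direction) is counted; the reply tuple is ignored.
count_per_dest() {
    awk '
        /ESTABLISHED/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^dst=/) {
                    count[substr($i, 5)]++
                    break
                }
            }
        }
        END {
            for (ip in count) print count[ip], ip
        }
    ' | sort -rn
}

# Usage on the VM (needs root and conntrack-tools; 1.2.3.4 is the
# container IP from the example rules):
#   conntrack -L -p tcp -s 1.2.3.4 2>/dev/null | count_per_dest
```

Each output line is "<count> <dest-ip>", highest count first, which makes it easy to see whether any destination is above the configured hard limit.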
The problem occurs whenever a new container is created. New iptables rules are then added, which (if I'm not mistaken) causes netfilter to reload its rule set and connlimit to lose track of all open connections.
So, with the example from the snippet above (max 1000 connections), if a container has already reached its hard limit when a new container is started (i.e. new rules are added), the count is reset and the container can open 1000 more connections, until the next container is started, and so on.
There is no such problem with the hashlimit module, presumably because it keeps its entries in a named table (exposed as /proc/net/ipt_hashlimit/<container-id>) that survives rule changes. I'm wondering whether the same is possible with connlimit, or whether connection hard limits can at least be simulated with other iptables modules that are "persistent" like hashlimit. I was thinking about using the recent module but couldn't come up with anything.
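For reference, the hashlimit state mentioned above can be checked before and after a rule change by reading the proc file directly. This is a minimal sketch: hashlimit_dests is a hypothetical helper name, and it assumes the IPv4 line layout "<seconds-to-expiry> <src>:<port>-><dst>:<port> ..." for each entry.

```shell
#!/bin/sh
# Hypothetical helper: list the destination IPs currently tracked by a
# hashlimit table, together with the seconds left until each entry expires.
# $1 is the path to the table file, e.g. /proc/net/ipt_hashlimit/<container-id>.
# Assumed line layout: "<expires> <src>:<port>-><dst>:<port> <counters...>"
hashlimit_dests() {
    awk '{
        split($2, pair, "->")      # pair[1] = src:port, pair[2] = dst:port
        split(pair[2], dst, ":")   # dst[1] = destination IP
        print dst[1], $1
    }' "$1"
}

# Usage on the VM:
#   hashlimit_dests /proc/net/ipt_hashlimit/<container-id>
```

If the same destinations are still listed right after another container's rules are added, the hashlimit entries were preserved across the change.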
The issue is similar to this one, but the use case is different: I need to be able to modify the rules dynamically without connlimit (or some other module) losing track of the currently open connections.
Thank you in advance for your support.
PS: If you need more architectural details I can easily provide them, but I believe the root cause of the problem is the one I've stated above.