
On the last few boots of my Arch Linux system, I noticed I had no network access. I'm using a netctl profile to give my adapter a static IPv4 address, which by itself works fine.

So I had a look at the logs, and the error was:

Duplicate Address Detection is taking too long on interface 'enp0s25'

netctl then quits with code 1, leaving the network in an unconfigured state.

Duplicate Address Detection is a feature of IPv6 and netctl uses it when the profile contains this line:

IPv6=stateless

That line should configure IPv6 automatically. Someone opened an issue about this on the project's GitHub page, where the author of netctl claims:

[...] If DAD takes more than 3 seconds (the default) you either have a very complex or slow network, or a misconfiguration in it.

And:

It sounds like something in your network is not configured properly. [...]

But what exactly might be wrong with my network? It's a very simple setup: just my ISP's modem/router combo with 2 PCs on it, a small number of wireless devices and a few digital TV set-top boxes. Network quality at my home is perfectly fine overall, and the problem only started a few weeks ago.

The current workaround is either to disable DAD or increase the timeout, neither of which I really like.

MarioDS

1 Answer


DAD is inherently slow because it has to work without feedback.

The way DAD works is that before an address is activated on an interface, a neighbor discovery request is sent asking for the MAC address of the host which has that IP address.

If the address is duplicated the host which already has the address will respond, and DAD will fail quickly.

But when addresses are correctly configured, there will be no duplication, and hence there will be no answer to the request.
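To make the request described above a little more concrete: on the wire, the DAD probe is a Neighbor Solicitation sent to the solicited-node multicast group derived from the tentative address (RFC 4291). Below is a minimal Python sketch of that derivation. The function name is mine and purely illustrative, and the addresses are documentation examples, not anything from the question.

```python
import ipaddress


def solicited_node_multicast(addr: str) -> ipaddress.IPv6Address:
    """Return the solicited-node multicast group for an IPv6 address.

    The group is ff02::1:ff00:0/104 combined with the low 24 bits
    of the address being probed.
    """
    low24 = int(ipaddress.IPv6Address(addr)) & 0xFFFFFF
    base = int(ipaddress.IPv6Address("ff02::1:ff00:0"))
    return ipaddress.IPv6Address(base | low24)


if __name__ == "__main__":
    # Hypothetical tentative address, used only for illustration.
    print(solicited_node_multicast("2001:db8::1"))
```

Only a node that already holds (or is also probing) the same address listens on that group and answers, which is why a correctly configured network stays silent.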

Since the sender cannot know how quickly a reply will come back, it has to wait. How quickly DAD completes depends on how long the sender has been configured to wait for a reply.

It is important to notice that it depends only on the configuration of the sender of the request and not on how the rest of the network is configured. Anybody suggesting that a complicated network can slow DAD down probably hasn't understood how it works.

It is possible to configure a machine to send multiple requests with a delay in between and only assign the address once a certain number of seconds have passed without reply. Such a configuration will obviously slow DAD down.
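As a rough illustration of how those two knobs add up, here is a small Python sketch. It assumes the RFC 4862 model (a number of probes, each followed by a retransmission interval) and, for the optional helper, that the machine is Linux and exposes the values under the `/proc/sys` paths shown. The function names are my own, and any random delay the kernel adds before the first probe is not counted.

```python
from pathlib import Path


def estimate_dad_seconds(dad_transmits: int, retrans_time_ms: int) -> float:
    """Estimate how long DAD waits when nobody answers.

    Each probe is followed by one retransmission interval, so the
    silent wait is roughly dad_transmits * retrans_time_ms.
    A value of 0 probes means DAD is effectively off.
    """
    if dad_transmits <= 0:
        return 0.0
    return dad_transmits * retrans_time_ms / 1000.0


def read_linux_dad_settings(iface: str) -> tuple[int, int]:
    """Read the sender-side settings on Linux (assumed sysctl paths)."""
    conf = Path(f"/proc/sys/net/ipv6/conf/{iface}/dad_transmits")
    neigh = Path(f"/proc/sys/net/ipv6/neigh/{iface}/retrans_time_ms")
    return int(conf.read_text()), int(neigh.read_text())


if __name__ == "__main__":
    # One probe with a 1000 ms interval, the usual defaults.
    print(estimate_dad_seconds(1, 1000))
    # Three probes with the same interval.
    print(estimate_dad_seconds(3, 1000))
```

Comparing such an estimate against the timeout of whatever is waiting for DAD (3 seconds in the netctl case quoted in the question) shows quickly whether the local settings alone can explain a timeout.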

The system call to assign an IP address to an interface doesn't block waiting for DAD to complete. But if you try to bind a socket to the address before DAD has completed, the bind will fail. This can lead to a race condition causing services to not come up during boot. The error message you are seeing may have been produced by a piece of code intended to wait for DAD to complete in order to avoid such a race condition. One bug which is easy to introduce in such code is to have it keep waiting for DAD to complete when DAD has in fact already failed due to a duplicated address.
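A wait loop that avoids the bug just described has to tell "still probing" apart from "already failed". The following Python sketch does that on Linux by reading the address flags from `/proc/net/if_inet6`. It is only an illustration under that assumption and not how netctl does it; the function names, the polling interval, and the state strings are my own choices.

```python
import ipaddress
import time

# Address flag bits as defined in the Linux kernel's if_addr.h.
IFA_F_DADFAILED = 0x08
IFA_F_TENTATIVE = 0x40


def dad_state(if_inet6_text: str, iface: str, address: str) -> str:
    """Classify one address as 'ready', 'tentative', 'dadfailed' or 'missing'."""
    wanted = ipaddress.IPv6Address(address)
    for line in if_inet6_text.splitlines():
        fields = line.split()
        if len(fields) != 6:
            continue
        hex_addr, _index, _prefixlen, _scope, flags_hex, dev = fields
        if dev != iface or ipaddress.IPv6Address(int(hex_addr, 16)) != wanted:
            continue
        flags = int(flags_hex, 16)
        # Check for failure first: a failed address may still be tentative.
        if flags & IFA_F_DADFAILED:
            return "dadfailed"
        if flags & IFA_F_TENTATIVE:
            return "tentative"
        return "ready"
    return "missing"


def read_if_inet6() -> str:
    """Read the kernel's IPv6 address table (Linux only)."""
    with open("/proc/net/if_inet6") as f:
        return f.read()


def wait_for_dad(iface, address, timeout=3.0, poll=0.05, read=read_if_inet6):
    """Poll until the address leaves the tentative state or time runs out.

    Returns the last observed state, so the caller can distinguish a
    timeout ('tentative') from a real duplicate ('dadfailed').
    """
    deadline = time.monotonic() + timeout
    while True:
        state = dad_state(read(), iface, address)
        if state != "tentative" or time.monotonic() >= deadline:
            return state
        time.sleep(poll)
```

A caller would then start services only when the result is `"ready"`, and report a duplicate address rather than a timeout when it is `"dadfailed"`.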

In some situations the optimal way to deal with problems caused by DAD is to simply disable DAD. However, you should of course first verify that you don't actually have a duplicated address. If you do have a duplicated address, then enabling, disabling, or reconfiguring DAD isn't going to solve your problems.

If your system is supposed to be the only legitimate user of an IP address and some other node is responding to ND requests for that IP, then the problem you are facing is ND spoofing and that is the first issue you need to address.

If, however, you have a scenario in which IP addresses are configured dynamically and an IP address could be legitimately claimed by any of several nodes, then using DAD can help avoid conflicts and shouldn't be disabled.

kasperd
  • Thanks for the clear explanation. I will disable DAD then and see if I could make IPv6 assignment by DHCP. I have not included a static address in my profile, because the man page of netctl says that IPv6 will be "auto configured" when including the stateless line. That's actually pretty vague now that I think about it :-) – MarioDS Mar 28 '15 at 15:24
  • Very bad idea. DAD is fast enough for normal usage, under certain circumstances optimistic DAD is in order (RFC 4429, recent Linux kernels support this), but usually DAD will do without troubles. If DAD really takes excessively long, then this is due to another problem. ~ N.B. DHCPv6 does not replace DAD, devices are required to perform DAD even if the address is assigned through DHCPv6. – countermode May 04 '15 at 13:36
  • @countermode If I am configuring a system which I know is the only legitimate user of an IP address (or a range of addresses), then I would disable DAD as soon as I observed it causing the slightest problem. Of course in that setting another node spoofing that IP address would cause problems. But using DAD isn't going to help in that case. All DAD would achieve is to ensure that spoofing the IP address can prevent the legitimate user of the address from configuring it in the first place. – kasperd May 04 '15 at 14:23
  • @countermode Which Linux versions support RFC 4429? I have found no way to enable it on Ubuntu LTS. – kasperd May 04 '15 at 14:42
  • @kasperd: DAD is not a security mechanism, nor is it secure against spoofing. Yet, DAD is integral to IPv6 Neighbor Discovery, and usually there are no troubles with it. If there are, the root cause is most likely elsewhere. Switching off DAD to make the problem "go away" is system administration with success by coincidence. ~ Later kernels of the 3.x series support Optimistic DAD. You may have to configure and build your own kernel from the sources (the distro supported sources will probably do). – countermode May 04 '15 at 21:39
  • @countermode Attempting to bind a socket to an IP address while DAD is in progress will fail. That leads to real problems which in some cases prevent services from coming up during boot. Disabling DAD is a reliable fix to those problems. – kasperd May 04 '15 at 21:52
  • @countermode Of course DAD is not a security mechanism. If you have a range of IP addresses exclusively assigned to a single server, and if you have the necessary protections in place to prevent other nodes from spoofing ND for that IP range, then DAD is no longer needed for that range of IP addresses. Hence there is no drawback to disabling DAD. – kasperd May 04 '15 at 21:55
  • @countermode Should you somehow end up in a scenario with a duplicated address, it means you messed up something other than DAD. It may be that DAD changes the symptoms, but regardless of whether DAD is enabled or not, the duplicated address would be a misconfiguration that prevented the setup from working as intended. So DAD didn't help. – kasperd May 04 '15 at 21:57
  • @countermode In other words: There are scenarios where DAD causes problems. And there are scenarios where DAD does not solve any problems. If you know enough about your own deployment to realize that both apply, then switching off DAD is the only sensible thing to do. – kasperd May 04 '15 at 21:58
  • @countermode Building your own kernel rather than the one provided by the distribution will result in a system which is harder to maintain and more difficult to keep up to date with the latest security fixes. Using a custom-built kernel when it can be avoided is simply not a wise choice by a system administrator. – kasperd May 04 '15 at 22:01
  • @countermode I have run systems with DAD disabled because of the problems DAD caused, and I have not experienced a single problem related to duplicate addresses from the moment I turned DAD off. I have run systems where I used custom-built kernels (out of necessity due to driver bugs in the distribution kernel), and it has always been a pain to maintain such a system. So suggesting to use a custom-built kernel instead of just turning DAD off sounds like very very bad advice. – kasperd May 04 '15 at 22:07
  • I simply cannot see what the practical problem is. DAD happens at interface initialization (or on prefix change). The delay caused by DAD is usually negligible. If it is not, then you have another problem which you should track. It seems you need an IP address "instantly" as soon as the interface comes up - God knows why. Maybe switching off DAD is the solution, but then what about NUD? Router Advertisements? etc.? You can switch off the entire Neighbor Discovery for the same reasons, so you end up configuring static IP and MAC addresses (e.g. see http://tinyurl.com/kttp8ad). – countermode May 05 '15 at 11:58
  • *You* asked me on how to enable Optimistic DAD - I never advised you to build a kernel, I just told you one way to get ODAD working. – countermode May 05 '15 at 12:00
  • @countermode To me it sounded like you were suggesting to use RFC 4429 as a means to overcome the problems caused by DAD. If that's not the case, then how would you suggest problems caused by DAD be resolved? – kasperd May 05 '15 at 12:02
  • To answer this meaningfully I need a much deeper insight into your infrastructure. In any case this is beyond what the comments on Stack Exchange support. I would try to set up the interface manually and capture and inspect the traffic and see how it goes; pay special attention to the timing of the Router Advertisements - I guess the source of the trouble lies somewhere there, for DAD kicks in *after* the RA being sent. – countermode May 05 '15 at 12:38
  • @countermode The setup in my case was really simple. On the link there is only a router configured with `prefix::1` and a single host, which can use all other addresses inside the prefix. Code running on the host needs to be able to dynamically assign a new IP address to the interface and immediately bind a socket to that IP. With DAD enabled the bind system call fails, with DAD disabled it worked. A small delay would have been tolerable, but with a user waiting, every ms counts. However it was not a matter of a delay, the system call would return immediately, but with an error code. – kasperd May 05 '15 at 14:25
  • @countermode It was not only my own service, which was affected by this. Even standard services would sometimes fail to come up depending on timing of events during boot. Startup scripts would configure the IP on the interface and then start up services, which were configured to bind to that IP. The time to boot can vary slightly and if a service happened to be started before DAD had completed it would simply fail to come up. – kasperd May 05 '15 at 14:32
  • @countermode I am simply pointing out, that you are incorrect in claiming that it is always a bad idea to disable DAD. It is possible to know enough about your network to know that DAD is definitely not needed. If it was simply a matter of waiting a few ms, then it would be acceptable for most use cases, but when it means a complete failure for a service to even work, then it is not acceptable. It doesn't help that DAD completes later if the service already shut itself down or came up listening to only a subset of the IP addresses it was supposed to be listening on. – kasperd May 05 '15 at 14:39