
I'm using Debian and would like to set up an HA network with a few servers.

This is my configuration:

   -----sw1--------------sw2------------ host1
  /    /                    \    \        /
srv1 srv2                  srv3 srv4     /
  \    \                    /    /      /
   -----sw3--------------sw4------------

I have a bunch of servers (srv) connected to two independent networks. Each server has two NICs, and each NIC is connected to one of the networks through unmanaged switches (sw). Each server combines its NICs into a bonding interface with one fixed IP address. The host should be able to reach all servers. It has a bonding interface, too.

Is the above scenario possible with a Linux bonding device? The host should be able to contact all servers even if one NIC of a server is inactive or one of the networks is down. Cross-connecting the networks is not what I want.

I experimented with the bonding modes active-backup and broadcast in combination with ARP monitoring, but without success.

With these modes I was able to create a working network, but as soon as a link fails the network goes down.

Can you help me? Are there alternative ways to achieve this?

Thanks in advance.

Dieter

Dieter Scholz

1 Answer


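No, that's not how bonding works.

Link aggregation with a protocol like LACP (bonding mode 802.3ad) combines multiple Ethernet links between the same two pieces of equipment (2 hosts, 2 switches, or 1 switch and 1 host) and treats them as though they were a single link.

Because the two network interfaces on srv1, srv2, srv3, srv4, and host1 are connected to different switches on two separate networks, bonding/LACP does not apply. Modes that do not need LACP, such as active-backup, still assume that both interfaces reach the same layer 2 network, so a host whose active interface sits on the "top" network cannot reach a host whose active interface sits on the "bottom" one.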

One option you have is to bridge together both interfaces on each of the 5 hosts and have the hosts participate in the Spanning Tree Protocol (STP) to ensure that there are no layer 2 loops. Unfortunately, since you mention that the switches are unmanaged, relying on spanning tree may be dangerous in your scenario: you cannot configure the spanning tree parameters on unmanaged switches, and they may not implement spanning tree at all. Moreover, STP resolves loops in the network by blocking ports completely, which effectively gives you an active+standby configuration, and that is not what you want.

Another option is to make the "top" and "bottom" networks completely independent IP networks. In other words, implement your load balancing or redundancy at layer 3 instead of layer 2.

-----+----------------+---------------+--------------
     |                |               |
     |10.0.1.1/24     |10.0.1.2/24    |10.0.1.3/24
  +------+         +------+        +------+
  | srv1 |         | srv2 |        | srv3 |           etc...
  +------+         +------+        +------+
     |10.0.2.1/24     |10.0.2.2/24    |10.0.2.3/24
     |                |               |
-----+----------------+---------------+--------------

Simple version:

For each server, publish both IP addresses in DNS, and use DNS-based load balancing to allow one link or the other to be selected.

Pro:

  • simple

Con:

  • individual TCP connections won't fail over to the other link
  • if one of the links is down, a client may wait through a long timeout before it tries the other address
  • not all applications know how to retry a connection to a host on a different IP address

Enhanced version:

Add an extra loopback address on each host that is not on either one of the subnets for the individual links. For example, 10.0.3.x/32:

  • srv1: 10.0.3.1/32
    • Configure with ip addr add 10.0.3.1/32 dev lo
  • srv2: 10.0.3.2/32
  • srv3: 10.0.3.3/32
  • etc...
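
The numbering in the list above can be sketched as a small shell helper. The function name loopback_addr and the SRV_NUM variable are made up for illustration (1 for srv1, 2 for srv2, and so on); the command is only printed, so the sketch is safe to run without root.

```shell
#!/bin/sh
# Hypothetical helper: map a server number to its loopback address,
# following the 10.0.3.N/32 scheme from the list above.
loopback_addr() {
    printf '10.0.3.%s/32\n' "$1"
}

# Assumption: SRV_NUM is set per host (1 for srv1, 2 for srv2, ...).
SRV_NUM="${SRV_NUM:-1}"
ADDR="$(loopback_addr "$SRV_NUM")"

# Printed rather than executed; remove the 'echo' and run as root to apply.
echo ip addr add "$ADDR" dev lo
```

An address added with ip addr add does not survive a reboot, so it would also have to go into the host's persistent network configuration.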

Have the servers speak a routing protocol such as OSPF among themselves; Quagga can be used for this purpose. Each server will learn through OSPF how to reach the loopback addresses of each of the other servers through both links (or through only one link if only one link is up). Publish only the loopback addresses in DNS.
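
As a rough illustration of the OSPF side, the sketch below prints a minimal Quagga ospfd.conf for server number N. It assumes the example subnets from the diagram (10.0.1.0/24, 10.0.2.0/24), the 10.0.3.N/32 loopback scheme, and a single backbone area. The helper name ospfd_conf is hypothetical, and a real setup would also need zebra running and probably interface-specific tuning.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal ospfd.conf for server number $1.
# Assumes both link subnets and the loopback /32 all live in area 0.0.0.0.
ospfd_conf() {
    n="$1"
    cat <<EOF
router ospf
 ospf router-id 10.0.3.$n
 network 10.0.1.0/24 area 0.0.0.0
 network 10.0.2.0/24 area 0.0.0.0
 network 10.0.3.$n/32 area 0.0.0.0
EOF
}

# Example: show the configuration srv1 would use.
ospfd_conf 1
```

The output would typically be saved as the ospfd configuration file on each server, with the server number changed accordingly.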

With some careful additional configuration of kernel source IP address selection, you can probably arrange for the loopback addresses to be used as source addresses too, which will mean that individual TCP connections will seamlessly fail over.
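
One way this might be arranged with Quagga is a zebra route-map that sets the preferred source address on routes learned via OSPF. The sketch below prints such a zebra.conf fragment for server number N. The route-map name SET_SRC and the helper zebra_src_conf are made up for illustration, and the behaviour should be checked against the Quagga version in use before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print a zebra.conf fragment that asks zebra to
# install OSPF-learned routes with the loopback address as preferred source.
zebra_src_conf() {
    cat <<EOF
route-map SET_SRC permit 10
 set src 10.0.3.$1
!
ip protocol ospf route-map SET_SRC
EOF
}

# Example: fragment for srv1 (loopback 10.0.3.1).
zebra_src_conf 1
```

Applications that bind explicitly to the loopback address get the same effect without any routing daemon changes.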

Celada
  • I was afraid that using bonding would only be a hack, if it is possible at all. The enhanced version you described looks promising. I will try that. – Dieter Scholz Jan 18 '16 at 11:09