
I have a two-node active-passive cluster.

Quoting Clusters from Scratch:

If a cluster splits into two (or more) groups of nodes that can no longer communicate with each other (aka. partitions), quorum is used to prevent resources from starting on more nodes than desired, which would risk data corruption. A cluster has quorum when more than half of all known nodes are online in the same partition.

By the above definition, a two-node cluster would only have quorum when both nodes are running. This would make the creation of a two-node cluster pointless, but corosync has the ability to treat two-node clusters as if only one node is required for quorum. The pcs cluster setup command will automatically configure two_node: 1 in corosync.conf, so a two-node cluster will "just work".
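For reference, the quorum section that pcs cluster setup writes into /etc/corosync/corosync.conf for a two-node cluster looks roughly like this (a sketch; other options may also be present depending on your corosync version):

```
quorum {
    provider: corosync_votequorum
    two_node: 1
}
```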

Here's my config:

[screenshot of corosync.conf, not reproduced]

So how can the cluster now decide which one has quorum?

edited by Jeff Schaller
asked by blabla_trace

1 Answer


There is no deciding:

two_node: 1

Enables two node cluster operations (default: 0).

The "two node cluster" is a use case that requires special consideration. With a standard two node cluster, each node with a single vote, there are 2 votes in the cluster. Using the simple majority calculation (50% of the votes + 1) to calculate quorum, the quorum would be 2. This means that both nodes would always have to be alive for the cluster to be quorate and operate.
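The majority arithmetic the man page describes can be illustrated with a short sketch (this is just the calculation spelled out, not corosync's actual implementation):

```python
def quorum_votes(total_votes: int, two_node: bool = False) -> int:
    """Votes required for a partition to be quorate.

    Sketch of the rule described in the votequorum man page:
    simple majority (50% + 1), except that two_node: 1 lowers
    the requirement to a single vote in a 2-node cluster.
    """
    if two_node and total_votes == 2:
        return 1                    # two_node: 1 -> one surviving node is quorate
    return total_votes // 2 + 1     # simple majority: 50% of the votes + 1


print(quorum_votes(2))                  # standard 2-node cluster: both nodes needed
print(quorum_votes(2, two_node=True))   # with two_node: 1, a single node suffices
print(quorum_votes(5))                  # 5-node cluster: any 3 nodes are quorate
```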

With two_node: 1 enabled, quorum is set artificially to 1.

The above is from the votequorum(5) man page.

Also pertinent:

The way it works is that in the event of a network outage both nodes race in an attempt to fence each other and the first to succeed continues in the cluster. The system administrator can also associate a delay with a fencing agent so that one node can be given priority in this situation so that it always wins the race.
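To make that concrete, a fencing setup with a delay might look like the following (a hypothetical sketch: the device names, IPs, and credentials are placeholders, and the exact parameters depend on your fence agent; fence_ipmilan accepts a delay option that gives one node a head start in the race):

```shell
# Two IPMI fencing devices, one per node. The delay on node1's device
# means node2 waits 15 seconds before fencing node1, so after a network
# split node1 normally wins the race and survives.
pcs stonith create fence-node1 fence_ipmilan \
    pcmk_host_list=node1 ip=10.0.0.1 username=admin password=secret \
    delay=15
pcs stonith create fence-node2 fence_ipmilan \
    pcmk_host_list=node2 ip=10.0.0.2 username=admin password=secret
```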

See also: New quorum features in Corosync 2 by Christine Caulfield.

Jeff Schaller
  • So what could I possibly do in terms of split-brain protection? – blabla_trace Feb 23 '19 at 06:47
  • 1
    Have a fencing mechanism: http://clusterlabs.org/pacemaker/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08.html#_what_is_stonith – Jeff Schaller Feb 23 '19 at 10:17
  • But doesn't fencing take care of misbehaving/unresponsive nodes? I mean, in a two-node (active-passive) cluster I don't want pacemaker to fence one of the nodes just because there are only 2 nodes. Wouldn't I need to deal with the quorum problem first and with fencing later on? Just asking, as I am very new to this topic and might be confusing basic terminology. – blabla_trace Feb 23 '19 at 15:26
  • 1
    quorum is forced to 1 so that a 2-node cluster *can* operate. If a node becomes unresponsive, that's a job for the fencing mechanism. – Jeff Schaller Feb 23 '19 at 20:08
  • Thank you Jeff. But say the network is interrupted between the two nodes, so each of them assumes the other is gone and thus tries to take over the master role and fire up all resources, a clear split-brain, right? How can fencing make the decision which one to fence? – blabla_trace Feb 23 '19 at 21:15
  • 1
    `... the first to succeed continues in the cluster ...` – Jeff Schaller Feb 24 '19 at 00:04
  • Ok, but that would cover only a network outage. What if, say, a particular resource, e.g. a file system, cannot be unmounted on one node? Would fencing cover that as well, i.e. fence the unresponsive node? – blabla_trace Feb 24 '19 at 08:13