
I'm trying to set up a new Debian 10 cluster with three instances. My stack is based on pacemaker, corosync, dlm, and lvmlockd with a GFS2 volume. All servers have access to the GFS2 volume, but I can't mount it with pacemaker or manually when using the GFS2 filesystem. I configured corosync and all three instances are online. I then continued with the dlm and LVM configuration. Here are my configuration steps for LVM and pacemaker:

LVM:
sudo nano /etc/lvm/lvm.conf --> Set locking_type = 1 and use_lvmlockd = 1   
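For reference, the relevant excerpt of /etc/lvm/lvm.conf after this change (both options live in the global section):

```
# /etc/lvm/lvm.conf (excerpt)
global {
    locking_type = 1    # standard local locking
    use_lvmlockd = 1    # delegate shared-VG locking to lvmlockd
}
```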

Pacemaker Resources:
sudo pcs -f stonith_cfg stonith create meatware meatware hostlist="firmwaredroid-swarm-1 firmwaredroid-swarm-2 firmwaredroid-swarm-3" op monitor interval=60s
sudo pcs resource create dlm ocf:pacemaker:controld \
    op start timeout=90s interval=0 \
    op stop timeout=100s interval=0
sudo pcs resource create lvmlockd ocf:heartbeat:lvmlockd \
    op start timeout=90s interval=0 \
    op stop timeout=100s interval=0
sudo pcs resource group add base-group dlm lvmlockd
sudo pcs resource clone base-group \
    meta interleave=true ordered=true target-role=Started

The pcs status shows that all resources are up and online. After the pacemaker configuration I tried to set up a shared volume group in order to add the Filesystem resource to pacemaker, but all the commands fail with Global lock failed: check that global lockspace is started.

sudo pvcreate /dev/vdb
--> Global lock failed: check that global lockspace is started
sudo vgcreate vgGFS2 /dev/vdb --shared
--> Global lock failed: check that global lockspace is started
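For context, my understanding (from the lvmlockd man page) is that the first shared VG also hosts the global lock, and that VG's lockspace has to be started on each node before further LVM commands succeed. The sequence I expected to work, with a hypothetical LV name and size, is roughly:

```
sudo pvcreate /dev/vdb
sudo vgcreate --shared vgGFS2 /dev/vdb     # first shared VG also hosts the global lock
sudo vgchange --lock-start vgGFS2          # join the VG's dlm lockspace on this node
sudo lvcreate --activate sy -n lvGfs2Share -L 10G vgGFS2   # shared activation
```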

I then tried to format /dev/vdb directly with mkfs.gfs2, which works, but that seems like a step in the wrong direction to me, because mounting the volume then always fails:

sudo mkfs.gfs2 -p lock_dlm -t firmwaredroidcluster:gfsvolfs -j 3 /dev/gfs2share/lvGfs2Share
sudo mount -v -t "gfs2" /dev/vdb ./swarm_file_mount/
mount: /home/debian/swarm_file_mount: mount(2) system call failed: Transport endpoint is not connected.

I tried several things, like starting lvmlockd -g dlm or debugging dlm with dlm_controld -d, but I can't find any information on how to do this properly. On the web I found some Red Hat forums that discuss similar errors but don't provide any solutions due to a paywall.
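In case it helps to reproduce the debugging, these are the commands I used to inspect the dlm state (dlm_tool comes with the dlm-controld package, as far as I can tell):

```
dlm_tool ls        # list lockspaces this node has joined
dlm_tool status    # cluster membership as seen by dlm_controld
```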

How can I start or initialise the global lock with dlm so that I can mount the GFS2 volume correctly on the pacemaker Debian cluster? Or, in other words, what's wrong with my dlm configuration?

Thx for any help!

Me7e0r

1 Answer


I'm currently trying the same thing.

You need to activate the LVM volume before you can use it on a machine. You can do that with an ocf:heartbeat:LVM-activate resource. Once the volume is activated, you will see an a in the Attr column of the lvs output.
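If you want to check that from a script: the activation state is the fifth character of the lv_attr field. A minimal sketch, using a hard-coded example attribute string instead of a real lvs call:

```shell
#!/bin/sh
# Example lv_attr string as printed by `lvs -o lv_attr`; on a real system you
# would use something like: attr="$(lvs --noheadings -o lv_attr vg/lv | tr -d ' ')"
attr="-wi-a-----"
# The 5th character of lv_attr is the state; "a" means active.
if [ "$(printf '%s' "$attr" | cut -c5)" = "a" ]; then
    echo "active"
else
    echo "inactive"
fi
```

With the example string above this prints active.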

I found out that the LVM-activate resource agent in Buster has some bugs (bashisms, and it looks for a deprecated LVM option). But installing the newer package from buster-backports has worked so far.

Here are my current resources:

      <clone id="base-services-clone">
        <group id="base-services">
          <primitive class="ocf" id="dlm" provider="pacemaker" type="controld">
            <operations>
              <op id="dlm-monitor-interval-60s" interval="60s" name="monitor"/>
              <op id="dlm-start-interval-0s" interval="0s" name="start" timeout="90s"/>
              <op id="dlm-stop-interval-0s" interval="0s" name="stop" timeout="100s"/>
            </operations>
          </primitive>
          <primitive class="ocf" id="lvmlockd" provider="heartbeat" type="lvmlockd">
            <operations>
              <op id="lvmlockd-monitor-interval-60s" interval="60s" name="monitor"/>
              <op id="lvmlockd-start-interval-0s" interval="0s" name="start" timeout="90s"/>
              <op id="lvmlockd-stop-interval-0s" interval="0s" name="stop" timeout="90s"/>
            </operations>
          </primitive>
          <primitive class="ocf" id="cluster-vg" provider="heartbeat" type="LVM-activate">
            <instance_attributes id="cluster-vg-instance_attributes">
              <nvpair id="cluster-vg-instance_attributes-activation_mode" name="activation_mode" value="shared"/>
              <nvpair id="cluster-vg-instance_attributes-lvname" name="lvname" value="data"/>
              <nvpair id="cluster-vg-instance_attributes-vg_access_mode" name="vg_access_mode" value="lvmlockd"/>
              <nvpair id="cluster-vg-instance_attributes-vgname" name="vgname" value="cluster"/>
            </instance_attributes>
            <operations>
              <op id="cluster-vg-monitor-interval-30s" interval="30s" name="monitor" timeout="90s"/>
              <op id="cluster-vg-start-interval-0s" interval="0s" name="start" timeout="90s"/>
              <op id="cluster-vg-stop-interval-0s" interval="0s" name="stop" timeout="90s"/>
            </operations>
          </primitive>
          <primitive class="ocf" id="shared-data" provider="heartbeat" type="Filesystem">
            <instance_attributes id="shared-data-instance_attributes">
              <nvpair id="shared-data-instance_attributes-device" name="device" value="/dev/cluster/data"/>
              <nvpair id="shared-data-instance_attributes-directory" name="directory" value="/mnt/data"/>
              <nvpair id="shared-data-instance_attributes-fstype" name="fstype" value="gfs2"/>
              <nvpair id="shared-data-instance_attributes-options" name="options" value="noatime"/>
            </instance_attributes>
            <operations>
              <op id="shared-data-monitor-interval-10s" interval="10s" name="monitor"/>
              <op id="shared-data-start-interval-0s" interval="0s" name="start" timeout="60s"/>
              <op id="shared-data-stop-interval-0s" interval="0s" name="stop" timeout="60s"/>
            </operations>
          </primitive>
        </group>
      </clone>
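If you prefer pcs over editing the CIB XML directly, roughly equivalent commands for the two new resources (untested on my side; resource and group names match the XML above) would be:

```
sudo pcs resource create cluster-vg ocf:heartbeat:LVM-activate \
    vgname=cluster lvname=data activation_mode=shared vg_access_mode=lvmlockd \
    op monitor interval=30s timeout=90s --group base-services
sudo pcs resource create shared-data ocf:heartbeat:Filesystem \
    device=/dev/cluster/data directory=/mnt/data fstype=gfs2 options=noatime \
    op monitor interval=10s --group base-services
```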

This is not a production setup, just some tests inside some Vagrant VMs.

AdminBee