We are using Pacemaker and Corosync to automate failovers. We noticed one behaviour: when the primary node is rebooted, the standby node takes over as primary, which is fine. But when the rebooted node comes back online and its services start, it takes back the Primary role. It should ideally come back as standby. Are we missing any configuration?

`pcs resource defaults` output:

 resource-stickiness: INFINITY
 migration-threshold: 0

Stickiness is already set to INFINITY, yet the role still moves back to the original node. Please suggest.
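As a side note, the scores the scheduler actually assigns (including the effect of stickiness) can be inspected read-only against the live CIB with `crm_simulate`; the grep filter below is just one way to narrow the output:

```shell
# Show the allocation/promotion scores Pacemaker computes from the
# live cluster state (-L = live CIB, -s = show scores). Read-only.
crm_simulate -sL | grep -i -e score -e pgsql
```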

Adding Config details:

[root@Node1 heartbeat]# pcs config show -l
Cluster Name: cluster1
Corosync Nodes:
 Node1 Node2
Pacemaker Nodes:
 Node1 Node2

Resources:
 Master: msPostgresql
  Meta Attrs: master-node-max=1 clone-max=2 notify=true master-max=1 clone-node-max=1
  Resource: pgsql (class=ocf provider=heartbeat type=pgsql)
   Attributes: master_ip=10.70.10.1 node_list="Node1 Node2" pgctl=/usr/pgsql-9.6/bin/pg_ctl pgdata=/var/lib/pgsql/9.6/data/ primary_conninfo_opt="keepalives_idle=60 keepalives_interval=5 keepalives_count=5" psql=/usr/pgsql-9.6/bin/psql rep_mode=async restart_on_promote=true restore_command="cp /var/lib/pgsql/9.6/data/archivedir/%f %p"
   Meta Attrs: failure-timeout=60
   Operations: demote interval=0s on-fail=stop timeout=60s (pgsql-demote-interval-0s)
               methods interval=0s timeout=5s (pgsql-methods-interval-0s)
               monitor interval=4s on-fail=restart timeout=60s (pgsql-monitor-interval-4s)
               monitor interval=3s on-fail=restart role=Master timeout=60s (pgsql-monitor-interval-3s)
               notify interval=0s timeout=60s (pgsql-notify-interval-0s)
               promote interval=0s on-fail=restart timeout=60s (pgsql-promote-interval-0s)
               start interval=0s on-fail=restart timeout=60s (pgsql-start-interval-0s)
               stop interval=0s on-fail=block timeout=60s (pgsql-stop-interval-0s)
 Group: master-group
  Resource: vip-master (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=10.70.10.2
   Operations: monitor interval=10s on-fail=restart timeout=60s (vip-master-monitor-interval-10s)
               start interval=0s on-fail=restart timeout=60s (vip-master-start-interval-0s)
               stop interval=0s on-fail=block timeout=60s (vip-master-stop-interval-0s)
  Resource: vip-rep (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=10.70.10.1
   Meta Attrs: migration-threshold=0
   Operations: monitor interval=10s on-fail=restart timeout=60s (vip-rep-monitor-interval-10s)
               start interval=0s on-fail=stop timeout=60s (vip-rep-start-interval-0s)
               stop interval=0s on-fail=ignore timeout=60s (vip-rep-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
Ordering Constraints:
  promote msPostgresql then start master-group (score:INFINITY) (non-symmetrical)
  demote msPostgresql then stop master-group (score:0) (non-symmetrical)
Colocation Constraints:
  master-group with msPostgresql (score:INFINITY) (rsc-role:Started) (with-rsc-role:Master)
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 resource-stickiness: INFINITY
 migration-threshold: 0
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: cluster1
 cluster-recheck-interval: 60
 dc-version: 1.1.19-8.el7-c3c624ea3d
 have-watchdog: false
 no-quorum-policy: ignore
 start-failure-is-fatal: false
 stonith-enabled: false
Node Attributes:
 Node1: pgsql-data-status=STREAMING|ASYNC
 Node2: pgsql-data-status=LATEST

Quorum:
  Options:

Thanks !

User2019
  • Please show us your PCS configuration: `pcs config show` – Matt Kereczman Sep 12 '19 at 17:12
  • Edited the post above, added config details. – User2019 Sep 13 '19 at 04:43
  • Is it possible that Postgres is set to start at boot? Could you add the output of: `systemctl status postgresql.service` – Matt Kereczman Sep 13 '19 at 18:06
  • Hello Matt, I checked the status; the postgres service was enabled at boot. I disabled it with `systemctl disable postgresql-9.6` and enabled the pacemaker and corosync services. This starts the service correctly: postgres now comes up as standby after a reboot, as desired. Will this be a good approach, i.e. enabling pacemaker and corosync at boot and disabling the postgres service so that Pacemaker starts it? – User2019 Sep 16 '19 at 04:32
  • Output:
    [root@node1 log]# systemctl status postgresql-9.6
    ● postgresql-9.6.service - PostgreSQL 9.6 database server
       Loaded: loaded (/usr/lib/systemd/system/postgresql-9.6.service; disabled; vendor preset: disabled)
       Active: inactive (dead)
         Docs: https://www.postgresql.org/docs/9.6/static/
    -bash-4.2$ ./pg_ctl -D /var/lib/pgsql/9.6/data/ status
    pg_ctl: server is running (PID: 7288)
    /usr/pgsql-9.6/bin/postgres "-D" "/var/lib/pgsql/9.6/data" "-c" "config_file=/var/lib/pgsql/9.6/data//postgresql.conf"
    – User2019 Sep 16 '19 at 11:25
  • That's correct! I've posted my comment as an answer. – Matt Kereczman Sep 16 '19 at 18:13

1 Answer

You have PostgreSQL set to start at boot. That means that when the primary node is rebooted, it rejoins the cluster with PostgreSQL already running, forcing the cluster to recover the service (a stop/start of PostgreSQL), since the cluster must only ever run one instance of the PostgreSQL Master. The cluster stops PostgreSQL everywhere and then chooses one node - in your case the original primary - on which to start the single Master instance.

To fix this, make sure PostgreSQL is disabled and Pacemaker/Corosync are enabled at boot:

# systemctl disable postgresql-9.6
# systemctl enable pacemaker corosync
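To confirm the change took effect, `systemctl is-enabled` gives a quick check (unit names assumed to match the packages used in this question):

```shell
# PostgreSQL must not auto-start; the cluster stack must.
systemctl is-enabled postgresql-9.6       # expected: disabled
systemctl is-enabled pacemaker corosync   # expected: enabled (for each)
```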
Matt Kereczman