
For whatever reason, I'm no longer able to move resources with pcs.

pacemaker-1.1.16-12.el7_4.8.x86_64
corosync-2.4.0-9.el7_4.2.x86_64
pcs-0.9.158-6.el7.centos.1.x86_64
Linux server_a.test.local 3.10.0-693.el7.x86_64

I have 4 resources configured as part of a resource group. Here's the log from when I tried to move the ClusterIP resource from server_d to server_a using `pcs resource move ClusterIP server_a.test.local`:

Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Forwarding cib_delete operation for section constraints to all (origin=local/crm_resource/3)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: --- 0.24.0 2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: +++ 0.25.0 (null)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       -- /cib/configuration/constraints/rsc_location[@id='cli-prefer-ClusterIP']
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       +  /cib:  @epoch=25
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: abort_transition_graph:       Transition aborted by deletion of rsc_location[@id='cli-prefer-ClusterIP']: Configuration change | cib=0.25.0 source=te_update_diff:456 path=/cib/configuration/constraints/rsc_location[@id='cli-prefer-ClusterIP'] complete=true
Apr 06 12:16:26 [17292] server_d.test.local       crmd:   notice: do_state_transition:  State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Completed cib_delete operation for section constraints: OK (rc=0, origin=server_d.test.local/crm_resource/3, version=0.25.0)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_a.test.local is online
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_d.test.local is online
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: group_print:   Resource Group: my_app
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      Apache     (systemd:httpd):        Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      stunnel    (systemd:stunnel-my_app): Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: common_print:      my_app-daemon        (systemd:my_app): Started server_d.test.local
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   ClusterIP       (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   Apache  (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   stunnel (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   my_app-daemon     (Started server_d.test.local)
Apr 06 12:16:26 [17291] server_d.test.local    pengine:   notice: process_pe_message:   Calculated transition 8, saving inputs in /var/lib/pacemaker/pengine/pe-input-18.bz2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Forwarding cib_modify operation for section constraints to all (origin=local/crm_resource/4)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: --- 0.25.0 2
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       Diff: +++ 0.26.0 (null)
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       +  /cib:  @epoch=26
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_perform_op:       ++ /cib/configuration/constraints:  <rsc_location id="cli-prefer-ClusterIP" rsc="ClusterIP" role="Started" node="server_a.test.local" score="INFINITY"/>
Apr 06 12:16:26 [17287] server_d.test.local        cib:     info: cib_process_request:  Completed cib_modify operation for section constraints: OK (rc=0, origin=server_d.test.local/crm_resource/4, version=0.26.0)
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: abort_transition_graph:       Transition aborted by rsc_location.cli-prefer-ClusterIP 'create': Configuration change | cib=0.26.0 source=te_update_diff:456 path=/cib/configuration/constraints complete=true
Apr 06 12:16:26 [17292] server_d.test.local       crmd:     info: handle_response:      pe_calc calculation pe_calc-dc-1523016986-67 is obsolete
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_a.test.local is online
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: determine_online_status:      Node server_d.test.local is online
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 1 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: unpack_node_loop:     Node 2 is already processed
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: group_print:   Resource Group: my_app
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      Apache     (systemd:httpd):        Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      stunnel    (systemd:stunnel-my_app): Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: common_print:      my_app-daemon        (systemd:my_app): Started server_d.test.local
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   ClusterIP       (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   Apache  (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   stunnel (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:     info: LogActions:   Leave   my_app-daemon     (Started server_d.test.local)
Apr 06 12:16:27 [17291] server_d.test.local    pengine:   notice: process_pe_message:   Calculated transition 9, saving inputs in /var/lib/pacemaker/pengine/pe-input-19.bz2
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_state_transition:  State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_te_invoke: Processing graph 9 (ref=pe_calc-dc-1523016987-68) derived from /var/lib/pacemaker/pengine/pe-input-19.bz2
Apr 06 12:16:27 [17292] server_d.test.local       crmd:   notice: run_graph:    Transition 9 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-19.bz2): Complete
Apr 06 12:16:27 [17292] server_d.test.local       crmd:     info: do_log:       Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Apr 06 12:16:27 [17292] server_d.test.local       crmd:   notice: do_state_transition:  State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-34.raw
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Wrote version 0.25.0 of the CIB to disk (digest: 7511cba55b6c2f2f481a51d5585b8d36)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Reading cluster configuration file /var/lib/pacemaker/cib/cib.tPIv7m (digest: /var/lib/pacemaker/cib/cib.OwHiKz)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_backup:      Archived previous version as /var/lib/pacemaker/cib/cib-35.raw
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Wrote version 0.26.0 of the CIB to disk (digest: 7f962ed676a49e84410eee2ee04bae8c)
Apr 06 12:16:27 [17287] server_d.test.local        cib:     info: cib_file_write_with_digest:   Reading cluster configuration file /var/lib/pacemaker/cib/cib.MnRP4u (digest: /var/lib/pacemaker/cib/cib.B5sWNH)
Apr 06 12:16:31 [17287] server_d.test.local        cib:     info: cib_process_ping:     Reporting our current digest to server_d.test.local: 8182592cb4922cbf007158ab0a277190 for 0.26.0 (0x5575234afde0 0)

It's important to note that if I execute pcs cluster stop server_b.test.local, all resources inside the configured group are moved to the other node.

What's going on? Like I said, this worked before, and no changes have been made since then.

Thank you in advance!

EDIT:

pcs config

[root@server_a ~]# pcs config
Cluster Name: my_app_cluster
Corosync Nodes:
 server_a.test.local server_d.test.local
Pacemaker Nodes:
 server_a.test.local server_d.test.local

Resources:
 Group: my_app
  Resource: ClusterIP (class=ocf provider=heartbeat type=IPaddr2)
   Attributes: cidr_netmask=24 ip=10.116.63.49
   Operations: monitor interval=10s timeout=20s (ClusterIP-monitor-interval-10s)
               start interval=0s timeout=20s (ClusterIP-start-interval-0s)
               stop interval=0s timeout=20s (ClusterIP-stop-interval-0s)
  Resource: Apache (class=systemd type=httpd)
   Operations: monitor interval=60 timeout=100 (Apache-monitor-interval-60)
               start interval=0s timeout=100 (Apache-start-interval-0s)
               stop interval=0s timeout=100 (Apache-stop-interval-0s)
  Resource: stunnel (class=systemd type=stunnel-my_app)
   Operations: monitor interval=60 timeout=100 (stunnel-monitor-interval-60)
               start interval=0s timeout=100 (stunnel-start-interval-0s)
               stop interval=0s timeout=100 (stunnel-stop-interval-0s)
  Resource: my_app-daemon (class=systemd type=my_app)
   Operations: monitor interval=60 timeout=100 (my_app-daemon-monitor-interval-60)
               start interval=0s timeout=100 (my_app-daemon-start-interval-0s)
               stop interval=0s timeout=100 (my_app-daemon-stop-interval-0s)

Stonith Devices:
Fencing Levels:

Location Constraints:
  Resource: Apache
    Enabled on: server_d.test.local (score:INFINITY) (role: Started) (id:cli-prefer-Apache)
  Resource: ClusterIP
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-ClusterIP)
  Resource: my_app-daemon
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-my_app-daemon)
  Resource: stunnel
    Enabled on: server_a.test.local (score:INFINITY) (role: Started) (id:cli-prefer-stunnel)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: my_app_cluster
 dc-version: 1.1.16-12.el7_4.8-94ff4df
 have-watchdog: false
 stonith-enabled: false

Quorum:
  Options:

EDIT2

When I run crm_simulate -sL, I get the following output:

[root@server_a ~]# crm_simulate -sL

    Current cluster status:
    Online: [ server_a.test.local server_d.test.local ]

     Resource Group: my_app
         ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_a.test.local
         Apache     (systemd:httpd):        Started server_a.test.local
         stunnel    (systemd:stunnel-my_app): Started server_a.test.local
         my_app-daemon        (systemd:my_app): Started server_a.test.local

    Allocation scores:
    group_color: my_app allocation score on server_a.test.local: 0
    group_color: my_app allocation score on server_d.test.local: 0
    group_color: ClusterIP allocation score on server_a.test.local: 0
    group_color: ClusterIP allocation score on server_d.test.local: INFINITY
    group_color: Apache allocation score on server_a.test.local: 0
    group_color: Apache allocation score on server_d.test.local: INFINITY
    group_color: stunnel allocation score on server_a.test.local: INFINITY
    group_color: stunnel allocation score on server_d.test.local: 0
    group_color: my_app-daemon allocation score on server_a.test.local: INFINITY
    group_color: my_app-daemon allocation score on server_d.test.local: 0
    native_color: ClusterIP allocation score on server_a.test.local: INFINITY
    native_color: ClusterIP allocation score on server_d.test.local: INFINITY
    native_color: Apache allocation score on server_a.test.local: INFINITY
    native_color: Apache allocation score on server_d.test.local: -INFINITY
    native_color: stunnel allocation score on server_a.test.local: INFINITY
    native_color: stunnel allocation score on server_d.test.local: -INFINITY
    native_color: my_app-daemon allocation score on server_a.test.local: INFINITY
    native_color: my_app-daemon allocation score on server_d.test.local: -INFINITY

    Transition Summary:

Next I deleted all resources and added them back (exactly as before; I have it documented). Now, when I run crm_simulate -sL, I get different results:

[root@server_a ~]# crm_simulate -sL

Current cluster status:
Online: [ server_a.test.local server_d.test.local ]

 Resource Group: my_app
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_a.test.local
     Apache     (systemd:httpd):        Started server_a.test.local
     stunnel    (systemd:stunnel-my_app.service): Started server_a.test.local
     my_app-daemon        (systemd:my_app.service): Started server_a.test.local

Allocation scores:
group_color: my_app allocation score on server_a.test.local: 0
group_color: my_app allocation score on server_d.test.local: 0
group_color: ClusterIP allocation score on server_a.test.local: 0
group_color: ClusterIP allocation score on server_d.test.local: 0
group_color: Apache allocation score on server_a.test.local: 0
group_color: Apache allocation score on server_d.test.local: 0
group_color: stunnel allocation score on server_a.test.local: 0
group_color: stunnel allocation score on server_d.test.local: 0
group_color: my_app-daemon allocation score on server_a.test.local: 0
group_color: my_app-daemon allocation score on server_d.test.local: 0
native_color: ClusterIP allocation score on server_a.test.local: 0
native_color: ClusterIP allocation score on server_d.test.local: 0
native_color: Apache allocation score on server_a.test.local: 0
native_color: Apache allocation score on server_d.test.local: -INFINITY
native_color: stunnel allocation score on server_a.test.local: 0
native_color: stunnel allocation score on server_d.test.local: -INFINITY
native_color: my_app-daemon allocation score on server_a.test.local: 0
native_color: my_app-daemon allocation score on server_d.test.local: -INFINITY

And I'm able to move resources, but when I do and then run crm_simulate -sL again, I get a different output than before!

[root@server_a ~]# crm_simulate -sL

Current cluster status:
Online: [ server_a.test.local server_d.test.local ]

 Resource Group: my_app
     ClusterIP  (ocf::heartbeat:IPaddr2):       Started server_d.test.local
     Apache     (systemd:httpd):        Started server_d.test.local
     stunnel    (systemd:stunnel-my_app.service): Started server_d.test.local
     my_app-daemon        (systemd:my_app.service): Started server_d.test.local

Allocation scores:
group_color: my_app allocation score on server_a.test.local: 0
group_color: my_app allocation score on server_d.test.local: 0
group_color: ClusterIP allocation score on server_a.test.local: 0
group_color: ClusterIP allocation score on server_d.test.local: INFINITY
group_color: Apache allocation score on server_a.test.local: 0
group_color: Apache allocation score on server_d.test.local: 0
group_color: stunnel allocation score on server_a.test.local: 0
group_color: stunnel allocation score on server_d.test.local: 0
group_color: my_app-daemon allocation score on server_a.test.local: 0
group_color: my_app-daemon allocation score on server_d.test.local: 0
native_color: ClusterIP allocation score on server_a.test.local: 0
native_color: ClusterIP allocation score on server_d.test.local: INFINITY
native_color: Apache allocation score on server_a.test.local: -INFINITY
native_color: Apache allocation score on server_d.test.local: 0
native_color: stunnel allocation score on server_a.test.local: -INFINITY
native_color: stunnel allocation score on server_d.test.local: 0
native_color: my_app-daemon allocation score on server_a.test.local: -INFINITY
native_color: my_app-daemon allocation score on server_d.test.local: 0

Transition Summary:

I'm kinda confused :/ Is this expected behaviour?

yesOrMaybeWhatever

2 Answers


Not sure I got your last answer right, but I took a closer look at `man pcs` and found this:

move [destination node] [--master] [lifetime=] [--wait[=n]] Move the resource off the node it is currently running on by creating a -INFINITY location constraint to ban the node. If destination node is specified the resource will be moved to that node by creating an INFINITY location constraint to prefer the destination node. If --master is used the scope of the command is limited to the master role and you must use the master id (instead of the resource id). If lifetime is specified then the constraint will expire after that time, otherwise it defaults to infinity and the constraint can be cleared manually with 'pcs resource clear' or 'pcs constraint delete'. If --wait is specified, pcs will wait up to 'n' seconds for the resource to move and then return 0 on success or 1 on error. If 'n' is not specified it defaults to 60 minutes. If you want the resource to preferably avoid running on some nodes but be able to failover to them use 'pcs location avoids'.

Using pcs resource clear cleared the constraint and I could move the resources.
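For reference, here is a sketch of the workflow (resource and constraint names are the ones from my question; the commands can be run on any cluster node):

```shell
# Moving any member of the group moves the whole group and leaves
# behind a cli-prefer-* location constraint pinning it to the node
pcs resource move ClusterIP server_a.test.local

# Once the move is done, clear the constraint so the group can
# fail over (and fail back) freely again
pcs resource clear ClusterIP

# Equivalently, delete the constraint by its id
pcs constraint remove cli-prefer-ClusterIP
```

Note that `pcs resource clear` only removes the move/ban constraints created by `pcs resource move` and `pcs resource ban`; it does not touch constraints you configured yourself.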

yesOrMaybeWhatever

The score:INFINITY preference constraints on all your grouped resources are likely the issue. INFINITY is actually equal to 1,000,000 in Pacemaker, which is the highest value that can be assigned to a score.

The following is true when working with INFINITY (from the ClusterLabs documentation):

6.1.1. Infinity Math 
  Pacemaker implements INFINITY (or equivalently, +INFINITY) 
  internally as a score of 1,000,000. Addition and subtraction 
  with it follow these three basic rules: 

  Any value + INFINITY =  INFINITY 
  Any value - INFINITY = -INFINITY 
  INFINITY  - INFINITY = -INFINITY

Try changing your preference scores to something like 1,000, or 10,000 rather than INFINITY, and run your test again.
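As a sketch, using the constraint ids and node names from your `pcs config` output, that would look something like this:

```shell
# Remove the INFINITY cli-prefer-* constraints left behind by earlier moves
pcs constraint remove cli-prefer-ClusterIP cli-prefer-Apache \
    cli-prefer-stunnel cli-prefer-my_app-daemon

# If you still want the group to prefer a node, add a preference
# with a finite score instead of INFINITY
pcs constraint location ClusterIP prefers server_a.test.local=1000
```

With a finite score, a later `pcs resource move` (which creates an INFINITY constraint) can still override the preference, and the cluster can still place the group elsewhere if the preferred node fails.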

Matt Kereczman
  • Matt, please see the EDIT2 in my question. – yesOrMaybeWhatever Apr 11 '18 at 12:51
  • @yesOrMaybeWhatever: Are you unmoving the resources after you move them? Moving the resources places location constraints into the cluster to make the resources move. You need to `unmove` them to remove those constraints after the resources have moved if you expect them to be able to failover/failback to the peer. – Matt Kereczman Apr 11 '18 at 14:27
  • Hello Matt, apologies for replying so late (I had a nasty infection). Are you referring to the commands in the `Move resources` section here: [move resource](https://github.com/ClusterLabs/pacemaker/blob/master/doc/pcs-crmsh-quick-ref.md#move-resources)? If yes, how do I need to run the command, and on which node? I mean, because all 4 resources are in a group, I tell it to move for example ClusterIP, but of course it then moves all 4 resources. Do I then need to run pcs resource clear for each of the 4 resources on the host that got the resources assigned due to the move operation? Thanks! – yesOrMaybeWhatever Apr 16 '18 at 07:13
  • The four resources in your group appear to have preference scores of `INFINITY`; those need to be removed, or set to something lower: `pcs constraint remove cli-prefer-Apache cli-prefer-ClusterIP cli-prefer-my_app-daemon cli-prefer-stunnel`. Then you should be able to `move` your resources. – Matt Kereczman Apr 17 '18 at 14:21