How can I determine exactly what is happening with every operation on a resource in Pacemaker?

 SOLUTION Verified - Updated February 10, 2021 at 00:02

Environment

  • Red Hat Enterprise Linux (RHEL) 7, 8 with the High Availability or Resilient Storage Add On

Issue

Resolution

  • Update each LVM resource with trace_ra=1:
# pcs resource update <resource id> trace_ra=1

NOTE: The trace_ra option is only recognized by pcs-0.9.157-1.el7 and later. If you have an earlier version of pcs, add the --force option.

  • Check /var/lib/heartbeat/trace_ra/*/ to make sure the trace logs are being generated.

  • If an incident occurs which requires analysis, grab those logs from the time of the incident.
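
The collection step above can be sketched as a small helper that archives only the trace files modified within a time window around the incident. This is a minimal sketch, not a supported tool: the function name collect_trace_logs and its arguments are hypothetical, it assumes GNU find and GNU tar (as shipped with RHEL), and the default directory is the one named in this article.

```shell
#!/bin/bash
# Hypothetical helper: archive trace_ra logs modified between START and END.
# Usage: collect_trace_logs START END OUTPUT_TARBALL [TRACE_DIR]
collect_trace_logs() {
    local start="$1"   # window start, e.g. "2021-02-09 23:00"
    local end="$2"     # window end,   e.g. "2021-02-10 01:00"
    local out="$3"     # tarball to create
    local dir="${4:-/var/lib/heartbeat/trace_ra}"

    [ -d "$dir" ] || { echo "trace directory not found: $dir" >&2; return 1; }

    # -newermt selects files modified after a timestamp (GNU find);
    # "! -newermt END" drops files modified after the end of the window.
    # %P prints paths relative to $dir so the archive has no absolute paths.
    find "$dir" -type f -newermt "$start" ! -newermt "$end" -printf '%P\0' |
        tar -C "$dir" --null -T - -czf "$out"
}
```

For example, `collect_trace_logs "2021-02-09 23:00" "2021-02-10 01:00" /tmp/trace_ra-incident.tar.gz` (illustrative timestamps and output path) would produce a single archive that can be attached to a support case.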

NOTE: Enabling trace_ra creates a bash -x log for every single operation on the applicable resources, which may consume a significant amount of disk space. Make sure this can be accommodated before proceeding.
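
One way to keep an eye on the disk space concern is to compare the size of the trace directory with the free space on its filesystem. The following is a minimal sketch using only du and df; the function name trace_disk_report is hypothetical, and the default path is the directory named in this article.

```shell
#!/bin/bash
# Hypothetical helper: report how much space trace_ra logs use and how much is left.
# Usage: trace_disk_report [TRACE_DIR]
trace_disk_report() {
    local dir="${1:-/var/lib/heartbeat/trace_ra}"

    [ -d "$dir" ] || { echo "trace directory not found: $dir" >&2; return 1; }

    echo "Space used by trace logs:"
    du -sh "$dir"

    echo "Free space on the containing filesystem:"
    df -h "$dir"
}
```

Running it before enabling trace_ra and again after a few monitor intervals gives a rough idea of how fast the logs grow on a given cluster.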

NOTE: The trace_ra attribute only works on ocf:heartbeat and ocf:pacemaker resources. If a resource does not belong to one of those standard:provider pairs, trace_ra will not work on it (e.g. systemd and lsb resources).
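
The rule in the note above can be expressed as a simple pattern match on the resource's agent specification. This sketch only mirrors the note; it does not query the cluster. The function name supports_trace_ra is hypothetical, and it assumes the agent is given in standard:provider:type form (for example ocf:heartbeat:LVM).

```shell
#!/bin/bash
# Hypothetical helper: succeed only for agents the note above lists as traceable.
# Usage: supports_trace_ra AGENT_SPEC   (e.g. ocf:heartbeat:LVM, systemd:httpd)
supports_trace_ra() {
    case "$1" in
        ocf:heartbeat:*|ocf:pacemaker:*)
            return 0 ;;   # standard:provider pairs named in the note
        *)
            return 1 ;;   # systemd, lsb, and anything else
    esac
}

# Example: report on a few agent specifications.
for agent in ocf:heartbeat:LVM ocf:pacemaker:ping systemd:httpd lsb:network; do
    if supports_trace_ra "$agent"; then
        echo "$agent: trace_ra applies"
    else
        echo "$agent: trace_ra does not apply"
    fi
done
```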

Root Cause

The trace_ra feature is a development tool included in Pacemaker with the pcs package. It is not recommended for routine use under any conditions. The appropriate use is as a learning tool, when developing custom resource agents, or as a last resort when all other methods of gathering data about the behavior of a resource have failed.

Diagnostic Steps

Before using this feature, it is recommended to try to solve the problem through other means:

  • Implement supplemental system utilization statistics and have them active during the time the problem is observed. Provide this data to the Red Hat support engineer for help in interpreting it.

  • Implement enhanced logging and provide a fresh sosreport with the new logging data to Red Hat support.

If these methods are not successful, then it may be appropriate to consider the trace_ra feature.
