Tag: High Availability

  • Red Hat High Availability Clustering: A Technical Guide to Fault Tolerance & Data Consistency


    When critical workloads can’t afford downtime, Red Hat High Availability Clusters step in to keep services running, ensure data stays consistent, and eliminate single points of failure. Built on the solid foundation of the High Availability Add-On, these clusters use a mix of resource orchestration, fault detection, and fencing mechanisms to deliver enterprise-grade uptime.

    Whether you’re a Linux engineer, system architect, or platform owner evaluating RHEL clustering, this deep dive walks you through its architecture, components, and strategies for maintaining availability and integrity.


    🔧 What Makes a Cluster “Highly Available”?

    At the heart of RHEL HA is the High Availability Add-On, which transforms a group of RHEL systems (called nodes) into a cohesive cluster. This cluster continuously monitors each member, takes over services when failures occur, and ensures clients never know something went wrong.

    Clusters built with the HA Add-On:

    • Avoid single points of failure
    • Automatically failover services
    • Maintain data integrity during transitions

    Key tools in the stack include:

    • Pacemaker: The brain of the cluster that manages resources
    • Corosync: Handles messaging, quorum, and membership
    • STONITH (Fencing): Ensures failed nodes are completely cut off
    • GFS2 and lvmlockd: Enable active-active shared storage access

    🧠 Core Components of RHEL High Availability

    1. Pacemaker: Resource Management Engine

    Pacemaker is the cluster’s resource orchestrator, comprising several daemons:

    • CIB: Holds configuration/status in XML, synced across all nodes
    • CRMd: Schedules actions like start/stop/move for resources
    • LRMd: Interfaces with local agents to execute actions and monitor state

    2. Corosync: Messaging Backbone

    Corosync ensures all nodes talk to each other reliably. It manages:

    • Membership and quorum determination
    • Messaging and state sync via kronosnet
    • Redundant links and failover networking

    3. Fencing (STONITH): Last Line of Defense

    If a node stops responding, how do you guarantee it won’t corrupt data? Enter fencing.

    • STONITH (“Shoot The Other Node In The Head”) cuts power or access to failed nodes
    • Prevents dual writes and split-brain scenarios
    • Required (stonith-enabled=true) for production clusters

    Examples:

    • Redundant power fencing ensures both power supplies of a node are killed
    • Use fencing delays (pcmk_delay_base, priority-fencing-delay) to avoid race conditions
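    A fence device is itself configured as a cluster resource through pcs. Below is a minimal sketch for a two-node cluster with IPMI-based management interfaces; the hostnames, addresses, and credentials are hypothetical, and fence_ipmilan is just one common agent:

    ```shell
    # One fence device per node. pcmk_host_list ties each device to the
    # node it can power off; pcmk_delay_base staggers fencing on one node
    # to avoid a mutual-fencing race in a split-brain.
    pcs stonith create fence-node1 fence_ipmilan \
        ip=10.0.0.101 username=admin password=secret \
        pcmk_host_list=node1.example.com pcmk_delay_base=5

    pcs stonith create fence-node2 fence_ipmilan \
        ip=10.0.0.102 username=admin password=secret \
        pcmk_host_list=node2.example.com

    # Fencing must stay enabled for a supported production cluster.
    pcs property set stonith-enabled=true
    ```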

    🧩 Ensuring Quorum and Preventing Split-Brain

    A cluster needs quorum (majority vote) to make decisions. Without it, Pacemaker halts all resources to protect data.

    • votequorum service tracks voting nodes
    • no-quorum-policy controls what happens when quorum is lost:
      • stop (default): Stops all services
      • freeze: Useful for GFS2, where shutdowns require quorum
    • Quorum devices (net-based) help even-node clusters survive more failures
      • Algorithms: ffsplit, lms
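    For a two-node cluster, a net-type quorum device acts as a tiebreaker. A sketch, assuming a third host `qdevice.example.com` running corosync-qnetd (hostname hypothetical):

    ```shell
    # On the quorum device host: install and start the qnetd daemon.
    dnf install -y corosync-qnetd pcs
    systemctl enable --now pcsd

    # On a cluster node: add the quorum device. The ffsplit algorithm
    # gives its vote to exactly one half of an evenly split cluster.
    pcs quorum device add model net host=qdevice.example.com algorithm=ffsplit

    # Verify the quorum configuration and runtime status.
    pcs quorum status
    ```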

    💾 Storage Strategies for Data Consistency

    1. Shared Storage

    Failover only works if the new node can access the same data. Supported mediums include:

    • iSCSI
    • Fibre Channel
    • Shared block devices

    2. LVM in Clusters

    • HA-LVM: Active/passive, single-node access at a time
    • lvmlockd: Enables active/active access, works with GFS2

    3. GFS2: The Cluster File System

    • Allows simultaneous access to the same shared block storage from multiple nodes
    • Requires Pacemaker, Corosync, DLM, and lvmlockd
    • Supports encrypted file systems (RHEL 8.4+)
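    Bringing up the GFS2 stack follows a fixed order: DLM first, then lvmlockd, each cloned so they run on every node. A sketch of the supporting resources (resource names are conventional, not mandated):

    ```shell
    # GFS2 clusters must freeze rather than stop when quorum is lost.
    pcs property set no-quorum-policy=freeze

    # DLM and lvmlockd run on all nodes, so both are cloned;
    # lvmlockd is ordered and colocated after the lock manager.
    pcs resource create dlm ocf:pacemaker:controld \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    pcs resource create lvmlockd ocf:heartbeat:lvmlockd \
        op monitor interval=30s on-fail=fence clone interleave=true ordered=true
    pcs constraint order start dlm-clone then lvmlockd-clone
    pcs constraint colocation add lvmlockd-clone with dlm-clone
    ```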

    ⚙️ Resource Management Tactics

    Resources in Pacemaker are abstracted via agents. They can be grouped, ordered, colocated, and monitored with high precision.

    Key controls:

    • Groups: Start in order, stop in reverse
    • Constraints:
      • Location (where)
      • Ordering (when)
      • Colocation (with whom)
    • Health checks: Automatic monitoring with customizable failure policies
    • migration-threshold: Move resource after N failures
    • start-failure-is-fatal: Node marked bad after failed start
    • multiple-active: What to do if resource runs on >1 node
    • shutdown-lock: Prevents unnecessary failovers during planned maintenance
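    These controls are set as resource meta attributes or cluster properties. A sketch, assuming a resource named `webserver` already exists (the name is hypothetical):

    ```shell
    # Move the resource away after three failed operations, and
    # forget those failures automatically after ten minutes.
    pcs resource meta webserver migration-threshold=3 failure-timeout=600s

    # Keep resources pinned to a cleanly shut-down node until it rejoins,
    # so a planned reboot does not trigger an unnecessary failover.
    pcs property set shutdown-lock=true
    ```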

    🌐 Multi-Site Clustering & Remote Nodes

    1. Booth Ticket Manager

    Manages split-brain in geo-distributed clusters. Tickets control which site holds resource ownership.

    2. pacemaker_remote

    Lets you add nodes that don’t run Corosync (e.g., VMs) into your cluster:

    • Extend cluster size beyond 32 nodes
    • Useful for managing cloud VMs or containers

    🛠️ Configuration Tools

    Red Hat provides two main tools to manage the cluster:

    • pcs (CLI)
    • pcsd (Web UI)

    Tasks made simple:

    • Cluster creation
    • Adding/removing nodes
    • Config changes (live)
    • Viewing status and logs
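    The typical bootstrap with pcs looks like this, assuming two freshly installed RHEL nodes (hostnames, cluster name, and password are hypothetical):

    ```shell
    # On every node: install the HA stack and start the pcs daemon.
    dnf install -y pcs pacemaker fence-agents-all
    systemctl enable --now pcsd
    echo 'StrongPassword' | passwd --stdin hacluster

    # From one node: authenticate the nodes and create the cluster.
    pcs host auth node1.example.com node2.example.com -u hacluster
    pcs cluster setup mycluster node1.example.com node2.example.com
    pcs cluster start --all
    pcs cluster enable --all
    pcs status
    ```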

    ✅ Summary: Why RHEL HA Matters

    If your workloads can’t go down—and your data can’t risk corruption—RHEL HA offers:

    • Mature, enterprise-tested components
    • Consistent handling of failovers and fencing
    • Flexibility for active/active and geo-distributed clusters
    • Integrated tooling for automation and visibility

    Start with two nodes. Plan your fencing. Decide quorum policies. Add shared storage. Then scale.

    When uptime matters, the RHEL High Availability Add-On delivers.


    Have questions or want a deeper walkthrough? Contact us at OmOps or explore more Linux and infrastructure insights on our blog.

  • Understanding Pacemaker clusters and their resource configuration for high availability

    Pacemaker clusters, the foundation of Red Hat High Availability (RHEL HA), and their resources are configured and managed to provide reliability, scalability, and availability to critical production services. This is done by eliminating single points of failure and facilitating failover. Red Hat High Availability uses Pacemaker as its cluster resource manager.

    Before jumping into how a Pacemaker cluster and its resources are configured, let's cover the core concepts behind RHEL HA.

    Core Concepts and Components

    Cluster Definition: A cluster consists of two or more computers, known as nodes or members. High availability clusters ensure service availability by moving services from an inoperative node to another cluster node.

    Pacemaker’s Role: Pacemaker is the cluster resource manager that ensures maximum availability for cluster services and resources. This is done by using cluster infrastructure’s messaging and membership capabilities to detect and recover from node and resource-level failures.

    Key Components:

    Cluster Infrastructure: Provides functions such as configuration file management, membership management, lock management, and fencing.

    High Availability Service Management: Manages failover of services when a node becomes inoperative. This is primarily handled by Pacemaker.

    Cluster Administration Tools: Configuration and management capabilities for setting up, configuring, and managing the HA tools, including infrastructure, service management, and storage components.

    Cluster Information Base (CIB): An XML-based daemon that distributes and synchronises current configuration and status information across all cluster nodes from a Designated Coordinator (DC). It represents both the cluster’s configuration and the current state of all resources. Direct editing of the cib.xml file is not recommended; instead, use pcs or pcsd.

    Cluster Resource Management Daemon (CRMd): Routes Pacemaker cluster resource actions, allowing resources to be queried, moved, instantiated, and changed.

    Local Resource Manager Daemon (LRMd): Acts as an interface between CRMd and resources, passing commands (e.g., start, stop) to agents and relaying status information.

    corosync: The daemon that provides core membership and communication needs for high availability clusters, managing quorum rules and messaging between cluster members.

    Shoot the Other Node in the Head (STONITH): Pacemaker’s fencing implementation, which acts as a cluster resource to process fence requests, forcefully shutting down nodes to ensure data integrity.

    Configuration and Management Tools

    Red Hat provides two primary tools for configuring and managing Pacemaker clusters:

    pcs (Command-Line Interface): This tool controls and configures Pacemaker and the corosync heartbeat daemon. It can perform tasks such as creating and configuring clusters, modifying running cluster configurations, and remotely managing cluster status.

    pcsd Web UI (Graphical User Interface): Offers a graphical interface for creating and configuring Pacemaker/Corosync clusters, with the same capabilities as the pcs CLI. It can be accessed via https://nodename:2224. For pcsd to function, port TCP 2224 must be open on all nodes for pcsd Web UI and node-to-node communication.
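    Firewalld ships a predefined high-availability service that opens TCP 2224 (pcsd) along with the other cluster ports, so enabling it on every node is the usual first step:

    ```shell
    # Open pcsd (2224/tcp), corosync, and the other cluster ports.
    firewall-cmd --permanent --add-service=high-availability
    firewall-cmd --reload

    # The Web UI is then reachable at https://<nodename>:2224
    ```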

    Essential Cluster Concepts

    Fencing: If communication with a node fails, other nodes must be able to restrict or release access to shared resources. This is achieved via an external method called fencing, using a fence device (also known as a STONITH device). STONITH ensures data safety by guaranteeing a node is truly offline before shared data is accessed by another node, and forces offline nodes where services cannot be stopped. Red Hat only supports clusters with STONITH enabled (stonith-enabled=true). Fencing can be configured with multiple devices using fencing levels.

    Quorum: Cluster systems use quorum to prevent data corruption and loss by ensuring that more than half of the cluster nodes are online. Pacemaker, by default, stops all resources if quorum is lost. Quorum is established via a voting system. The votequorum service, along with fencing, prevents “split-brain” scenarios, in which parts of the cluster act independently and can corrupt data. For GFS2 clusters, no-quorum-policy must be set to freeze to prevent fencing and allow the cluster to wait for quorum to be regained.

    Quorum Devices: A separate quorum device can be configured to allow a cluster to sustain more node failures than standard quorum rules, especially recommended for clusters with an even number of nodes (e.g., two-node clusters).

    Cluster Resource Configuration

    A cluster resource is an instance of a program, data, or application managed by the cluster service, abstracted by agents that provide a standard interface.

    Resource Creation: Resources are created using the pcs resource create command. They can be of various classes, including OCF, LSB, systemd, and STONITH.

    Meta Options: Resource behavior is controlled via meta-options, such as priority (for resource preference), target-role (desired state), is-managed (cluster control), resource-stickiness (preference to stay on current node), requires (conditions for starting), migration-threshold (failures before migration), and multiple-active (behavior if resource is active on multiple nodes).

    Monitoring Operations: All resources can have monitoring operations defined to ensure their health. If none is specified, a default monitoring operation is added. Multiple monitoring operations can be configured with different check levels and intervals.
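    Putting the pieces together, a resource is created with its agent class, instance attributes, operations, and meta options in a single command. A sketch for a floating IP address (the address and netmask are hypothetical):

    ```shell
    # An OCF resource: class ocf, provider heartbeat, agent IPaddr2.
    # The monitor op checks health every 30s; resource-stickiness keeps
    # the address on its current node after recovery instead of failing back.
    pcs resource create cluster-ip ocf:heartbeat:IPaddr2 \
        ip=192.168.1.100 cidr_netmask=24 \
        op monitor interval=30s \
        meta resource-stickiness=100 migration-threshold=3
    ```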

    Resource Groups: A common configuration involves grouping resources that need to be located together, start sequentially, and stop in reverse order. Constraints can be applied to the group as a whole.

    Constraints: Determine resource behavior within the cluster.

    Location Constraints: Define which nodes a resource can run on, allowing preferences or avoidance. They can be used to implement “opt-in” (resources don’t run anywhere by default) or “opt-out” (resources can run anywhere by default) strategies.

    Ordering Constraints: Define the sequence in which resources start and stop.

    Colocation Constraints: Determine where resources are placed relative to other resources. The influence option (RHEL 8.4+) determines whether primary resources move with dependent resources upon failure.
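    The three constraint types map onto short pcs commands. A sketch, assuming resources `cluster-ip` and `webserver` and a node `node1.example.com` (all names hypothetical):

    ```shell
    # Location: prefer node1 with score 50 (INFINITY would make it mandatory).
    pcs constraint location webserver prefers node1.example.com=50

    # Ordering: bring up the IP before the web server; stop in reverse.
    pcs constraint order cluster-ip then webserver

    # Colocation: keep the web server on whichever node holds the IP.
    pcs constraint colocation add webserver with cluster-ip INFINITY
    ```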

    Cloned Resources: Allow a resource to be active on multiple nodes simultaneously (e.g., for load balancing). Clones are slightly sticky by default, preferring to stay on their current node.

    Multistate (Master/Slave) Resources: A specialization of clones where instances can operate in two modes: Master and Slave. They also have stickiness by default.

    LVM Logical Volumes: Supported in two cluster configurations: High Availability LVM (HA-LVM) for active/passive failover (single node access) and LVM volumes using lvmlockd for active/active configurations (multiple node access). Both must be configured as cluster resources and managed by Pacemaker. For RHEL 8.5+, vgcreate --setautoactivation n is used to prevent automatic activation outside Pacemaker.

    GFS2 File Systems: Can be configured in a Pacemaker cluster, requiring the dlm (Distributed Lock Manager) and lvmlockd resources. The no-quorum-policy for GFS2 clusters should be set to freeze.

    Virtual Domains as Resources: Virtual machines managed by libvirt can be configured as cluster resources using the VirtualDomain resource type. Once configured, they should only be started, stopped, or migrated via cluster tools. Live migration is possible if allow-migrate=true is set.
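    A libvirt guest becomes a cluster resource by pointing the VirtualDomain agent at its XML definition. A sketch, assuming the guest `myguest` exists and all nodes can read a shared config path (both hypothetical):

    ```shell
    # Dump the domain definition somewhere every node can read it.
    virsh dumpxml myguest > /shared/vm-configs/myguest.xml

    # allow-migrate=true lets Pacemaker live-migrate the VM instead of
    # stopping and restarting it when the resource moves between nodes.
    pcs resource create myguest-vm VirtualDomain \
        hypervisor="qemu:///system" \
        config="/shared/vm-configs/myguest.xml" \
        migration_transport=ssh \
        meta allow-migrate=true
    ```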

    Pacemaker Remote Nodes: Allows nodes not running corosync (e.g., virtual guests, remote hosts) to integrate into the cluster and have their resources managed. Connections are secured using TLS with a pre-shared key (/etc/pacemaker/authkey).

    Pacemaker Bundles (Docker Containers): Pacemaker can launch Docker containers as bundles, encapsulating resources within them. The container image must include the pacemaker_remote daemon.

    Managing Resources and Cluster State

    Displaying Status: pcs resource status displays configured resources. pcs status --full shows detailed cluster status, including online/offline nodes and resource states.

    Clearing Failure Status: pcs resource cleanup resets a resource’s status and failcount after a failure is resolved. pcs resource refresh re-detects the current state of all resources, including those with no recorded failures.

    Moving Resources: Resources can be manually moved using pcs resource move or pcs resource relocate. Resources can also be configured to move after a set number of failures (migration-threshold) or due to connectivity changes by using a ping resource and location constraints.

    Enabling/Disabling/Banning Resources: Resources can be disabled (pcs resource disable) to manually stop them and prevent the cluster from starting them. They can be re-enabled (pcs resource enable). pcs resource ban prevents a resource from running on a specific node.

    Unmanaged Mode: Resources can be set to unmanaged mode, meaning Pacemaker will not start or stop them, while still keeping them in the configuration.

    Node Standby Mode: A node can be put into standby mode (pcs node standby) to prevent it from hosting resources, effectively moving its active resources to other nodes. This is useful for maintenance or testing.

    Cluster Maintenance Mode: The entire cluster can be put into maintenance mode (pcs property set maintenance-mode=true) to stop all services from being started or stopped by Pacemaker until the mode is exited.
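    The operations above map onto one-line pcs invocations. A sketch, again assuming a resource `webserver` and a node `node1.example.com`:

    ```shell
    pcs resource cleanup webserver                # clear failcount after fixing a fault
    pcs resource move webserver                   # move it off its current node
    pcs resource ban webserver node1.example.com  # forbid a specific node
    pcs resource disable webserver                # stop it and keep it stopped
    pcs node standby node1.example.com            # drain a node for maintenance
    pcs node unstandby node1.example.com          # let it host resources again
    pcs property set maintenance-mode=true        # freeze the whole cluster
    ```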

    Updating Clusters: Updates to the High Availability Add-On packages can be performed via rolling updates (one node at a time) or by stopping the entire cluster, updating all nodes, and then restarting. When stopping pacemaker_remote on a remote or guest node for maintenance, disable its connection resource first to prevent monitor failures and allow resources to migrate off gracefully.

    Disaster Recovery Clusters: Two clusters can be configured for disaster recovery, with one as the primary and the other as the recovery site. Resources are typically run in production on the primary and in demoted mode (or not at all) on the recovery site, with data synchronisation handled by the applications themselves. pcs dr commands (RHEL 8.2+) allow displaying status of both clusters from a single node.

  • Fundamental differences and use cases for Red Hat High Availability versus Load Balancer?

    The Red Hat High Availability Add-On and Load Balancer are distinct solutions designed to address different aspects of system availability and performance, though they can also be used in conjunction to create more robust environments.

    Here are the fundamental differences and use cases for each:

    Red Hat High Availability Add-On

    The Red Hat High Availability Add-On is a clustered system primarily focused on providing reliability, scalability, and availability to critical production services by eliminating single points of failure. It achieves this mainly through failover of services from one cluster node to another if a node becomes inoperative.

    Core Components and Concepts:

    Pacemaker: This is the cluster resource manager used by the High Availability Add-On. It oversees cluster membership, manages services, and monitors resources.

    Cluster Infrastructure: Provides fundamental functions like configuration file management, membership management, lock management, and fencing, enabling nodes to work together as a cluster.

    High Availability Service Management: Facilitates the failover of services in case of node failure.

    Fencing (STONITH): This is a critical mechanism to ensure data integrity by physically isolating or “shooting” an unresponsive node, preventing it from corrupting shared data or resources. Red Hat only supports clusters with fencing enabled.

    Quorum: Cluster systems use quorum to prevent data corruption and loss, especially in “split-brain” scenarios where network communication issues could cause parts of the cluster to operate independently. A cluster has quorum when more than half of its nodes are online.

    Cluster Resources: These are instances of programs, data, or applications managed by the cluster service through “agents” that provide a standard interface. Resources can be configured with constraints (location, ordering, colocation) to determine their behavior within the cluster.

    LVM Support: It supports LVM volumes in two configurations:

    Active/Passive (HA-LVM): Only a single node accesses storage at any given time, avoiding cluster coordination overhead for increased performance. This is suitable for applications not designed for concurrent operation.

    Active/Active (LVM with lvmlockd): Multiple nodes require simultaneous read/write access to LVM volumes, with lvmlockd coordinating activation and changes to LVM metadata. This is used for cluster-aware applications and file systems like GFS2.

    Key Use Cases for Red Hat High Availability Add-On:

    Maintaining High Availability: Ensuring critical services like Apache HTTP servers, NFS servers, or Samba servers remain available even if a node fails, by failing them over to another healthy node.

    Data Integrity: Crucial for services that read and write data via shared file systems, ensuring data consistency during failover.

    Active/Passive Configurations: For most applications not designed for concurrent execution.

    Active/Active Configurations: For specific cluster-aware applications like GFS2 or Samba that require simultaneous access to shared storage.

    Virtual Environments: Managing virtual domains as cluster resources and individual services within them.

    Disaster Recovery: Configuring two clusters (primary and disaster recovery) where resources can be manually failed over to the recovery site if the primary fails.

    Multi-site Clusters: Using Booth cluster ticket manager to span clusters across multiple sites and manage resources based on granted tickets, ensuring resources run at only one site at a time.

    Remote Node Integration: Integrating nodes not running corosync into the cluster to manage their resources remotely, allowing for scalability beyond standard node limits.

    Load Balancer

    The Load Balancer (specifically using Keepalived and HAProxy in Red Hat Enterprise Linux 7) is designed to provide load balancing and high-availability to network services, dispatching network service requests to multiple cluster nodes to distribute the request load.

    Core Components and Concepts:

    Keepalived:

    ◦ Runs on active and passive Linux Virtual Server (LVS) routers.

    ◦ Uses the Virtual Router Redundancy Protocol (VRRP) to elect an active router and manage failover of the virtual IP address (VIP) to backup routers if the active one fails.

    ◦ Performs load balancing for real servers and health checks on service integrity.

    ◦ Operates primarily at Layer 4 (Transport layer) for TCP connections.

    ◦ Supports various scheduling algorithms (e.g., Round-Robin, Least-Connection) to distribute traffic.

    ◦ Offers persistence and firewall marks to ensure client requests consistently go to the same real server for stateful connections (e.g., multi-screen web forms, FTP).
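    A minimal keepalived.conf sketch for the active LVS router, assuming a VIP of 192.168.1.200 and two real web servers (all addresses, interface names, and IDs are hypothetical):

    ```
    # /etc/keepalived/keepalived.conf (active router)
    vrrp_instance VI_1 {
        state MASTER              # the backup router uses state BACKUP
        interface eth0
        virtual_router_id 51
        priority 100              # the backup uses a lower priority
        virtual_ipaddress {
            192.168.1.200
        }
    }

    virtual_server 192.168.1.200 80 {
        lb_algo rr                # round-robin scheduling
        lb_kind NAT
        protocol TCP
        real_server 192.168.1.10 80 {
            TCP_CHECK {
                connect_timeout 3
            }
        }
        real_server 192.168.1.11 80 {
            TCP_CHECK {
                connect_timeout 3
            }
        }
    }
    ```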

    HAProxy:

    ◦ Offers load-balanced services for HTTP and TCP-based services.

    ◦ Processes events on thousands of connections across a pool of real servers.

    ◦ Allows defining proxy services with front-end (VIP and port) and back-end (pool of real servers) systems.

    ◦ Performs load-balancing management at Layer 7 (Application layer).

    ◦ Supports various scheduling algorithms (e.g., Round-Robin, Least-Connection, Source, URI, URL Parameter).
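    An equivalent HAProxy sketch, terminating HTTP on the VIP and balancing across the same hypothetical pool of real servers:

    ```
    # /etc/haproxy/haproxy.cfg (minimal Layer-7 example)
    frontend web-vip
        bind 192.168.1.200:80
        default_backend web-pool

    backend web-pool
        balance roundrobin
        server web1 192.168.1.10:80 check
        server web2 192.168.1.11:80 check
    ```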

    Key Use Cases for Load Balancer:

    Traffic Distribution: Balancing network service requests across multiple “real servers” to optimize performance and throughput.

    Scalability: Cost-effectively scaling services by adding more real servers to handle increased load.

    High-Volume Services: Ideal for production web applications and other Internet-connected services that experience high traffic.

    Router Failover: Keepalived ensures that the Virtual IP (VIP) address and, consequently, access to the load-balanced services, remains available even if the primary load balancer router fails.

    Diverse Hardware: Weighted scheduling algorithms allow for efficient load distribution among real servers with varying capacities.

    Stateful Connections: Using persistence or firewall marks to direct a client’s subsequent connections to the same real server for applications requiring session consistency (e.g., e-commerce, FTP).

    Flexible Routing: Supports both NAT (Network Address Translation) routing and Direct Routing, offering flexibility in network topology and performance.

    Fundamental Differences

    | Feature | Red Hat High Availability Add-On | Load Balancer |
    |---|---|---|
    | Primary goal | Ensuring service availability through failover to eliminate single points of failure. | Distributing network traffic across multiple servers to enhance scalability and performance. |
    | Core mechanism | Manages application and service resources and moves them between nodes upon failure. | Directs client requests to multiple backend servers based on load-balancing algorithms. |
    | Key components | Pacemaker, Corosync, Fencing (STONITH), CIB, CRMd, LRMd, GFS2, LVM. | Keepalived (LVS, VRRP) and HAProxy. |
    | Operating layer | Application/service layer (manages state and startup/shutdown of services). | Layer 4 (TCP) and/or Layer 7 (HTTP/HTTPS). |
    | Data integrity | Actively ensures data integrity during failover, especially with shared storage (e.g., lvmlockd, GFS2). | Does not directly manage data integrity; relies on backend servers to handle data consistency. |
    | Redundancy type | Primarily active/passive failover for services (one active, others standby), though active/active is supported with specific tools. | Typically active/active for real servers (all serving requests) with active/passive for load-balancer routers. |
    | Configuration | pcs command-line interface or pcsd Web UI to configure Pacemaker and Corosync. | keepalived.conf and haproxy.cfg files. |

    In summary, the High Availability Add-On focuses on maintaining uptime of a service or application by ensuring it can reliably restart or move to another server if its current host fails, with a strong emphasis on data integrity. The Load Balancer, conversely, focuses on distributing incoming client requests across multiple servers to handle higher traffic volumes and improve overall system performance, while also providing failover at the routing level. They can complement each other, with an HA cluster protecting the backend services that are being load-balanced.

  • Difference between RHEL 7 and RHEL 8 HA

    Red Hat Enterprise Linux (RHEL) 8 introduces several enhancements and changes to its High Availability (HA) Add-On compared to RHEL 7, primarily building upon and refining the Pacemaker and Corosync technologies.

    Here are the key differences:

    Core HA Technologies and Storage Management

    • Cluster volume management (clvmd vs. lvmlockd):
      • RHEL 7: The High Availability Add-On used `clvmd` (Cluster Logical Volume Manager) for volume management of cluster storage, specifically for active/active configurations with GFS2.
      • RHEL 8: Replaces `clvmd` with the LVM Locking Daemon (`lvmlockd`) for managing shared storage devices in active/active configurations, where more than one node requires simultaneous access to storage. `lvmlockd` works in conjunction with the Distributed Lock Manager (`dlm`).
    • LVM volume activation:
      • RHEL 7: Used `lvmconf --enable-halvm` to configure HA-LVM. In earlier versions, disabling auto-activation required modifying the `auto_activation_volume_list` in `/etc/lvm/lvm.conf` and rebuilding the `initramfs` boot image.
      • RHEL 8: From RHEL 8.5, LVM volume groups managed by Pacemaker can be created with the `vgcreate --setautoactivation n` flag to prevent automatic activation on startup.
    • GFS2 file systems:
      • RHEL 8: Leverages the `lvm2-lockd`, `gfs2-utils`, and `dlm` packages for GFS2 configurations, and supports encrypted GFS2 file systems using the `crypt` resource agent. Red Hat also documents a procedure for migrating GFS2 file systems from RHEL 7, which involves changing the volume group’s lock type from `none` to `dlm` and ensuring the RHEL 8 cluster has the same name as the RHEL 7 cluster.

    Cluster Management and Configuration Tools

    • pcs command enhancements (RHEL 8): `pcs` commands can now export configurations for recreation on different systems, including cluster properties (`pcs property config --output-format=cmd`), fence devices (`pcs stonith config --output-format=cmd`), and cluster resources (`pcs resource config --output-format=cmd`). `pcs resource defaults update` is the preferred command for changing global resource option defaults over the older `pcs resource defaults name=value`. The `pcs cluster config` command can display the `corosync.conf` file in a human-readable format, including the cluster UUID if the cluster was created in RHEL 8.7 or later or the UUID was added manually.
    • Resource display (RHEL 8): The `pcs resource relations` command can display resource dependencies in a tree structure, and `pcs constraint list` no longer displays expired constraints by default (the `--all` option includes them). New commands `pcs resource status resource_id` and `pcs resource status node=node_id` display the status of individual resources or of resources on a specific node.

    Fencing Improvements

    • Fencing delays (RHEL 8): Introduced the `priority-fencing-delay` cluster property, which allows a two-node cluster to fence the node with the fewest or least important resources in a split-brain situation. This delay is additive to `pcmk_delay_base` and `pcmk_delay_max`. The `pcmk_delay_base` parameter also allows specifying different fencing delays for individual nodes even when using a single fence device.
    • fence-reaction property (RHEL 8): The `fence-reaction` cluster property was introduced to determine how a node should react if notified of its own fencing. The default is `stop`, but `panic` (attempts an immediate reboot) is considered safer.
    • Concurrent fencing (RHEL 8): `concurrent-fencing=true` became the default, allowing fencing operations to be performed in parallel.
    • pcmk_host_map (RHEL 8): The `pcmk_host_map` property for fencing devices supports special characters in host alias values.

    Quorum and Multi-Site Clusters

    • Quorum devices (RHEL 8): Full support for a separate quorum device (`corosync-qnetd`) to sustain more node failures, especially recommended for clusters with an even number of nodes.
    • Multi-site clusters (RHEL 8): Full support for configuring multi-site clusters using the Booth cluster ticket manager.
    • Disaster recovery clusters (RHEL 8): The `pcs dr` commands allow displaying the status of both primary and disaster-recovery clusters from a single node. This facilitates monitoring but does not automate resource configuration or data replication, which must be handled manually.

    Resource Behavior and Options

    • shutdown-lock property (RHEL 8): When set to `true`, resources on a node undergoing a clean shutdown are locked to that node and prevented from failing over to other nodes until the node rejoins the cluster, ideal for maintenance windows.
    • Safer resource disabling (RHEL 8): New `pcs resource disable` options (`--simulate`, `--safe`, and `--safe --no-strict`) allow administrators to assess or perform resource disabling with greater control and avoid unintended side effects.
    • Resource tags (RHEL 8): Cluster resources can be tagged using the `pcs` command, allowing a specified set of resources to be enabled, disabled, managed, or unmanaged with a single command.
    • multiple-active option (RHEL 8): The `multiple-active` resource meta option gains a new value, `stop_unexpected`, which stops only unexpected active instances of a resource without requiring a full restart of all instances.
    • allow-unhealthy-nodes (RHEL 8): This resource meta option, when set to `true`, prevents a resource from being forced off a node due to degraded node health, allowing the cluster to move resources back once health recovers.
    • Node health strategy (RHEL 8): Pacemaker can automatically move resources off unhealthy nodes. This works in conjunction with health resource agents such as `ocf:pacemaker:HealthCPU`, `HealthIOWait`, `HealthSMART`, and `SysInfo`, which set node attributes based on system health. The strategy can be configured as `migrate-on-red`, `only-green`, `progressive`, or `custom`.

    Virtualization Integration

    • pacemaker_remote daemon: Both RHEL 7 and RHEL 8 support the `pacemaker_remote` service for integrating non-Corosync nodes (remote nodes and guest nodes) into the cluster, enabling scaling beyond the traditional node limits. The handling of `authkey` and the commands for adding remote/guest nodes were refined in RHEL 7.4 (`pcs cluster node add-guest` and `pcs cluster node add-remote` replaced older commands), and RHEL 8 continues these updated commands. In RHEL 7.3 and later (and thus RHEL 8), if `pacemaker_remote` stops, resources are gracefully migrated off the node; in RHEL 7.2 and earlier, this would have caused fencing.
    • Virtual domain resources: Both RHEL 7 and RHEL 8 support configuring `libvirt`-managed virtual domains as cluster resources, with `VirtualDomain` resource options such as `force_stop`, `migration_transport`, and `snapshot`. The `allow-migrate` metadata option enables live migration without state loss for VMs managed as cluster resources. Note that live migration for full cluster nodes (not managed as resources) is generally not supported and requires manual removal/re-addition to the cluster.
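    The RHEL 8.5+ activation change can be sketched for an HA-LVM (active/passive) volume group; the device, VG, LV, and group names below are hypothetical, and `vg_access_mode=system_id` assumes an LVM system ID has been configured on the nodes:

    ```shell
    # Create the VG with autoactivation disabled, so only Pacemaker
    # (via the LVM-activate resource agent) ever activates it.
    vgcreate --setautoactivation n ha_vg /dev/sdb1
    lvcreate -L 10G -n ha_lv ha_vg

    # Manage activation as a cluster resource inside a resource group.
    pcs resource create ha-lvm ocf:heartbeat:LVM-activate \
        vgname=ha_vg vg_access_mode=system_id --group ha-group
    ```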