Your infrastructure, no matter where it is located, must be in a highly available state to support your business and customers. In production environments, SAP business systems need infrastructure that supports a high level of availability to ensure that there are no missed transactions or opportunities. Ensuring this availability is the responsibility of the IT department.
Planning and configuring a highly available SAP landscape can be a challenge. In an ideal solution, no single points of failure exist, and the loss of a single component, or even an entire data center, never affects the system. Two considerations determine the required level of protection: the loss of productivity when the SAP system is unavailable, and the investment that is necessary to prevent this loss of productivity.
This solution provides best practices for attaining high availability (HA) in the face of both local hardware failures and data center failures.
Customers can add a layer of availability on top of what SAP offers for SAP HANA system replication (HSR) by using resources that are already in the data center. SAP HANA systems using HSR can experience a "split-brain" condition if the primary SAP HANA host fails before a failover is fully initiated. Managing replication through cluster software automates the failover of system replication when a failure occurs. By selecting an active/active read-enabled SAP HANA configuration, you also avoid a "cold start" of the standby SAP HANA database, reducing the time that is lost to reloading tables into memory, which can cause application delays.
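As an illustration only, the following sketch shows how an administrator might enable HSR in an active/active read-enabled operation mode with the standard hdbnsutil tool. The SID, instance number, host names, and site names (hana-primary, DC1, DC2) are hypothetical placeholders; consult the SAP HANA administration guide for your release before running these commands.

```shell
# On the primary site (run as the <sid>adm user; site name "DC1" is a placeholder):
# enable system replication with this system as the source.
hdbnsutil -sr_enable --name=DC1

# On the secondary site (after stopping the local database): register it
# against the primary host. The logreplay_readaccess operation mode keeps
# the secondary loaded and able to serve read-only queries, so a takeover
# does not require a "cold start" of the standby database.
hdbnsutil -sr_register --remoteHost=hana-primary --remoteInstance=00 \
  --replicationMode=syncmem --operationMode=logreplay_readaccess --name=DC2

# Check the replication state on the primary.
hdbnsutil -sr_state
```

The choice of --replicationMode (sync, syncmem, or async) depends on the distance between sites and the recovery point objective; syncmem is shown here only as an example.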
The Dell and Red Hat Pacemaker cluster solution addresses this issue by using a fencing mechanism. Fencing ensures that only one cluster node has access to a resource at a time. This restriction is mandatory in clusters where one system can take over the role of another, because it preserves the data integrity of the overall system. If the standby node suddenly cannot communicate with the primary node, the cluster fences (powers off or resets) the affected node so that both nodes are never active simultaneously. Failover then continues, and the surviving node takes over the role of the new primary node. For information about how fencing is implemented in this solution, see Configure fencing on the cluster.
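To make the mechanism concrete, the following is a minimal sketch of defining one fence device per node with the pcs command-line interface, using the IPMI-based fence agent shipped with Red Hat Enterprise Linux. The node names, management-controller IP addresses, and credentials are hypothetical; the validated configuration steps are in Configure fencing on the cluster.

```shell
# Create a fence device for each node, pointing at that node's
# out-of-band management controller (addresses and credentials are placeholders).
pcs stonith create fence_node1 fence_ipmilan \
  ip=192.168.10.11 username=root password=calvin lanplus=1 \
  pcmk_host_list=hana-node1

pcs stonith create fence_node2 fence_ipmilan \
  ip=192.168.10.12 username=root password=calvin lanplus=1 \
  pcmk_host_list=hana-node2

# Verify that both fence devices are configured and started.
pcs stonith status
```

With these devices in place, a node that loses contact with its peer can power the peer off through its management controller before promoting itself, which is what prevents a split-brain condition.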