The infrastructure of an SAP landscape, wherever it runs, must remain available to support the business and its customers. In production environments, SAP business systems require infrastructure that delivers a high level of availability so that no transactions or opportunities are missed. Ensuring this availability is the responsibility of the IT department.
Planning and configuring a highly available SAP landscape can be a challenge. In an ideal solution, no single point of failure exists, and the loss of a single component or even a data center never affects the system.
Two considerations determine the required level of protection:
This solution provides best practices for attaining high availability (HA) for local hardware failures and data center failures, including:
By using resources that are already in the data center, customers can add a layer of availability on top of what SAP provides with SAP HANA system replication (HSR). SAP HANA systems that use HSR can experience a "split brain" condition if the primary SAP HANA host fails before a failover is fully initiated and both nodes then act as primary. Managing system replication through cluster software automates the failover when a failure occurs. Selecting an active/active read-enabled SAP HANA configuration also avoids a "cold start" of the standby database, reducing the time lost while tables reload into memory, which can otherwise cause application delays.
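As a hedged illustration only (not the exact configuration from this guide), the Red Hat HA add-on typically automates HSR failover through a SAPHanaTopology clone and a promotable SAPHana resource managed with `pcs`. The SID `HA1`, instance number `00`, IP address, and all timeout values below are placeholder assumptions:

```shell
# Sketch only: Pacemaker resources for automated SAP HANA HSR failover on RHEL.
# SID, instance number, IP, and timeouts are example values, not this guide's.

# Topology resource: gathers HSR status from both sites.
pcs resource create SAPHanaTopology_HA1_00 SAPHanaTopology \
    SID=HA1 InstanceNumber=00 \
    op monitor interval=10 timeout=600 \
    clone clone-max=2 clone-node-max=1 interleave=true

# Promotable SAPHana resource: drives takeover. AUTOMATED_REGISTER=true
# re-registers a failed former primary as the new secondary automatically.
pcs resource create SAPHana_HA1_00 SAPHana \
    SID=HA1 InstanceNumber=00 \
    PREFER_SITE_TAKEOVER=true AUTOMATED_REGISTER=true \
    DUPLICATE_PRIMARY_TIMEOUT=7200 \
    op monitor interval=59 role=Promoted timeout=700 \
    op monitor interval=61 role=Unpromoted timeout=700 \
    promotable notify=true clone-max=2 clone-node-max=1 interleave=true

# Optional second virtual IP on the read-enabled secondary, so read-only
# clients can query the standby instead of waiting for a cold start.
pcs resource create vip_HA1_00_readonly IPaddr2 ip=192.168.1.11 \
    op monitor interval=10
```

With `AUTOMATED_REGISTER=true`, the cluster not only promotes the secondary on failure but also rejoins the recovered node as the new secondary, keeping replication protection in place without manual intervention.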
The Dell, VMware, and Red Hat Pacemaker cluster solution addresses this issue with a "fencing" mechanism. Fencing ensures that only one cluster node has access to a resource at a time. This restriction is mandatory in clusters where one system can take over the role of another, because it protects the data integrity of the overall system. If the standby node suddenly cannot communicate with the primary node, the cluster fences (powers off or resets) the unreachable node so that both nodes cannot be active simultaneously. The failover then continues, and the surviving node takes over the role of the new primary. For information about how fencing is implemented in this solution, see Configure fencing for the cluster.
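Purely as a hedged sketch, in a VMware environment a STONITH (fencing) device can be declared with the `fence_vmware_rest` agent, which powers off the fenced guest through vCenter. The vCenter address, credentials, and host-to-VM map below are placeholders, not values from this solution:

```shell
# Sketch only: STONITH device that fences a cluster node by powering off
# its VM via the vCenter REST API. All values shown are placeholders.
pcs stonith create vmfence fence_vmware_rest \
    ip=vcenter.example.com \
    username='admin@vsphere.local' \
    password='placeholder' \
    ssl=1 ssl_insecure=1 \
    pcmk_host_map="hana-node1:hana-vm1;hana-node2:hana-vm2"

# Confirm the device is running, then test-fence a node (this reboots it).
pcs stonith status
pcs stonith fence hana-node2
```

The `pcmk_host_map` option translates cluster node names to VM names in vCenter, so the agent can target the correct guest when the cluster decides a node must be fenced.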