In production environments, SAP business systems need infrastructure that supports a high level of availability to avoid any missed transactions or opportunities. Ensuring this availability level is the responsibility of the IT department. Planning and configuring a highly available SAP landscape can be a challenge, however. An ideal solution would eliminate any single points of failure and ensure that the loss of a single component, or even a data center, has no impact on the overall system. The level of protection needed depends on two considerations: how much productivity is lost when the SAP system is unavailable, and how much investment is required to prevent this loss of productivity.
This solution provides design best practices for attaining high availability (HA) in the event of both local hardware failures and data center failures.
Customers can use existing resources in the data center to enhance the availability of SAP HANA system replication (HSR). Cluster software can automate the replication failover when a failure occurs. Without such safeguards, if the node hosting the primary SAP HANA database fails before a failover is fully initiated, systems using HSR can experience a "split brain" condition, in which both nodes act as the primary. In addition, by choosing an active/active read-enabled SAP HANA configuration, customers avoid the need to "cold start" the standby SAP HANA database, which would delay applications while tables reload into memory.
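As an illustration only, the following commands sketch how HSR might be enabled with the read-enabled operation mode on a two-node pair. The hostnames (hana-a, hana-b), site names, and instance number 00 are placeholders rather than values from this guide; verify the exact procedure against the SAP HANA administration documentation for your release.

    # On the primary node, as the <sid>adm user: enable system replication
    # and name the site (site name "SiteA" is a placeholder).
    hdbnsutil -sr_enable --name=SiteA

    # On the secondary node (with the local HANA instance stopped), register
    # it against the primary. operationMode=logreplay_readaccess selects the
    # active/active read-enabled setup, so the secondary can serve read
    # queries and does not require a cold start after takeover.
    hdbnsutil -sr_register --remoteHost=hana-a --remoteInstance=00 \
      --replicationMode=sync --operationMode=logreplay_readaccess --name=SiteB

    # Confirm the replication state from the primary.
    hdbnsutil -sr_state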
The Dell and Red Hat Pacemaker cluster solution addresses this risk with a fencing technique. Fencing ensures that only one cluster node can access a resource at any given time, a restriction that is necessary for data integrity in clusters where one system can take over the role of another. If the standby node suddenly loses communication with the primary node, the cluster fences (terminates) the unreachable node so that both nodes cannot be active at the same time. The surviving node then becomes the new primary node and resumes the workloads.
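For illustration, a minimal Pacemaker fencing sketch using the fence_ipmilan agent (which can target the baseboard management controller, such as iDRAC on Dell PowerEdge servers) might look like the following. All node names, addresses, and credentials are placeholders, and option names vary slightly between pcs releases, so check the syntax for your Red Hat Enterprise Linux version.

    # Create one STONITH resource per node, pointing at that node's BMC/iDRAC.
    # Hostnames (hana-a, hana-b), IPs, and credentials are placeholders.
    pcs stonith create fence_hana_a fence_ipmilan ip=192.168.10.11 \
      username=fenceuser password=fencepass lanplus=1 pcmk_host_list=hana-a
    pcs stonith create fence_hana_b fence_ipmilan ip=192.168.10.12 \
      username=fenceuser password=fencepass lanplus=1 pcmk_host_list=hana-b

    # Keep each fence device from running on the node it is meant to fence.
    pcs constraint location fence_hana_a avoids hana-a
    pcs constraint location fence_hana_b avoids hana-b

    # Ensure fencing is enabled cluster-wide (it is the default).
    pcs property set stonith-enabled=true

    # Verify the fencing configuration.
    pcs stonith status

With one fence device per node and fencing enabled, a node that stops responding is powered off before the cluster promotes the survivor, which is what prevents both nodes from acting as the primary at the same time.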