Implementing STONITH with the HA/DR provider for SAP HANA
This section applies only to multihost SAP HANA scale-out instances that use host autofailover. On failover, the database on the standby host must have read and write access to the files of the failed active host. If the failed host can still write to these files, the files might become corrupted. Preventing this corruption is called fencing.
When you use shared file systems, such as Unity XT NAS storage and NFSv3 or NFSv4, the STONITH method is implemented to achieve proper fencing capabilities and ensure that locks are always freed.
Note: For multihost SAP HANA scale-out instances that use host autofailover with NFSv3, the STONITH (SAP HANA HA/DR provider) implementation is mandatory. With NFSv4, a locking mechanism based on lease time is available. If this locking mechanism is used for I/O fencing, STONITH is not required. However, STONITH can still be used to speed up failover and ensure that locks are always released.
In such a setup, the storage connector API can be used for invoking the STONITH calls. During failover, the SAP HANA leading host calls the STONITH method of the custom storage connector with the hostname of the failed host as the input value.
A mapping of hostnames to management network addresses is maintained. This mapping is used to send a reboot signal to the failed server through the management network. When the host restarts, it automatically starts in the standby host role. The STONITH example uses IPMI in bare-metal deployments with Dell PowerEdge servers.
For PowerEdge servers, you must configure IPMI over LAN for integrated Dell Remote Access Controller (iDRAC) to enable or disable IPMI commands over LAN channels to any external systems. If IPMI over LAN is not configured, external systems cannot communicate with the iDRAC server using IPMI commands.
To configure IPMI over LAN:
The iDRAC Settings page is displayed, as shown in the following figure:
The /etc/hosts file maintains the mapping of hostnames to the IPMI IP addresses that STONITH uses. The entries follow a standard naming convention (<hostname>-ipmi), as follows:
#
# IPMI mapping
#
10.230.79.85 hana01-ipmi
10.230.79.86 hana02-ipmi
10.230.79.87 hana03-ipmi
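The naming convention can be checked before STONITH depends on it. The following is a minimal sketch, not part of the validated solution: it parses hosts-file text and reports which SAP HANA hostnames have no matching <hostname>-ipmi entry. The function names, the SAMPLE text, and the host hana04 are assumptions made for illustration.

```python
def parse_hosts(text):
    """Return {hostname: ip} for every name in /etc/hosts-style text."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        for name in names:
            mapping[name] = ip
    return mapping


def missing_ipmi_entries(hana_hosts, hosts_text):
    """List the HANA hosts that have no '<hostname>-ipmi' entry."""
    mapping = parse_hosts(hosts_text)
    return [h for h in hana_hosts if "%s-ipmi" % h not in mapping]


# Sample text copied from the mapping shown above.
SAMPLE = """
# IPMI mapping
10.230.79.85 hana01-ipmi
10.230.79.86 hana02-ipmi
10.230.79.87 hana03-ipmi
"""

if __name__ == "__main__":
    # hana04 is a hypothetical host used to show a missing entry.
    print(missing_ipmi_entries(["hana01", "hana02", "hana03", "hana04"], SAMPLE))
```

On a real host, the text of /etc/hosts would be passed in place of SAMPLE.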
ipmitool power status -H hana01-ipmi -U root -P xxxx
If IPMI is working, the command returns "Chassis Power is on".
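The same check can be scripted for all hosts. The following is a minimal sketch, assuming Python 3.7 or later and that ipmitool is installed on the host that runs it. The function names are illustrative, and the user and password values are placeholders exactly as in the command above.

```python
import subprocess


def build_status_command(ipmi_host, user, password):
    """Build the argument list for the 'ipmitool power status' call."""
    return ["ipmitool", "power", "status",
            "-H", ipmi_host, "-U", user, "-P", password]


def power_is_on(output):
    """True if the ipmitool output reports that chassis power is on."""
    return "chassis power is on" in output.lower()


def check_ipmi(ipmi_host, user, password, timeout=30):
    """Run the status command; True only on exit code 0 and power on."""
    result = subprocess.run(build_status_command(ipmi_host, user, password),
                            capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0 and power_is_on(result.stdout)
```

For example, check_ipmi("hana01-ipmi", "root", "xxxx") would run the command shown above against the first host. The call raises FileNotFoundError if ipmitool is not on the path.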
chmod u+s /usr/bin/ipmitool
To create your own HA/DR provider, perform the following steps and then add the hook method that you want to use. This guide uses STONITH. For more information, see the SAP HANA Administration Guide.
The directory must be within the /hana/shared storage of the SAP HANA installation but outside the <SID> directory structure. Our example uses the /hana/shared/HANA_Hooks location.
"""
Sample for a HA/DR hook provider.
When using your own code in here, please copy this file to location on /hana/shared outside the HANA installation.
This file will be overwritten with each hdbupd call! To configure your own changed version of this file, please add
to your global.ini lines similar to this:
[ha_dr_provider_HA_STONITH_Hook]
provider = HA_STONITH_Hook
path = /hana/shared/HANA_Hooks
execution_order = 1
For all hooks, 0 must be returned in case of success.
"""
from hdb_ha_dr.client import HADRBase, Helper
import os, time

class HA_STONITH_Hook(HADRBase):

    def __init__(self, *args, **kwargs):
        # delegate construction to base class
        super(HA_STONITH_Hook, self).__init__(*args, **kwargs)

    def about(self):
        return {"provider_company" : "Dell",
                "provider_name" : "HA_STONITH_Hook", # provider name = class name
                "provider_description" : "Dell STONITH HOOK for SAP HANA",
                "provider_version" : "2.0"}

    def startup(self, hostname, storage_partition, system_replication_mode, **kwargs):
        self.tracer.debug("enter startup hook; %s" % locals())
        self.tracer.debug(self.config.toString())
        self.tracer.info("leave startup hook")
        return 0

    def shutdown(self, hostname, storage_partition, system_replication_mode, **kwargs):
        self.tracer.debug("enter shutdown hook; %s" % locals())
        self.tracer.debug(self.config.toString())
        self.tracer.info("leave shutdown hook")
        return 0

    def failover(self, hostname, storage_partition, system_replication_mode, **kwargs):
        self.tracer.debug("enter failover hook; %s" % locals())
        self.tracer.debug(self.config.toString())
        self.tracer.info("leave failover hook")
        return 0

    def stonith(self, failingHost, **kwargs):
        self.tracer.debug("enter HANA HA stonith hook; %s" % locals())
        self.tracer.debug(self.config.toString())
        self.tracer.info("Stonith - rebooting failing host %s" % failingHost)
        # IPMI hostname follows the /etc/hosts naming convention <hostname>-ipmi
        ipmi_host = "%s-ipmi" % failingHost
        power_cycle = "ipmitool power cycle -I lanplus -H %s -U root -P Xxxxxxxx " % ipmi_host
        power_on = "ipmitool power on -I lanplus -H %s -U root -P Xxxxxxxxx " % ipmi_host
        # first attempt: power cycle the failed host
        rc = os.system(power_cycle)
        time.sleep(10)
        if rc == 0:
            msg = "Power cycle successfully executed to the failed host %s" % failingHost
            self.tracer.info(msg)
        else:
            # second attempt: power the host on
            msg = "failed to power cycle %s, will try again" % failingHost
            self.tracer.info(msg)
            rc = os.system(power_on)
            time.sleep(10)
            if rc == 0:
                msg = "Successfully powered on %s" % failingHost
                self.tracer.info(msg)
            else:
                msg = "unable to power cycle %s - Please CHECK" % failingHost
                self.tracer.info(msg)
                return 1
        self.tracer.info("leaving HANA HA stonith hook")
        return rc

    def preTakeover(self, isForce, **kwargs):
        """Pre takeover hook."""
        self.tracer.info("%s.preTakeover method called with isForce=%s" % (self.__class__.__name__, isForce))
        if not isForce:
            # run pre takeover code
            # run pre-check, return != 0 in case of error => will abort takeover
            return 0
        else:
            # possible force-takeover only code
            # usually nothing to do here
            return 0

    def postTakeover(self, rc, **kwargs):
        """Post takeover hook."""
        self.tracer.info("%s.postTakeover method called with rc=%s" % (self.__class__.__name__, rc))
        if rc == 0:
            # normal takeover succeeded
            return 0
        elif rc == 1:
            # waiting for force takeover
            return 0
        elif rc == 2:
            # error, something went wrong
            return 0

    def srConnectionChanged(self, parameters, **kwargs):
        self.tracer.debug("enter srConnectionChanged hook; %s" % locals())
        # Access to parameters dictionary
        hostname = parameters['hostname']
        port = parameters['port']
        volume = parameters['volume']
        serviceName = parameters['service_name']
        database = parameters['database']
        status = parameters['status']
        databaseStatus = parameters['database_status']
        systemStatus = parameters['system_status']
        timestamp = parameters['timestamp']
        isInSync = parameters['is_in_sync']
        reason = parameters['reason']
        siteName = parameters['siteName']
        self.tracer.info("leave srConnectionChanged hook")
        return 0

    def srReadAccessInitialized(self, parameters, **kwargs):
        self.tracer.debug("enter srReadAccessInitialized hook; %s" % locals())
        # Access to parameters dictionary
        database = parameters['last_initialized_database']
        databasesNoReadAccess = parameters['databases_without_read_access_initialized']
        databasesReadAccess = parameters['databases_with_read_access_initialized']
        timestamp = parameters['timestamp']
        allDatabasesInitialized = parameters['all_databases_initialized']
        self.tracer.info("leave srReadAccessInitialized hook")
        return 0

    def srServiceStateChanged(self, parameters, **kwargs):
        self.tracer.debug("enter srServiceStateChanged hook; %s" % locals())
        # Access to parameters dictionary
        hostname = parameters['hostname']
        service = parameters['service_name']
        port = parameters['service_port']
        status = parameters['service_status']
        previousStatus = parameters['service_previous_status']
        timestamp = parameters['timestamp']
        daemonStatus = parameters['daemon_status']
        databaseId = parameters['database_id']
        databaseName = parameters['database_name']
        databaseStatus = parameters['database_status']
        self.tracer.info("leave srServiceStateChanged hook")
        return 0

    def srSecondaryUnregistered(self, parameters, **kwargs):
        self.tracer.debug("enter srSecondaryUnregistered hook; %s" % locals())
        # Access to parameters dictionary
        siteName = parameters['site_name']
        siteId = parameters['site_id']
        reason = parameters['reason']
        self.tracer.info("leave srSecondaryUnregistered hook")
        return 0
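The fencing logic in the stonith method can also be written and tested outside SAP HANA. The following is a minimal sketch, not the validated hook: it keeps the same sequence (power cycle first, power on as the fallback, 0 on success and 1 on failure) but builds the ipmitool call as an argument list and runs it without a shell. The names ipmi_command, run, and fence are assumptions for illustration, and the 10-second waits of the hook are omitted.

```python
import subprocess


def ipmi_command(failing_host, action, user, password):
    """Argument list for 'ipmitool power <action>' against <host>-ipmi."""
    return ["ipmitool", "power", action, "-I", "lanplus",
            "-H", "%s-ipmi" % failing_host, "-U", user, "-P", password]


def run(cmd):
    """Run a command without a shell and return its exit code."""
    return subprocess.call(cmd)


def fence(failing_host, user, password, runner=run):
    """Power-cycle the host, then fall back to power-on.

    Returns 0 as soon as one action succeeds, 1 if both fail.
    """
    for action in ("cycle", "on"):
        if runner(ipmi_command(failing_host, action, user, password)) == 0:
            return 0
    return 1
```

The runner parameter exists only so that the sequence can be exercised with a stub instead of a real ipmitool call, for example fence("hana01", "root", "xxxx", runner=lambda cmd: 0).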
You can add, configure, and monitor your custom provider scripts in the SAP HANA Cockpit. After the HA/DR provider script is created, you can install it on an SAP HANA system by adding a [ha_dr_provider_<classname>] section to the global.ini file with the provider, path, and execution_order parameters.
For example, add the following details to the global.ini file:
[ha_dr_provider_HA_STONITH_Hook]
provider = HA_STONITH_Hook
path = /hana/shared/HANA_Hooks
execution_order = 50
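A quick consistency check of this fragment can catch typos before the name server is restarted. The following is a minimal sketch that uses the Python standard library configparser module. It assumes that the section name is ha_dr_provider_ followed by the class name and that the three parameters from the example are expected. The function name is illustrative, and the sketch is intended for the fragment only, not for a complete global.ini file.

```python
import configparser

# Parameters taken from the example above; treating all three as
# expected is an assumption of this sketch.
EXPECTED = ("provider", "path", "execution_order")


def check_provider_section(ini_text, class_name):
    """Return a list of problems found in the HA/DR provider section."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    section = "ha_dr_provider_%s" % class_name
    if not parser.has_section(section):
        return ["missing section [%s]" % section]
    problems = ["missing parameter %s" % key
                for key in EXPECTED if not parser.has_option(section, key)]
    # The provider value should name the hook class.
    if parser.get(section, "provider", fallback=class_name) != class_name:
        problems.append("provider does not match class name")
    return problems
```

An empty list means that the section exists, contains the three parameters, and names the expected class.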
Using the SAP HANA Cockpit, in your SAP HANA database, select Database Administration > Manage system configuration. You can add, configure, and monitor the HA/DR provider information, as shown in the following figure:
All scripts are loaded during the startup phase of the name server. General information about the HA/DR providers and their return codes is written to the name server trace file, which you can monitor.
Perform host autofailovers to ensure that the failovers work as expected and that STONITH has been performed on the failed host. The following figure shows an example of output from the name server trace file following a host autofailover and a successful STONITH:
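The relevant trace entries can also be filtered with a short script. The following is a minimal sketch that looks for the message texts that the stonith method of the hook in this section writes through self.tracer.info. The function name is an assumption, and the layout of the rest of each trace line is not assumed. Because the location of the trace file is not given here, the caller supplies the lines.

```python
def stonith_messages(trace_lines, failing_host):
    """Return the trace lines that carry the hook's stonith messages."""
    markers = (
        "Stonith - rebooting failing host %s" % failing_host,
        "Power cycle successfully executed to the failed host %s" % failing_host,
        "Successfully powered on %s" % failing_host,
        "unable to power cycle %s" % failing_host,
    )
    return [line for line in trace_lines
            if any(marker in line for marker in markers)]


def stonith_succeeded(trace_lines, failing_host):
    """True if a success message for the host is present in the lines."""
    hits = stonith_messages(trace_lines, failing_host)
    return any("successfully" in line.lower() for line in hits)
```

For example, the lines of the trace file can be read with open(path).readlines() and passed in together with the name of the host that was failed over.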