RPO, RTO, and tracking the last known good state of the system
Applications can track the last known good state of the system in various ways, such as journaling, logging, and periodic snapshots. How often this state is captured determines the recovery point objective (RPO), and the time needed to get from the last known good state to a good restart point determines the recovery time objective (RTO). Both are important measures of the HA capabilities of a system. This section covers configuration, operation, and recovery aspects.
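As a rough illustration of how state tracking relates to these two measures, the sketch below estimates a worst-case RPO and an RTO from a few timing inputs. The class, field names, and sample numbers are hypothetical and chosen only for illustration; real values depend on the application and the infrastructure.

```python
from dataclasses import dataclass


@dataclass
class RecoveryPlan:
    """Hypothetical timing inputs, all in seconds."""
    checkpoint_interval_s: float  # time between snapshots or journal flushes
    detect_s: float               # time to notice the failure
    restore_s: float              # time to load the last known good state
    replay_s: float               # time to replay logs up to a good restart point


def worst_case_rpo(plan: RecoveryPlan) -> float:
    # A failure just before the next checkpoint loses one full interval of data.
    return plan.checkpoint_interval_s


def estimated_rto(plan: RecoveryPlan) -> float:
    # Recovery time is modeled as detection plus restore plus replay.
    return plan.detect_s + plan.restore_s + plan.replay_s


# Assumed example: 15-minute snapshots, 1 minute to detect,
# 5 minutes to restore, 2 minutes to replay.
plan = RecoveryPlan(checkpoint_interval_s=900, detect_s=60, restore_s=300, replay_s=120)
print(worst_case_rpo(plan))  # 900
print(estimated_rto(plan))   # 480
```

In this simplified model, capturing state more often lowers the RPO, while the RTO drops only if detection, restore, or replay becomes faster.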
- Configuration data
- Edge devices may allow configuration of data collection type and frequency, alerts and notifications for specific conditions, periodic reports, and integration with external APIs. Backing up these configurations makes it possible to restore a device to a working state after a failure, or to make replacement devices operational quickly. The configurations can be exported and stored in resilient folders on vSAN datastores. Note: For more details, see the device-specific documentation.
- Several ISV applications can back up their own configuration. Even where such backup practices are in place, it may be useful to also store backups on vSAN datastores for an additional level of protection and availability.
- VxRail configuration is saved in an internal database. During the first-run process at initial configuration, the database may not yet contain full configuration details. The full configuration can be exported later as a JSON file and stored externally to improve RPO and RTO.
- Operational data
- Edge devices collect and aggregate time-series data and send it to the ISV application stack on which manufacturing operations depend. Edge devices in turn depend on data services and analytics systems for operational intelligence.
Although the VxRail virtual infrastructure that ISV applications run on provides a high degree of resiliency, edge devices operate independently and may not have the same resiliency as the VxRail environment. Edge devices generally have multiple network ports; using the available ports for redundant communication channels improves RPO and RTO.
Edge devices also operate under harsh conditions, including excessive temperature and vibration, and at times they must operate in disconnected environments. They must be physically hardened and secured to work in such environments. While disconnected, a device accumulates data up to the limit of its local resources. Ensuring appropriate bandwidth when the connection is re-established lets the device forward the accumulated data quickly, which improves RPO and RTO and contributes to the overall health of the infrastructure.
Many edge devices need to be portable and must connect to different networks and end devices quickly to ensure continuity. Edge devices need to support plug-and-play to allow communication with new devices, and to quickly discover and connect to available network interfaces for continued operation.
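The disconnected operation described above is often handled with a store-and-forward pattern. The following is a minimal sketch, assuming a fixed-capacity in-memory buffer; the class name, the `send` callback, and the batch size are hypothetical and do not describe any specific edge device or ISV interface.

```python
from collections import deque
from itertools import islice


class StoreAndForwardBuffer:
    """Bounded local buffer for readings taken while the uplink is down."""

    def __init__(self, capacity: int):
        # deque(maxlen=...) discards the oldest item once the buffer is full.
        self._items = deque(maxlen=capacity)
        self.dropped = 0  # readings lost because local resources ran out

    def record(self, reading):
        if len(self._items) == self._items.maxlen:
            self.dropped += 1  # the oldest reading is about to be overwritten
        self._items.append(reading)

    def flush(self, send, batch_size: int) -> int:
        """Send buffered readings oldest first, in batches; return count sent."""
        sent = 0
        while self._items:
            batch = list(islice(self._items, batch_size))
            send(batch)  # if this raises, the batch stays buffered for a retry
            for _ in batch:
                self._items.popleft()
            sent += len(batch)
        return sent


# Assumed example: capacity for 3 readings, 5 readings taken while offline.
buf = StoreAndForwardBuffer(capacity=3)
for i in range(5):
    buf.record(i)

received = []
sent = buf.flush(received.append, batch_size=2)
print(sent, buf.dropped, received)  # 3 2 [[2, 3], [4]]
```

The `dropped` counter reflects data that could not be kept locally, which is the part of the outage that no amount of bandwidth can recover after reconnection. A larger batch size drains the buffer in fewer sends when more bandwidth is available.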
Non-disruptive updating and working in a non-uniform environment
Systems with effective HA should be able to function in a non-uniform environment: even if not all components are at the same software revision level, the system should continue to function. One premise of HA is that specific components can be updated transparently while applications continue to run on another set of working components. The Dell Validated Design (DVD) for Manufacturing Edge on VxRail enables independent updating of standalone and clustered ISV applications. Updates are not applied automatically, so customers retain full control. The VxRail system also allows independent, non-disruptive updating of all its components, with administrative control over when to apply each update.
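One common way to picture updating in a non-uniform environment is a rolling update, in which nodes are updated one at a time and the system runs with mixed versions in between. The sketch below is a generic illustration under that assumption. The function, the `apply_update` hook (standing in for an administrator-approved update step), and the `min_serving` threshold are hypothetical and do not correspond to any VxRail or ISV interface.

```python
def rolling_update(versions: dict, target: str, min_serving: int, apply_update) -> bool:
    """Update nodes one at a time; mixed versions are expected mid-rollout.

    versions maps node name -> current version string and is updated in place.
    apply_update(node, target) is a caller-supplied hook returning True on success.
    """
    for node in sorted(versions):
        if versions[node] == target:
            continue  # already at the target revision
        serving = len(versions) - 1  # every node except the one being updated
        if serving < min_serving:
            raise RuntimeError(
                f"updating {node} would leave only {serving} nodes serving"
            )
        if not apply_update(node, target):
            # Stop on failure; the remaining nodes keep running the old version.
            return False
        versions[node] = target
    return True


# Assumed example: three nodes, one already updated.
versions = {"node-a": "1.0", "node-b": "1.0", "node-c": "1.1"}
log = []
ok = rolling_update(versions, "1.1", min_serving=2,
                    apply_update=lambda node, v: log.append(node) or True)
print(ok, log)  # True ['node-a', 'node-b']
```

If the hook reports a failure partway through, the function stops and leaves the cluster in a mixed-version state, which is exactly the condition the system is expected to tolerate.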
OT and IT user personas in regard to HA
Operational Technology (OT) personnel look for real-time access to production assets to improve efficiency, and they need the ability to connect to other plants, global suppliers, employees, and partners. They are responsible for industrial automation and control systems (IACS), the plant-wide manufacturing execution system (MES), Supervisory Control and Data Acquisition (SCADA), historians, and asset management. They are concerned with the HA and security of these systems, but they lack control over and knowledge of the infrastructure, data services, and networks that these systems use, which are managed by Information Technology (IT). HA for edge infrastructure aims to maximize availability for both OT and IT systems, with minimal interaction or impact on either side of the edge infrastructure. ISV applications on the Dell Validated Design for Manufacturing Edge offer such capabilities by providing the control, flexibility, and workflows for HA that OT and IT need. They support a multiprotocol environment encompassing both OT and IT systems, and provide a policy-driven, single pane of control for HA for industrial automation systems.
How an HA system recovers
When components in an HA system go down, it is important to understand when and how those components can be brought back up so that the system returns to its full functional capability. It also helps to understand the degraded state of the system: if the system runs in a degraded state for a long time, the surviving components carry more load and less redundancy, so a further failure is more likely to cause downtime. The following are some of the considerations for HA for ISV applications on Dell VxRail Hyperconverged Systems:
- VMs supporting ISV applications should use VMware capabilities such as Distributed Resource Scheduler (DRS), vMotion, and HA rules to ensure optimal performance with policy-based user controls.
- VMware vSAN datastores provide a high level of availability. Clone and snapshot capabilities at the VM level can also create additional copies of the environment. These point-in-time copies allow restoration and recovery of the environment with minimal data loss, which improves (lowers) RPO and RTO.
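The degraded state discussed at the start of this section can be made concrete with a small classification sketch. It assumes a simplified model in which a system tolerates a fixed number of component failures; the names `ClusterState`, `classify`, and `failures_to_tolerate` are used here for illustration only and are not a VMware or VxRail API.

```python
from enum import Enum


class ClusterState(Enum):
    HEALTHY = "healthy"
    DEGRADED = "degraded"
    DOWN = "down"


def classify(failed: int, failures_to_tolerate: int) -> ClusterState:
    """Classify a system by how many components have failed."""
    if failed == 0:
        return ClusterState.HEALTHY
    if failed <= failures_to_tolerate:
        # Still serving, but with reduced redundancy.
        return ClusterState.DEGRADED
    return ClusterState.DOWN


def remaining_margin(failed: int, failures_to_tolerate: int) -> int:
    """Number of further failures the system can absorb before going down."""
    return max(failures_to_tolerate - failed, 0)


# Assumed example: a system that tolerates one failure.
print(classify(0, 1).value, remaining_margin(0, 1))  # healthy 1
print(classify(1, 1).value, remaining_margin(1, 1))  # degraded 0
print(classify(2, 1).value, remaining_margin(2, 1))  # down 0
```

A degraded system with a remaining margin of zero is one failure away from downtime, which is why restoring failed components promptly matters.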
VxRail already offers storage and compute clusters for ISV applications. It can support multiple deployments of ISV applications with minimal operating expense (OpEx), which allows various plant and factory workflows to be separated, with better multitenancy controls and the desired service levels for each. Separate deployments can also use independent datastores controlled by different policies. When failures occur, this separation makes it easier to identify, troubleshoot, and recover the impacted components. This kind of deployment improves overall availability and performance with less management overhead.