Self-healing VxRail HCI
Developed by Dell Technologies and VMware, VxRail is the only fully integrated, preconfigured, and tested HCI appliance that pairs Dell PowerEdge servers with VMware vSphere and vSAN software-defined storage (SDS). The Dell Validated Design for Manufacturing Edge with PTC on VxRail is based on a three-node vSAN cluster. The cluster uses vMotion, Distributed Resource Scheduler (DRS), and vSphere High Availability (HA) to handle planned and unplanned downtime and site maintenance. The three VxRail nodes form a single ESXi cluster that pools compute and storage capacity. The cluster uses vSAN with Failures To Tolerate (FTT) set to 1, so data stored on one ESXi host is mirrored to a second host and remains available if a node fails. DRS provides automatic migration and policy-based placement of VMs across the available ESXi hosts in the cluster. HA protects the compute and storage resources for the VMs and for the applications running on them. With multinode clustering, multiple configuration options, and VMware DRS and HA, Dell VxRail provides self-healing capability for edge deployments of the PTC ThingWorx application.
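The relationship between FTT, cluster size, and usable capacity can be checked with simple arithmetic. The following minimal Python sketch makes several assumptions. It uses the vSAN RAID-1 (mirroring) policy without an external witness host. Under that policy, FTT=n needs n+1 data copies plus n witness components on separate hosts, for 2n+1 hosts in total. The sketch ignores slack space, deduplication, compression, and metadata overhead. The 8 TB per-node figure is hypothetical.

```python
"""Back-of-the-envelope vSAN sizing for RAID-1 mirroring (a sketch, not a sizing tool)."""


def min_hosts_raid1(ftt: int) -> int:
    # RAID-1 mirroring keeps ftt + 1 data replicas plus ftt witness components,
    # each on a separate host: 2 * ftt + 1 hosts in total.
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 2 * ftt + 1


def usable_capacity_tb(raw_tb_per_host: float, hosts: int, ftt: int) -> float:
    # Every object is stored ftt + 1 times, so usable space is raw / (ftt + 1).
    # Assumption: ignores slack space, dedup/compression, and metadata overhead.
    needed = min_hosts_raid1(ftt)
    if hosts < needed:
        raise ValueError(f"RAID-1 with FTT={ftt} needs at least {needed} hosts")
    return raw_tb_per_host * hosts / (ftt + 1)


if __name__ == "__main__":
    # Hypothetical three-node cluster with 8 TB of raw capacity per node.
    print(min_hosts_raid1(1))             # 3
    print(usable_capacity_tb(8.0, 3, 1))  # 12.0
```

With FTT=1, a three-node cluster is the smallest layout that can hold both mirror copies and the witness on separate hosts. This is why the design starts at three nodes.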
The PTC application stack consists of a Java virtual machine, an Apache web server, and a database. All components of the stack benefit from VxRail HA without any complex database-level or application-level cluster configuration. Application-level HA clustering is also possible, but it requires additional components. These include Apache ZooKeeper to coordinate the distributed environment, Apache Ignite as a distributed cache, a clustered database, and a load balancer.
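The following Python sketch illustrates what the load balancer in an application-level cluster does. It routes requests round-robin across ThingWorx nodes and skips nodes that fail a health check. It is a generic illustration of health-checked round-robin balancing, not the configuration of any specific load balancer. The node names and the health-check function are hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterable


class HealthCheckedRoundRobin:
    """Round-robin over backend nodes, skipping any that fail a health check.

    Illustrative only: node names and the is_healthy callable are hypothetical.
    """

    def __init__(self, nodes: Iterable[str], is_healthy: Callable[[str], bool]):
        self._nodes = list(nodes)
        if not self._nodes:
            raise ValueError("at least one node is required")
        self._ring = cycle(self._nodes)
        self._is_healthy = is_healthy

    def pick(self) -> str:
        # Try each node at most once per call, so the loop always terminates.
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if self._is_healthy(node):
                return node
        raise RuntimeError("no healthy nodes available")


if __name__ == "__main__":
    down = {"twx-2"}  # hypothetical failed node
    lb = HealthCheckedRoundRobin(["twx-1", "twx-2", "twx-3"], lambda n: n not in down)
    print([lb.pick() for _ in range(4)])  # ['twx-1', 'twx-3', 'twx-1', 'twx-3']
```

The health check runs on every pick, so the balancer stops routing to a node as soon as that node goes down. Traffic returns to the node once it recovers, with no reconfiguration.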
Resiliency of the PTC application stack on VxRail
As described earlier, the solution involves several components, each offering a different degree of resiliency.
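Database, web, and PTC application services are configured to start automatically. When vSphere HA restarts a VM on another node in the VxRail cluster after a failure, the entire stack comes back without user intervention. When a VM is migrated with vMotion, the services keep running. User connectivity depends on access to the web services, so the impact on users is limited. A live migration causes at most a momentary pause. After an unplanned host failure, users are disconnected only while the VM restarts and its services start.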
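The readiness behavior described above can be verified after a failover with a simple probe. This Python sketch polls TCP endpoints for the database and web tiers until all of them accept connections, then reports how long recovery took. The hostnames and ports are hypothetical. A successful TCP connection shows only that a port is listening, not that the application has finished initializing. Where an application-level health check is available, it is a stronger test.

```python
import socket
import time
from typing import Callable, Dict


def tcp_check(host: str, port: int, timeout: float = 1.0) -> Callable[[], bool]:
    """Return a probe that succeeds when a TCP connection to host:port opens."""
    def probe() -> bool:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False
    return probe


def wait_until_ready(checks: Dict[str, Callable[[], bool]],
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> float:
    """Poll every probe until all pass; return elapsed seconds or raise TimeoutError."""
    start = clock()
    pending = dict(checks)
    while True:
        # Drop probes that now pass; services may come up in any order.
        pending = {name: p for name, p in pending.items() if not p()}
        elapsed = clock() - start
        if not pending:
            return elapsed
        if elapsed >= timeout_s:
            raise TimeoutError(f"still waiting on: {sorted(pending)}")
        sleep(interval_s)
```

Consider the call `wait_until_ready({"db": tcp_check("twx-db.example.local", 5432), "web": tcp_check("twx-app.example.local", 8443)})`. The hosts are hypothetical, and 5432 is the PostgreSQL default port. The call returns the elapsed seconds once both ports answer. The `clock` and `sleep` parameters are injectable, so the polling logic can be tested without a network.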
Role-based HA management for devices, users, and applications
PTC applications do not offer role-based service-level management to ensure that a specific set of mission-critical applications, users, and devices has higher availability than noncritical ones. Role-based HA management keeps high-priority applications available and performing. It requires deploying the application with a clear understanding of the application, user load, connected devices, and priorities.
There are several ways to address the needs for role-based HA management:
- Based on a prioritized grouping of the PTC application components, separate sets of application VMs and database VMs can be deployed. This separation allows a different set of policies to be configured for higher availability. VMware DRS supports fully automated, partially automated, and manual modes for placement and load balancing. DRS affinity rules can keep a user-defined set of VMs together (VM-VM rules) or associate VMs with specific hosts (VM-Host rules). PTC application VMs can be configured with specific rules to ensure consistent performance and availability across all application components. Separating the application stack also allows the use of independent database services and storage devices with additional policies.
- These role-based configurations also determine whether additional levels of clustering are needed for VxRail, PTC applications, and databases. Clusters are complex to configure and manage, and they are resource-intensive. For these reasons, they are typically configured only for high-priority applications rather than for all applications.
- Similarly, other policies can be applied differently according to the role-based availability requirements of each group of PTC applications. These include security, alerting and monitoring, database snapshots, and backup and recovery.
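One way to make a role-based policy explicit before configuring it in vCenter is to keep it as data. This Python sketch maps each role tier to a vSphere HA VM restart priority using a hypothetical role-to-priority table. vSphere HA offers five levels: Lowest, Low, Medium, High, and Highest. The sketch then groups VMs by role, and each group could serve as a candidate DRS VM group. It does not call any vSphere API. The role names and VM names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class RestartPriority(Enum):
    # Mirrors the vSphere HA VM restart priority levels.
    LOWEST = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    HIGHEST = 5


@dataclass(frozen=True)
class Vm:
    name: str
    role: str  # hypothetical role label, e.g. "mission-critical"


# Hypothetical policy table: the restart priority each role tier receives.
ROLE_POLICY: Dict[str, RestartPriority] = {
    "mission-critical": RestartPriority.HIGHEST,
    "standard": RestartPriority.MEDIUM,
    "noncritical": RestartPriority.LOW,
}


def plan(vms: List[Vm]) -> Dict[str, List[str]]:
    """Group VMs by role (one candidate DRS VM group per role) in HA restart order."""
    groups: Dict[str, List[str]] = {}
    for vm in vms:
        if vm.role not in ROLE_POLICY:
            raise KeyError(f"no policy for role {vm.role!r}")
        groups.setdefault(vm.role, []).append(vm.name)
    # Highest restart priority first, so critical stacks come back before others.
    ordered = sorted(groups, key=lambda r: ROLE_POLICY[r].value, reverse=True)
    return {role: sorted(groups[role]) for role in ordered}
```

Keeping the policy in one table makes it easy to review which components belong to each tier before the same choices are entered as HA restart priorities and DRS rules.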
Predictive maintenance and analytics with VxRail
VxRail integrates with VMware cloning and periodic snapshot capabilities. The virtualized infrastructure for PTC applications can be cloned to create restartable point-in-time images of the applications and their datasets. These clones can be used to create QA, development, reporting, and analytics environments. Such environments provide efficient copies of production data for identifying systemic issues, reporting health status, and determining the need for predictive maintenance. Edge systems gather data from a diverse set of IIoT devices. These copies can therefore also support analytics at near-edge data centers and with hyperscalers to make smarter decisions at scale. For some failures, clones and snapshots can also be used to restore the production environment without complex recovery operations.
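When a snapshot or clone is used to restore production, the usual choice is the newest recovery point taken before the problem began. The Python sketch below shows that selection and the resulting data-loss window. It assumes that the time the corruption started is known and that the snapshot timestamps are available. It is a hypothetical helper, not a VxRail or vSphere feature. VMware recommends against treating VM snapshots as backups, so in practice the same logic would apply to whichever recovery points the backup solution retains.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def pick_restore_point(snapshots: List[datetime],
                       corruption_at: datetime
                       ) -> Tuple[Optional[datetime], Optional[timedelta]]:
    """Pick the newest snapshot taken strictly before the corruption time.

    Returns (snapshot_time, data_loss_window), or (None, None) if no
    snapshot predates the corruption.
    """
    candidates = [s for s in snapshots if s < corruption_at]
    if not candidates:
        return None, None
    best = max(candidates)
    # Changes made between the snapshot and the corruption are lost on restore.
    return best, corruption_at - best
```

Snapshots might be taken at 00:00, 04:00, 08:00, and 12:00, with corruption starting at 10:30. In that case the 08:00 snapshot is selected, and up to 2 hours 30 minutes of changes are lost.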
RPO and RTO management
For 24/7 industrial environments, the ability to quickly troubleshoot a failure and, more importantly, recover from it is a key consideration. Different users, applications, and devices can have different Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). RPO is the maximum acceptable data loss, measured as the time between the last good recovery point and the failure. It determines how far back the application state is restored for continued operations. RTO is the maximum acceptable time to bring the entire application stack back to that last good state. VxRail resiliency, together with the VMware clone, snapshot, and other capabilities described earlier, helps manage RPO and RTO service levels. PTC applications depend on the continuous availability of database services, and various backup and recovery options are available to protect the application stack. As a result, running PTC applications on Dell VxRail helps manage and improve RPO and RTO.
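The RPO and RTO definitions above lead to a quick feasibility check, sketched here in Python under the assumption of periodic recovery points. The worst-case RPO equals the interval between recovery points, because a failure just before the next point loses a full interval. RTO is estimated as detection time plus VM restart or restore time plus service startup time. All durations are hypothetical inputs in minutes, and replication lag and backup copy time are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtectionPlan:
    snapshot_interval_min: float  # time between recovery points
    detect_min: float             # time to detect the failure
    restart_min: float            # time to restart or restore the VMs
    service_start_min: float      # database, web, and PTC service startup

    def worst_case_rpo_min(self) -> float:
        # A failure just before the next recovery point loses a full interval.
        return self.snapshot_interval_min

    def estimated_rto_min(self) -> float:
        # Assumption: the recovery steps run sequentially.
        return self.detect_min + self.restart_min + self.service_start_min


def meets_targets(plan: ProtectionPlan, rpo_target_min: float,
                  rto_target_min: float) -> bool:
    """True when the plan's worst-case RPO and estimated RTO fit the targets."""
    return (plan.worst_case_rpo_min() <= rpo_target_min
            and plan.estimated_rto_min() <= rto_target_min)
```

Hourly recovery points with 1 minute of detection, 3 minutes of restart, and 6 minutes of service startup give a worst-case RPO of 60 minutes and an estimated RTO of 10 minutes. That plan satisfies a 60/15 target but not a 30/15 one. Running the check per role tier shows which groups need more frequent recovery points.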
Aggregating data sources and supporting multiple use cases at scale
Edge applications aggregate data from a diverse set of sensors, devices, and gateways that support various network topologies and use different protocols for north-bound traffic to edge systems. PTC applications support a large set of these protocols and communication channels, and it is common to have multiple layers of gateways serving a large set of sensors. PTC Kepware Server and Kepware Edge support many protocols for north-bound traffic and can communicate with multiple PTC ThingWorx instances. Users can deploy multiple ThingWorx instances that each collect and store data from Kepware Server. This approach achieves HA and also offers additional services across the instances.
For example, one PTC ThingWorx Foundation Server instance can be used for analytics and predictive maintenance, while another reports overall system health and time-series data.
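The fan-out pattern behind this example can be sketched in Python. The sketch models a single data source delivering each reading to several independent consumers, such as one ThingWorx instance for analytics and another for health reporting. A failed consumer does not stop delivery to the others. The sketch does not use Kepware or ThingWorx APIs, and the tag name, consumer names, and reading shape are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Reading = Tuple[str, float]  # (tag name, value); hypothetical shape


class FanOut:
    """Deliver each reading to every subscribed consumer.

    A consumer that raises is counted as failed, but delivery to the
    remaining consumers continues.
    """

    def __init__(self) -> None:
        self._consumers: Dict[str, Callable[[Reading], None]] = {}
        self.failures: Dict[str, int] = defaultdict(int)

    def subscribe(self, name: str, handler: Callable[[Reading], None]) -> None:
        self._consumers[name] = handler

    def publish(self, reading: Reading) -> List[str]:
        delivered: List[str] = []
        for name, handler in self._consumers.items():
            try:
                handler(reading)
                delivered.append(name)
            except Exception:
                # Record the failure and keep going so other instances still get data.
                self.failures[name] += 1
        return delivered
```

Because each consumer is independent, losing one instance reduces the services on offer without interrupting data collection for the rest.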
Isolation and multitenant network management
Many edge devices provide multiple physical interfaces to connect to multiple north-bound IP addresses that serve various PTC applications. VxRail infrastructure and network switches also support multiple network interfaces. This redundancy in the network configuration provides HA. If a network path goes down, traffic continues over the remaining paths and operations are not impacted. These networks can be configured with independent VLANs to isolate traffic in multitenant environments. This improves overall availability and allows effective noisy-neighbor management.
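A short Python sketch can illustrate the two ideas in this paragraph: failing over between redundant paths and isolating tenants on separate VLANs. `active_path` picks the first healthy path in preference order, similar in spirit to an explicit active/standby failover order. `validate_vlans` checks that each tenant has a distinct IEEE 802.1Q VLAN ID in the usable range 1 to 4094. The interface names, tenant names, and VLAN IDs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Path:
    nic: str  # hypothetical interface name, e.g. "vmnic0"
    up: bool


def active_path(paths: List[Path]) -> Optional[str]:
    """Return the first healthy path in preference order, or None if all are down."""
    for p in paths:
        if p.up:
            return p.nic
    return None


def validate_vlans(tenants: Dict[str, int]) -> None:
    """Raise ValueError unless every tenant has its own VLAN ID in 1-4094."""
    seen: Dict[int, str] = {}
    for tenant, vlan in tenants.items():
        if not 1 <= vlan <= 4094:
            raise ValueError(f"{tenant}: VLAN {vlan} outside 1-4094")
        if vlan in seen:
            raise ValueError(f"{tenant} shares VLAN {vlan} with {seen[vlan]}")
        seen[vlan] = tenant
```

Suppose the preferred path `vmnic0` goes down. `active_path([Path("vmnic0", False), Path("vmnic1", True)])` then returns `"vmnic1"`, so traffic shifts to the standby path. `validate_vlans` catches configurations where two tenants would share a broadcast domain.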