Self-healing VxRail HCI
Developed by Dell Technologies and VMware, VxRail is the only fully integrated, preconfigured, and tested HCI appliance with a choice of Dell PowerEdge servers powered by VMware vSphere and vSAN technologies for software-defined storage (SDS). The Dell Validated Design for Manufacturing Edge on VxRail is based on a three-node vSAN cluster configuration with vMotion, Distributed Resource Scheduler (DRS), and High Availability (HA) for planned and unplanned downtime and site maintenance. Three vSAN Ready Nodes are configured to form a single ESXi cluster that creates a pool of compute and storage capacity. The three-node VxRail configuration uses vSAN with Failures to Tolerate (FTT) set to 1, where data stored on one ESXi host is mirrored to a second node so that it remains available if a node fails. DRS allows automatic migration and policy-based management of VMs on available ESXi hosts in the cluster. The three-node configuration offers HA for compute and storage resources for the VMs and for the applications running on them. With multinode clustering, multiple configuration options, and VMware DRS and HA, Dell VxRail offers self-healing capability for edge deployments of ISV applications.
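The relationship between FTT and cluster size can be illustrated with a short calculation. The following Python sketch assumes RAID-1 mirroring, where tolerating n failures requires n + 1 copies of the data and at least 2n + 1 hosts. The function names and the raw capacity figure are hypothetical, and the capacity estimate ignores slack space, metadata, and any space-efficiency features.

```python
def mirror_requirements(ftt: int) -> dict:
    """Return host and copy counts for RAID-1 mirroring at a given FTT value."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return {
        "min_hosts": 2 * ftt + 1,  # data copies plus witness components
        "data_copies": ftt + 1,    # one original plus one mirror per tolerated failure
    }


def usable_capacity_tb(raw_tb: float, ftt: int) -> float:
    """Rough usable capacity: raw capacity divided by the number of data copies."""
    return raw_tb / mirror_requirements(ftt)["data_copies"]


if __name__ == "__main__":
    # Hypothetical cluster: three nodes contributing 30 TB of raw capacity in total.
    print(mirror_requirements(1))
    print(usable_capacity_tb(30.0, 1))
```

With FTT set to 1, the sketch yields a minimum of three hosts, which matches the three-node configuration described above, and roughly half of the raw capacity as usable space.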
Resiliency of the ISV application stack on VxRail
As described earlier, there are several components involved in the solution, and all of them offer varying degrees of resiliency.
Database services, web services, and ISV application services are automatically started when the VM fails over or migrates to another node in the VxRail cluster, making the entire stack ready to use without user intervention. Because user connectivity relies on access to the web services, users see no impact other than a momentary interruption.
Role-based HA management for devices, users, and applications
ISV applications may not offer the role-based service level management needed to ensure that a given set of mission-critical applications, users, and devices has higher availability than noncritical ones. Role-based HA management is important to ensure that high-priority applications remain available and continue to perform. This requires deploying the application with a proper understanding of the application, user load, connected devices, and priorities.
There are several ways to address the needs for role-based HA management:
- Based on the prioritized grouping of the ISV application components, a separate set of application VMs and database VMs can be used for deployment. Such separation allows a different set of policies to be configured for each group to ensure higher availability. VMware DRS allows fully automated, partially automated, and manual placement for load balancing and resource scheduling. VMware DRS affinity rules can keep a user-defined set of VMs together or tie them to specific hosts. ISV application VMs can be configured with specific host rules to ensure consistent performance and availability across all application components. This separation of the application stack also allows the use of independent database services and storage devices with additional policies.
- Such role-based configurations determine whether an additional level of clustering is needed for VxRail, ISV applications, and databases, among other things. Clusters are complex to configure and manage and are resource-intensive, so they should be configured only for high-priority applications rather than for all applications.
- Similarly, other policies such as security, alerts and monitoring, database snapshots, and backup and recovery can also be applied differently when considering role-based availability of ISV applications.
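The prioritized grouping described above can be sketched as a simple mapping from roles to restart priorities. This Python example is an illustration only: the role names, workload names, and the mapping are hypothetical, and the priority labels are plain strings rather than calls to any VMware API.

```python
from dataclasses import dataclass

# Priority labels ordered from first restarted to last (illustrative strings).
PRIORITY_ORDER = ["highest", "high", "medium", "low", "lowest"]

# Hypothetical mapping from a business role to a restart priority.
ROLE_TO_PRIORITY = {
    "critical": "highest",
    "standard": "medium",
    "noncritical": "low",
}


@dataclass(frozen=True)
class Workload:
    name: str
    role: str  # one of the keys in ROLE_TO_PRIORITY


def restart_plan(workloads):
    """Group VM names by priority, ordered from first restarted to last."""
    plan = {priority: [] for priority in PRIORITY_ORDER}
    for workload in workloads:
        plan[ROLE_TO_PRIORITY[workload.role]].append(workload.name)
    # Drop empty tiers and sort names within each tier for a stable result.
    return [(p, sorted(names)) for p, names in plan.items() if names]


if __name__ == "__main__":
    vms = [
        Workload("isv-db", "critical"),
        Workload("reporting", "noncritical"),
        Workload("isv-web", "critical"),
        Workload("historian", "standard"),
    ]
    for priority, names in restart_plan(vms):
        print(priority, names)
```

Keeping the role-to-priority mapping in one table makes it easier to review which workloads are treated as mission-critical before the same grouping is applied to placement rules, backup schedules, or monitoring policies.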
Predictive maintenance and analytics with VxRail
VxRail integrates with VMware cloning and periodic snapshot capabilities. Virtualized infrastructure for ISV applications can be cloned to create restartable point-in-time images of those applications and datasets. These clones can be leveraged to create QA, Dev, Reporting, and Analytics environments. Such deployments can provide additional, space-efficient copies of production data to identify systemic issues, report health status, and determine the need for predictive maintenance. As edge systems gather data from a diverse set of IIoT devices, such copies can help perform analytics at near-edge data centers and leverage hyperscalers to make even smarter decisions at scale. Also, for some failures, such clones and snapshots can be used to restore the production environment without the need for complex and involved recovery operations.
RPO and RTO management
For 24/7 industrial environments, the ability to quickly troubleshoot a failure and, more importantly, recover from it are key considerations. Different users, applications, and devices can have different Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO). RPO defines the last good state, measured as a point in time, to which the application must be restored for continued operations. RTO is the time it takes to bring the entire application stack back to that last good state. VxRail resiliency and the VMware clone, snapshot, and other capabilities described earlier help manage service levels for RPO and RTO. ISV applications depend on continuous availability of database services, and various backup and recovery options are available to ensure availability of the application stack. Thus, ISV applications running on Dell VxRail help to manage and improve RPO and RTO.
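A simple way to reason about these objectives is to compare them with what a given protection schedule can achieve. The Python sketch below assumes protection by periodic snapshots, so the worst-case data loss equals one snapshot interval, and it assumes recovery time is the sum of restore time and service restart time. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    rpo_minutes: float  # maximum tolerable data loss, expressed as time
    rto_minutes: float  # maximum tolerable time to restore the stack


def worst_case_data_loss(snapshot_interval_minutes: float) -> float:
    """A failure just before the next snapshot loses up to one full interval."""
    return snapshot_interval_minutes


def meets_service_level(
    target: ServiceLevel,
    snapshot_interval_minutes: float,
    restore_minutes: float,
    restart_minutes: float,
) -> bool:
    """Check whether a snapshot schedule and recovery estimate satisfy the target."""
    achieved_rpo = worst_case_data_loss(snapshot_interval_minutes)
    achieved_rto = restore_minutes + restart_minutes
    return achieved_rpo <= target.rpo_minutes and achieved_rto <= target.rto_minutes


if __name__ == "__main__":
    critical = ServiceLevel(rpo_minutes=15, rto_minutes=30)
    # Snapshots every 10 minutes, 12 minutes to restore, 5 minutes to restart services.
    print(meets_service_level(critical, 10, 12, 5))
    # Hourly snapshots cannot satisfy a 15-minute RPO.
    print(meets_service_level(critical, 60, 12, 5))
```

Running the same check per role makes it clear which applications need a tighter snapshot schedule and which can share a less frequent one.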
Aggregating data sources and supporting multiple use cases at scale
Edge applications aggregate data from a diverse set of sensors, devices, and gateways that support various network topologies and use different protocols for north-bound traffic to edge systems. It is common to have multiple layers of gateways supporting a large set of sensors. ISV applications support many of these protocols and communication channels and can communicate with multiple application instances. By deploying multiple instances, and by aggregating and storing data from diverse data sources, users can achieve HA and also provide an additional set of services from those instances.
For example, one ISV application instance can be used for analytics and predictive maintenance, whereas another can be used for reporting overall health of the system and time series data.
Isolation and multitenant network management
Many edge devices provide multiple physical interfaces to connect to multiple north-bound IP addresses that support various ISV applications. VxRail infrastructure and network switches support multiple network interfaces. Such redundancy in the network configuration offers HA: if a network path goes down, operations continue over the remaining paths. These networks can be configured with independent VLANs to isolate traffic for multitenant environments, improving overall availability and allowing effective noisy neighbor management.