The Dell Validated Design for Retail Edge with inVia may be deployed on different platforms depending on requirements: VxRail, Azure Stack HCI, and Kernel-based Virtual Machine (KVM). Both VxRail and Azure Stack HCI provide the built-in redundancy needed to offer a highly available processing environment. KVM, while not offering fully redundant HA, does offer a simple method for ensuring the high availability of the KVM hypervisor with the help of a nested virtualization technique. This provides the ability to perform live migrations and addresses planned outage scenarios.
VxRail HCI
Developed by Dell Technologies and VMware, VxRail is the only fully integrated, preconfigured, and tested HCI appliance with a choice of Dell PowerEdge servers powered by VMware vSphere and vSAN technologies for software-defined storage (SDS). The Dell Validated Design for Retail Edge on VxRail is based on a three-node vSAN cluster configuration. Three vSAN Ready Nodes are configured to form a single ESXi cluster to create a pool of compute and storage capacity. The three-node VxRail configuration uses vSAN with failures to tolerate (FTT) set to 1, where data stored on one ESXi host is copied to a second node to ensure availability if there is a node failure. The VxRail three-node configuration provides redundancy for VM resources and the applications running on them. With multinode clustering and multiple configuration options, VxRail offers the self-healing capability for an edge computing deployment of the inVia applications.
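The capacity implication of FTT=1 can be sketched numerically. The following is a minimal illustration, assuming RAID-1 mirroring and an illustrative 30% slack reserve; the numbers are not official VxRail or vSAN sizing guidance:

```python
def usable_capacity_tib(raw_per_node_tib, nodes, ftt=1, slack=0.30):
    """Rough usable vSAN capacity with RAID-1 mirroring.

    With FTT=1 each object is stored twice (ftt + 1 copies), and some
    slack space is reserved for rebuilds and maintenance. The 30% slack
    figure is an illustrative assumption, not official sizing guidance.
    """
    raw = raw_per_node_tib * nodes
    mirrored = raw / (ftt + 1)          # two copies of every object
    return mirrored * (1 - slack)       # reserve slack space

# Example: three nodes with 10 TiB of raw capacity each.
print(round(usable_capacity_tib(10.0, 3), 2))  # 10.5 (TiB usable)
```

The sketch makes the trade-off explicit: mirroring halves raw capacity in exchange for surviving the loss of one node.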
Azure Stack HCI
Dell Azure Stack HCI is a hyperconverged infrastructure (HCI) solution that is offered by Dell Technologies in partnership with Microsoft Azure. The solution is designed to help organizations modernize their data centers and enable hybrid cloud capabilities by combining the power of Dell hardware with Microsoft Azure Stack HCI software. Azure and Dell Azure Stack HCI enable you to deliver a seamless Azure experience across locations by extending Azure Arc as the single control plane to manage your core data centers and edge locations. The Dell Validated Design for Retail Edge with inVia on Azure Stack HCI is based on a three-node Azure Stack HCI cluster; with Hyper-V Manager, failover clustering, and Windows Admin Center integration, it provides full lifecycle management and simplified server and cluster management.
KVM
Kernel-based Virtual Machine (KVM) is an open-source virtualization technology that is built into the Linux kernel. KVM turns Linux into a type-1 (bare-metal) hypervisor, allowing a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs). All hypervisors need some operating system-level components (such as a memory manager, process scheduler, input/output (I/O) stack, device drivers, security manager, a network stack, and more) to run VMs. KVM has all of these components because it is part of the Linux kernel. Every VM is implemented as a regular Linux process, scheduled by the standard Linux scheduler, with dedicated virtual hardware like a network card, graphics adapter, CPU, memory, and disks.
The main benefit of the KVM hypervisor is its native availability on Linux. Since KVM is part of Linux, it installs natively, enabling an intuitive user experience and smooth integration. Leveraging KVM functionality is straightforward after installing the required kernel packages and enabling virtualization on the KVM-enabled host. The primary drawback of KVM is that integration of add-ons and third-party solutions is required to leverage true high availability and disaster recovery. Information about one such set of solutions, including Corosync, sbd, resource-agents, fence-agents, and other projects, can be found on the ClusterLabs site. The Dell Validated Design for Retail Edge with inVia on KVM leverages QEMU (a machine emulator and virtualizer) and libvirt (a library that is used to interface with the virtualization layer). The solution is based on two Dell PowerEdge XR11 servers operating as KVM hosts on Ubuntu 22.04, leveraging NFS as shared storage. The use of NFS as shared storage is not recommended in production environments; it is used here simply to demonstrate KVM capabilities, as a clustered KVM solution normally involves block-based replication.
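Before installing the KVM packages, it is common to verify that the CPU exposes hardware virtualization extensions. The sketch below parses `/proc/cpuinfo`-style text for the Intel (`vmx`) or AMD (`svm`) flags; a sample string is used here so the example is self-contained, but on a real host you would read the file itself:

```python
def virtualization_support(cpuinfo_text):
    """Return 'vmx' (Intel VT-x), 'svm' (AMD-V), or None, given
    /proc/cpuinfo-style text. On a real Linux host you would pass
    open('/proc/cpuinfo').read() instead of the sample below."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            for f in ("vmx", "svm"):
                if f in flags:
                    return f
    return None

# Illustrative sample; real files contain one flags line per logical CPU.
sample = "processor\t: 0\nflags\t\t: fpu vme sse2 vmx ept\n"
print(virtualization_support(sample))  # vmx
```

A `None` result means the host cannot run hardware-accelerated KVM guests until virtualization is enabled in firmware.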
Resiliency of the inVia application stack on VxRail HCI
As described earlier, several components are involved in the solution, and all of them offer varying degrees of resiliency. Database services, web services, and inVia application services are automatically started when the VM fails over or migrates to another node in the VxRail cluster, making the entire stack ready to use without user intervention. As user connectivity relies on access to the web services, there is no impact to users other than a momentary glitch.
Resiliency of the inVia application stack on Azure Stack HCI
By leveraging the Azure Site Recovery service, organizations can protect their data and workloads by providing business continuity and disaster recovery services. This service helps ensure that your VM workloads remain available even in the event of a planned or unplanned outage. Azure Site Recovery consists of several steps, namely replication, failover, re-protect, and failback. On Azure Stack HCI clusters running the May 2023 cumulative update of version 22H2 and later, it is possible to replicate your on-premises Azure Stack HCI virtual machines into Azure to protect your critical workloads. You can then create recovery plans within Azure that allow you to fail over and recover the VMs in Azure if there is an outage on your Azure Stack HCI cluster. Once the outage is corrected, you can fail back the VMs to the Azure Stack HCI cluster.
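The replication, failover, re-protect, and failback steps form a cycle that can be modeled as a small state machine. The following is a toy sketch of that cycle; the state and action names are illustrative assumptions and do not correspond to the real Azure Site Recovery APIs:

```python
# Toy model of the Azure Site Recovery cycle described above:
# replicate -> failover -> re-protect -> failback. Names are illustrative.
TRANSITIONS = {
    ("protected", "failover"): "running_in_azure",
    ("running_in_azure", "reprotect"): "reprotected",
    ("reprotected", "failback"): "protected",
}

def apply(state, action):
    """Advance the recovery workflow, rejecting out-of-order actions."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError("cannot %s from %s" % (action, state))

state = "protected"  # VM is replicated from Azure Stack HCI into Azure
for action in ("failover", "reprotect", "failback"):
    state = apply(state, action)
print(state)  # protected: the VM is back on the Azure Stack HCI cluster
```

The point of the model is the ordering constraint: a VM must be re-protected (replication reversed) before it can be failed back to the on-premises cluster.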
Resiliency of the inVia application stack on KVM
KVM can offer redundant HA by leveraging packages, add-ons, and third-party solutions. These solutions depend mostly on the corosync and pacemaker packages, the discussion of which is beyond the scope of this document.
KVM does, however, facilitate live migration of VM guests with shared storage, supporting planned outages by enabling workloads to be migrated from one KVM host to another. inVia application services are automatically started when the VM migrates to another host, making the entire stack ready to use without user intervention. As user connectivity relies on access to the web services, the endpoints can be load-balanced so there is no impact to users other than a momentary glitch.
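Draining a KVM host for planned maintenance has a planning step before any `virsh migrate --live` command is issued: deciding where each guest goes. A minimal sketch of that step, assuming hypothetical host and VM names and a simple round-robin spread over the surviving hosts:

```python
# Planning step for a KVM host drain: choose a target host for each guest
# before live-migrating it (for example with
#   virsh migrate --live <vm> qemu+ssh://<target>/system
# against shared storage). Host and VM names below are hypothetical.
def drain_plan(placement, host_to_drain):
    """placement: dict vm -> current host. Returns [(vm, target), ...],
    spreading the drained host's guests round-robin over the rest."""
    targets = sorted({h for h in placement.values() if h != host_to_drain})
    if not targets:
        raise ValueError("no surviving host to migrate to")
    moves = []
    vms = sorted(v for v, h in placement.items() if h == host_to_drain)
    for i, vm in enumerate(vms):
        moves.append((vm, targets[i % len(targets)]))
    return moves

placement = {"invia-db": "kvm-host1", "invia-web": "kvm-host1",
             "invia-app": "kvm-host2"}
print(drain_plan(placement, "kvm-host1"))
# [('invia-db', 'kvm-host2'), ('invia-web', 'kvm-host2')]
```

In the two-node XR11 configuration described above there is only one possible target, so the plan degenerates to "move everything to the other host"; the round-robin spread matters once more hosts are added.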
Role-based HA management for devices, users, and applications
inVia applications may not offer the role-based service level management that is needed to ensure that a certain set of mission-critical applications, users, and devices has higher availability than other, noncritical ones. Role-based HA management is important to ensure that high-priority applications remain available and continue to perform. This requires application deployment with proper understanding of the application, user load, connected devices, and priorities.
There are several ways to address the needs for role-based HA management:
- Based on the prioritized grouping of the inVia application components, a separate set of VMs and database VMs can be used for deployment. Such physical separation allows configuration of a different set of policies to ensure higher availability. VMware DRS allows fully automated, partially automated, and manual placement for load balancing and resource scheduling. VMware HA offers host rules to keep a user-defined set of VMs together. inVia application VMs can be configured with specific host rules to ensure consistency of performance and availability across all application components. This separation of the application stack also allows the use of independent database services and storage devices with additional policies.
- Such role-based configurations determine what additional clusters are needed for VxRail, inVia applications, and databases, among other things. Clusters are complex to configure and manage and are resource-intensive, so they may be configured only for high-priority applications rather than for all applications.
- Similarly, other policies, such as security, alerting and monitoring, database snapshots, and backup and recovery, can also be applied differently when considering role-based availability of inVia applications.
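The prioritized grouping in the first bullet can be sketched as a mapping from VMs to priority tiers, from which keep-together groups of the kind VMware HA/DRS host rules express are derived. The VM names, tier labels, and rule fields below are illustrative assumptions, not the VMware rule schema:

```python
from collections import defaultdict

# Hypothetical sketch: group inVia VMs by priority tier and emit one
# keep-together group per tier. Only the 'critical' group would then be
# given the stricter availability and placement policy.
def host_rules(vm_tiers):
    """vm_tiers: dict vm -> tier (e.g. 'critical' or 'standard').
    Returns {tier: {'vms': [...], 'policy': ...}}."""
    groups = defaultdict(list)
    for vm, tier in sorted(vm_tiers.items()):
        groups[tier].append(vm)
    return {tier: {"vms": vms, "policy": "keep-together"}
            for tier, vms in groups.items()}

rules = host_rules({"invia-db": "critical", "invia-app": "critical",
                    "reporting": "standard"})
print(rules["critical"]["vms"])  # ['invia-app', 'invia-db']
```

Physically separating the tiers this way is what makes it possible to attach different DRS automation levels, storage policies, and backup schedules to each group.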
Predictive maintenance and analytics with VxRail
VxRail integrates with VMware capabilities of cloning and periodic snapshots. Virtualized infrastructure for inVia applications can be cloned to create restartable point-in-time images of those applications and datasets. These clones can be leveraged to create QA, dev, reporting, and analytics environments. Such deployments can provide additional efficient copies of production data to identify systemic issues, report health status, and determine the need for predictive maintenance. As edge systems gather data from a diverse set of IoT devices, such copies can help perform analytics at near-edge data centers and leverage hyperscalers to make even smarter decisions at scale. Also, for some failures, such clones and snapshots can be used to restore the production environment without the need for complex and involved recovery operations.
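Periodic snapshots accumulate, so a retention policy usually accompanies them. A minimal sketch of such a policy, assuming illustrative snapshot names and a simple keep-the-newest-N rule (the actual snapshot creation and deletion would be done through the vSphere snapshot APIs):

```python
from datetime import date, timedelta

# Illustrative retention sketch for periodic point-in-time snapshots:
# keep the newest `keep` images and prune the rest. Snapshot naming is
# hypothetical; vSphere APIs perform the real snapshot operations.
def prune(snapshots, keep=7):
    """snapshots: list of (name, date). Returns (kept, pruned),
    both ordered newest-first."""
    ordered = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return ordered[:keep], ordered[keep:]

snaps = [("invia-snap-%d" % i, date(2024, 1, 1) + timedelta(days=i))
         for i in range(10)]
kept, pruned = prune(snaps, keep=7)
print(len(kept), len(pruned))  # 7 3
```

Keeping several restartable images rather than one gives the QA, reporting, and restore use cases described above a choice of recovery points.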
Predictive maintenance and analytics with Azure Stack
Azure Stack HCI provides similar features to VxRail to clone virtual machines and take snapshots of the underlying disks, which may then be leveraged to restore the production environment without the need for complex and involved recovery operations. The Failover Cluster Manager provides additional capabilities to monitor critical events and alerts within the cluster.
Predictive maintenance and analytics with KVM
KVM is an extension to the Linux kernel, and as such it relies primarily on existing operating system-level components to provide adequate predictive maintenance of components. By managing KVM hosts with the Virt-Manager user interface, users can gain valuable visual insights into performance across CPU, memory, disk, and network resources. There are also many third-party add-ons offering proactive KVM hypervisor monitoring that collects detailed insights about the health and performance of the KVM hosts and their associated virtual machines in real time, as well as generation of performance reports with intelligent analytics.
RPO and RTO management
For 24/7 retail environments, the ability to quickly troubleshoot a failure, and more importantly, recover from a failure situation, are key considerations. Different users, applications, and devices can have different recovery point objectives (RPO) and recovery time objectives (RTO). The RPO defines the last good state to which application data is restored for continued operations; the RTO is the time that it takes to bring the entire application stack back to that state. VxRail resiliency and VMware clone, snapshot, and other capabilities described earlier help manage service levels for RPO and RTO. inVia applications depend on continuous availability of database services, and various backup and recovery options are available to ensure availability of the application stack.
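The relationship between a backup schedule and the RPO/RTO targets can be made concrete with a small check: worst-case data loss equals the backup interval, and worst-case downtime equals the restore-and-restart time. All numbers below are illustrative assumptions, not Dell or inVia guidance:

```python
# Toy check of whether a backup schedule meets a tier's RPO/RTO targets.
# All numbers are illustrative, not Dell or inVia guidance.
def meets_slo(backup_interval_min, restore_min, rpo_min, rto_min):
    """Worst-case data loss is one backup interval; worst-case downtime
    is the time to restore and restart the application stack."""
    return backup_interval_min <= rpo_min and restore_min <= rto_min

# Hypothetical critical tier: 15-minute RPO, 30-minute RTO.
print(meets_slo(backup_interval_min=10, restore_min=25,
                rpo_min=15, rto_min=30))   # True
print(meets_slo(backup_interval_min=60, restore_min=25,
                rpo_min=15, rto_min=30))   # False: hourly backups miss RPO
```

This is the calculation behind role-based service levels: tighter RPO targets force more frequent backups or continuous replication, which is why they are reserved for the critical tier.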
Aggregating data sources and supporting multiple use cases at scale
Edge applications aggregate data from a diverse set of sensors and gateways that support various network topologies and use different protocols for north-bound traffic to edge systems. It is common to have multiple layers of gateways supporting a large set of sensors. inVia applications support a large set of such protocols and communication channels and can communicate with multiple application instances. By deploying multiple instances, and aggregating and storing data from diverse data sources, users can realize HA and also provide an additional set of services from multiple instances.
For example, one inVia application instance can be used for analytics and predictive maintenance, whereas another can be used for reporting the overall health of the system and time series data.
Isolation and multitenant network management
VxRail, Azure Stack HCI, and KVM infrastructures support multiple network interfaces. Such redundancy in the network configuration offers HA: if a network path goes down, operations are not impacted. These networks can be configured with independent VLANs to isolate traffic for multitenant environments, improving overall availability and allowing effective noisy-neighbor management.
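The per-tenant VLAN scheme can be sketched as a mapping from tenants to dedicated VLAN IDs, with a check that no ID is shared between tenants. The tenant names and VLAN numbers below are hypothetical:

```python
# Sketch of per-tenant VLAN isolation: each tenant owns a dedicated set of
# VLAN IDs, and the check rejects any ID shared between tenants, which
# would break traffic isolation. Names and numbers are hypothetical.
def check_isolation(tenant_vlans):
    """tenant_vlans: dict tenant -> set of VLAN IDs.
    Returns True if isolation holds; raises ValueError otherwise."""
    seen = {}
    for tenant, vlans in sorted(tenant_vlans.items()):
        for vid in sorted(vlans):
            if vid in seen and seen[vid] != tenant:
                raise ValueError("VLAN %d shared by %s and %s"
                                 % (vid, seen[vid], tenant))
            seen[vid] = tenant
    return True

print(check_isolation({"store-ops": {101, 102}, "analytics": {201}}))  # True
```

Running such a check whenever the network configuration changes catches the misconfiguration that most commonly undermines multitenant isolation: reusing a VLAN ID across tenants.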