Dell PowerStore: Persistent Data Availability

Ryan Meyer, Robert Weilhammer, Ryan Poulin, Louie Sasa

Wed, 29 May 2024 19:32:44 -0000

Summary

At Dell Technologies, storage solutions are architected for reliability, availability, and serviceability (RAS) as part of their design. Dell storage products also include redundant hardware components and intelligent software architected to deliver extreme performance and availability—simultaneously. This combination results in unparalleled operational durability, while also leveraging components in new ways that decrease the total cost of ownership of each system.

Data availability

Today’s business-critical application environments demand solutions which offer availability that is expressed in more than a simple number-of-nines of availability. While Dell PowerStore is designed for 99.9999% availability, there is more to availability than only the probability that a system will be operational during a defined period. There are also required features and automated processes that must be considered which enable non-disruptive operations so that the environment is persistent. These considerations provide the best opportunity for high availability and zero downtime.
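To put a number-of-nines figure in perspective, the following minimal sketch converts an availability percentage into an annual downtime budget. It assumes a 365-day year and treats the percentage as a simple long-run average; the function name is illustrative and not part of any Dell tool. Under those assumptions, 99.9999% corresponds to roughly 31.5 seconds of downtime per year.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Return the expected unavailable time per year, in seconds."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * SECONDS_PER_YEAR


# Compare common "nines" levels with the six-nines design target.
for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% -> {downtime_seconds_per_year(pct):,.1f} seconds per year")
```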

All Dell storage solutions can incorporate onboard and interoperable data protection functionality such as remote replication and snapshots of data, used to deliver nonstop business continuity. Solutions can also include intelligent software which automates the processes which are needed to seamlessly operate an always-on environment.

Dell Technologies combines storage solutions architected for RAS, comprehensive remote replication technologies, local snapshot technologies, and intelligent software which helps automate data center operations. This combination results in solutions which go beyond just a bunch of nines to an always-on, persistent-data-availability environment.

Persistent solutions require that there be no single point of failure in the entire environment. This requirement spans the entire application infrastructure—from the hosts to the switches, to the storage, and to the data center itself. This means there must be complete redundancy end to end and automation to detect and handle various fault conditions. For example, all the application components in a data center could be designed for the highest levels of availability. However, if a power-loss event occurs within the building or within the local region, the application could become unavailable unless redundancy is included at the data-center level.
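The effect of a single point of failure can be illustrated with a simple probability model. The sketch below assumes independent failures and uses made-up availability figures for each layer; it is not a Dell sizing method. Components that must all work are multiplied together (series), while a redundant pair is unavailable only when both copies fail (parallel).

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant(availability: float, copies: int = 2) -> float:
    """Availability of N independent copies where one working copy is enough."""
    return 1.0 - (1.0 - availability) ** copies


# Hypothetical per-layer figures, chosen only for illustration.
host, switch, storage, site = 0.999, 0.999, 0.999999, 0.999

single_path = series(host, switch, storage, site)
redundant_path = series(redundant(host), redundant(switch), storage, redundant(site))

print(f"no redundancy outside storage: {single_path:.6f}")
print(f"redundant hosts, switches, sites: {redundant_path:.6f}")
```

In this model the non-redundant layers dominate the end-to-end result, which is why redundancy is needed at every level, including the data center.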

High availability hardware resiliency

PowerStore appliances support clustering which allows for multi-appliance configurations, and each appliance has its own independent fault tolerance. Also, a single PowerStore appliance is a fully unified block and file environment packaged into a single 2U enclosure. Each PowerStore within the cluster consists of two nodes that make up a high availability (HA) pair. PowerStore appliances feature both fully redundant hardware and highly available software features. These features keep the system online if there are component failures, environmental failures, and even simultaneous failures of multiple components such as an internal fan and a disk drive. You can replace hardware using the redundant node architecture, keeping the system and data online and available.

PowerStore has a dual-node architecture where each node is identical, giving the ability to serve I/O in an active/active manner. Active/active capabilities increase hardware efficiency since there is no requirement for idle-standby hardware. In this way, PowerStore efficiently makes full use of all available resources through a highly redundant architecture. If SAN or IP network port connectivity is lost, the system uses redundant port connections, allowing hosts and clients to maintain access to the data. Also, dual-ported hard drives ensure that each node has seamless connectivity to the data.

PowerStore protects data writes using redundant shared write cache that is used by both nodes simultaneously. For PowerStore 1000 to 9200 appliances, any incoming I/O that is determined to be a write is stored in DRAM memory. Then, it is copied to the NVMe NVRAM drives so that each node in the appliance can access the data. In PowerStore 500 appliances, since NVRAM drives are not supported, write I/Os are mirrored to DRAM memory in each node. This ability enables the write cache to be preserved through hardware and software faults, and node reboots.

Also, if there is a power outage or temperature alarm on both nodes, the integrated battery backup unit (BBU) provides temporary power to the system, allowing the cache to be de-staged.
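The idea of a write cache that survives a node fault can be sketched with a toy model. This is a conceptual illustration only: the class names are hypothetical, and the rule that a write is acknowledged once it is stored on every online node is an assumption made for the example, not a description of PowerStore internals.

```python
class Node:
    """Toy storage node with an in-memory write cache."""

    def __init__(self, name: str):
        self.name = name
        self.cache = {}
        self.online = True


class MirroredWriteCache:
    """Stores each write on every online node before acknowledging it."""

    def __init__(self, node_a: Node, node_b: Node):
        self.nodes = (node_a, node_b)

    def write(self, key: str, data: bytes) -> bool:
        online = [n for n in self.nodes if n.online]
        if not online:
            return False  # nothing can accept the write
        for node in online:
            node.cache[key] = data
        return True  # acknowledged: a copy exists on every online node

    def read_surviving(self, key: str):
        """Return the cached data from any node that is still online."""
        for node in self.nodes:
            if node.online and key in node.cache:
                return node.cache[key]
        return None


a, b = Node("node_a"), Node("node_b")
cache = MirroredWriteCache(a, b)
cache.write("block-42", b"payload")
a.online = False  # simulate a fault or reboot of one node
print(cache.read_surviving("block-42"))  # the peer still holds the write
```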

High availability software features

Dynamic Resiliency Engine (DRE) is a 100% software-based approach to redundancy that is more distributed, automated, and efficient than traditional RAID. It meets RAID 6 and RAID 5 parity requirements with superior resiliency and at a lower cost. This feature ensures enterprise-class availability by achieving faster drive-rebuild times with distributed sparing. DRE rebuilds smaller chunks of the drive simultaneously to multiple drives in the appliance. Intelligent allocation of unused user space is designed to replenish spare space for handling multiple drive failures automatically. Rebuild speeds are calculated intelligently when there is incoming I/O to ensure performance and availability during rebuild. DRE allows for flexible configurations that can lower TCO by expanding storage with a single drive at a time or by adding different drive sizes based on storage needs. Introduced with PowerStoreOS 2.0, the benefit of double-drive-failure tolerance further increased the number of concurrent failed drives that the system could withstand. Starting in PowerStoreOS 4.0, the DRE architecture will be unified by intelligently leveraging capacity tiers from the appliance’s reserve space. These changes result in approximately 2% of usable capacity returned to the user.

Active/active architecture allows for configuring paths across nodes for both iSCSI Ethernet and FC configurations using SCSI and NVMe protocols. For multipathing, you can use software such as Dell PowerPath on your hosts to use the full multipathing capabilities of PowerStore. Active/active architecture ensures host I/O requests use the active optimized path for best performance. In the rare event that the host loses the active optimized path, the host will switch over to the active non-optimized path where the I/O request will be processed locally by the PowerStore node.
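The path-selection behavior described above can be sketched as follows. This is a simplified illustration of what host multipathing software does, not the PowerPath implementation; the `Path` class and the state strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    state: str  # "active/optimized" or "active/non-optimized"
    up: bool = True


def select_path(paths):
    """Prefer an available optimized path; otherwise fall back to a non-optimized one."""
    candidates = [p for p in paths if p.up]
    if not candidates:
        raise RuntimeError("no path to the volume is available")
    # Optimized paths sort first; min() keeps the first match on ties.
    return min(candidates, key=lambda p: 0 if p.state == "active/optimized" else 1)


paths = [
    Path("node_a_port0", "active/optimized"),
    Path("node_b_port0", "active/non-optimized"),
]
print(select_path(paths).name)  # optimized path while it is up

paths[0].up = False  # the host loses the optimized path
print(select_path(paths).name)  # I/O continues on the non-optimized path
```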

Snapshots are supported on block and file resources (volumes, volume groups, virtual machines, file systems, and thin clones). Snapshots ensure the availability of critical data with near-immediate reverts to a previous, known-good recovery point if there is an accidental deletion or malware outbreak. You can configure snapshot rules to support a wide range of recovery point objectives. Starting in PowerStoreOS 3.5, the secure snapshot setting can be enabled for snapshots on volumes and volume groups. Snapshot rules can also be configured to create secure snapshots automatically. With secure snapshots enabled, the snapshots and parent resource are protected from accidental or malicious deletion and serve as a cost-effective line of defense against ransomware attacks. If an unauthorized user gains access to a system, the attacker cannot delete secure snapshots and cause data loss.

Native replication is supported for file servers, block resources (volumes, volume groups, and thin clones), and VMs based on VMware vSphere Virtual Volumes (vVols). Native replication uses a TCP-based protocol to transfer data between two PowerStore clusters over Ethernet. You can apply replication rules on a policy basis to provide various recovery point objectives for your storage resources. For file and block resources, you can set up the policy in PowerStore Manager. vVol-based VMs are protected through VMware Storage Policies and VMware SRM (Site Recovery Manager). PowerStoreOS 4.0 introduces native synchronous replication support for file servers and block resources (volumes, volume groups with write-order consistency and thin clones) which utilizes the FC data transfer protocol between two PowerStore clusters. Native synchronous replication protection is also policy driven and provides zero RPO between clusters.

Metro Volume is a replication feature that allows synchronously replicated active/active block volumes and volume groups spanning across two PowerStore clusters. Bi-directional I/O synchronization provides active concurrent host I/O on participating systems. Integrated self-healing reprotects the Metro Volume after a failure scenario. PowerStore supports uniform and non-uniform host access in VMware, Windows, and Linux configurations.

Native PowerProtect DD backup integration, available since PowerStoreOS 3.5, provides a simplified native backup solution for volumes and volume groups. By configuring remote backup sessions from PowerStore Manager, users can quickly back up resources, retrieve remote snapshots, and initiate instant access sessions. Instant access allows the host to view the contents of a remote snapshot without retrieving the data back to the PowerStore system, so deleted, corrupted, or modified data within the snapshot can be accessed immediately and copied back to the host for quick recovery. Users can also discover and retrieve snapshot backups created on a different PowerStore cluster.

File-Level Retention (FLR) protects file data from deletion or modification until a specified retention date. PowerStore supports FLR-Enterprise (FLR-E) and FLR-Compliance (FLR-C). FLR-C has other restrictions and is designed for companies that need to comply with federal regulations (SEC rule 17a-4(f)).
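The core rule of file-level retention, that a file cannot be changed or deleted before its retention date, can be expressed in a few lines. This is a conceptual sketch with a hypothetical function name; it does not model the differences between FLR-E and FLR-C or how PowerStore enforces retention.

```python
from datetime import datetime, timezone
from typing import Optional


def change_allowed(retention_until: datetime, now: Optional[datetime] = None) -> bool:
    """Return True only once the retention date has been reached."""
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= retention_until


retention = datetime(2030, 1, 1, tzinfo=timezone.utc)  # hypothetical retention date
print(change_allowed(retention, now=datetime(2029, 6, 1, tzinfo=timezone.utc)))  # False
print(change_allowed(retention, now=datetime(2030, 6, 1, tzinfo=timezone.utc)))  # True
```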

Fail-Safe Networking (FSN), introduced in PowerStoreOS 3.5, provides a switch agnostic high availability solution for NAS interfaces. With FSN, users can eliminate single points of failure (ports, cables, switch, and so on) by linking ports together in an active/passive configuration. FSN is flexible because it can work alone or with user-defined Link Aggregations (LA).

Common Event Publishing Agent (CEPA) delivers SMB and NFS file and directory event notifications to a server, enabling third-party applications to parse and control them. You can employ this ability for use cases such as detecting ransomware, managing user access, configuring quotas, and providing storage analytics. The event-notification solution consists of a combination of PowerStore, Common Event Enabler (CEE) CEPA software, and a third-party application.

Clustering provides the ability to scale up and scale out appliances independent of one another while being managed from a single management interface. This flexible deployment model enables PowerStore to start small with a single appliance configuration, and grow into a larger cluster by simply adding appliances online. As part of clustering, we also support non-disruptive migrations of volumes within the cluster.

VMware integration is deep and rich with PowerStore. When administrators are using virtual machines on the PowerStore system, native VMware features such as vSphere High Availability (HA) are automatically enabled to ensure virtual machines remain available if there is an outage. Administrators can enable other features, such as VMware Fault Tolerance (FT), to guarantee the highest levels of availability. PowerStoreOS 3.0 introduces support for a VMware File System on PowerStore, optimized for VMware environments as NFS shared datastores.

Performance metrics in historical and real-time views enable monitoring for anomalies and troubleshooting performance issues. This allows administrators to monitor their environment and be proactive in preventing potential outages. Performance metrics are fully integrated into the PowerStore environment allowing for ease of use and customization.

Performance policies can be applied to block-level resources (volumes, volume groups, vVols, and thin clones) based on a high, medium, or low setting. Applying these policies to the storage objects helps prioritize I/O requests to different hosts and applications. For example, a storage administrator may apply a high setting for a volume that is used by a database application such as Microsoft SQL Server. However, for other applications such as a test/dev vVol, the administrator may choose a low setting.

Quality of Service (QoS) is the ability to limit the performance of a resource. PowerStoreOS 4.0 introduces QoS support for block resources (volumes, volume groups, and clones), allowing users to set I/O limits on storage objects based on their SLAs. Enabling QoS gives administrators more granular prioritization of I/O for critical and non-critical applications.
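One common way to enforce an I/O limit is a token bucket. The sketch below shows that generic technique as an illustration of what an I/O limit means in practice; the source does not say how PowerStore implements QoS, and the class and parameter names are hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Optional


class IopsLimiter:
    """Token bucket that admits at most `max_iops` I/Os per second on average."""

    def __init__(self, max_iops: float, burst: Optional[float] = None):
        self.rate = max_iops
        self.capacity = burst if burst is not None else max_iops
        self.tokens = self.capacity  # start with a full bucket
        self.last = 0.0

    def allow(self, now: float) -> bool:
        """Return True if an I/O arriving at time `now` (seconds) may proceed."""
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, never above the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


limiter = IopsLimiter(max_iops=2)
print([limiter.allow(0.0) for _ in range(3)])  # third I/O in the same instant is throttled
print(limiter.allow(0.5))  # half a second later one more I/O fits the limit
```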

Non-disruptive upgrades to target PowerStoreOS software ensure the best possible resilience when it comes to PowerStore software and feature functionality. This is accomplished by upgrading one node at a time and ensuring all resources are running on the node not undergoing the upgrade. PowerStore also comes with anytime upgrade options which protect the customer’s investment if they want to upgrade their hardware model.
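The one-node-at-a-time sequence can be sketched as a small simulation. This is a toy model of a two-node HA pair under the assumption that every resource can run on either node; the data layout, version labels, and function name are hypothetical and do not reflect the actual upgrade tooling.

```python
def rolling_upgrade(nodes: dict, target_version: str) -> list:
    """Upgrade the two nodes of an HA pair one at a time.

    `nodes` maps a node name to {"version": str, "resources": list}.
    Returns a log of the steps taken.
    """
    if len(nodes) != 2:
        raise ValueError("this sketch models a two-node HA pair")
    log = []
    names = list(nodes)
    for name in names:
        peer = names[1] if name == names[0] else names[0]
        # Move every resource to the peer so the node being upgraded serves nothing.
        nodes[peer]["resources"].extend(nodes[name]["resources"])
        nodes[name]["resources"] = []
        log.append(f"{name}: resources moved to {peer}")
        nodes[name]["version"] = target_version
        log.append(f"{name}: upgraded to {target_version}")
    return log


pair = {
    "node_a": {"version": "old", "resources": ["vol1", "nas1"]},
    "node_b": {"version": "old", "resources": ["vol2"]},
}
for step in rolling_upgrade(pair, "new"):
    print(step)
```

At every step, all resources are running on exactly one node that is not being upgraded, so they stay available throughout. A real system would also rebalance resources across both nodes afterward, which this sketch omits.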

CloudIQ and SupportAssist configurations enable proactive awareness of system health and performance and provide a holistic view of multiple storage systems regardless of whether they are on the same network. SupportAssist allows for direct connection to support with automated service requests for hardware and software faults and remote connection to the array for faster troubleshooting and resolution.

Configuring a high availability environment

A best practice for a highly available storage environment is to define, implement, and regularly test a business-continuity, data-availability, and disaster-recovery plan. Be sure to fully understand the implications of global settings and their impact on system operation, and always record the original settings before making any configuration changes.

Besides resilient LAN and WAN infrastructures and a redundant environment (power and cooling), having out-of-band access to and remote power control of all nodes and switches is invaluable when administering clustered systems. In a worst-case scenario, if a cluster loses power, the write cache is battery protected and destaged to preserve any in-flight uncommitted writes to maintain write consistency.

Block guidance: Use redundant network switches at both edge and core connections to maximize path availability. Use native synchronous or asynchronous block replication whenever there is a requirement for disaster recovery. Include snapshot rules in the protection policy for point in time recovery.

File guidance: For PowerStore T model appliances, file services can run on the first two ports of the 4-port card, known as the system bond, or on any other link aggregation configured in PowerStore Manager. To ensure these ports have optimal performance and fault tolerance, we recommend configuring a multi-chassis link aggregation group (MC-LAG) on the network switches. For example, on Dell switches, an administrator should configure Virtual Link Trunking (VLT). Include snapshot rules in the protection policy for point in time recovery. Enable native synchronous or asynchronous file replication whenever there is a requirement for disaster recovery. To meet compliance criteria for certain files, enable file-level retention, which protects files from modification and deletion during the retention period. To protect your data against ransomware attacks, configure CEPA for inline data protection.

Resiliency products: Using products such as PowerStore metro node or Dell VPLEX in front of PowerStore hardware enables the highest possible uptime for all block configurations. PowerStore metro node or Dell VPLEX as a storage-virtualization layer replicates data across two PowerStore appliances in the same data center or across metro distance data centers to increase data availability. This extra layer of separation provides extra fault tolerance.

With VPLEX, PowerStore volumes are mirrored, always mounted read/write, and synchronized through VPLEX local/metro, providing continuous availability to hosts if one of the PowerStore systems becomes unavailable.

Measuring availability

To determine your storage system’s lifetime data availability, we measure the percentage of time the storage system is operational—or servicing user read/write operations. The data we collect is modeled using product redundancy features, parts replacement rates, mean time to restore a hardware failure, and defined service support levels. Because of the redundancy and concurrent maintenance philosophy built into every Dell storage product, most of the system issues and subsequent repairs and replacements will not affect your overall system availability.
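A standard textbook way to relate failure and repair times to availability is the steady-state formula MTBF / (MTBF + MTTR). The sketch below uses that generic formula with made-up numbers; it is not the model Dell uses, which the text describes as also accounting for redundancy features and service support levels.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time a repairable component is operational.

    mtbf_hours: mean time between failures
    mttr_hours: mean time to restore after a failure
    """
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: shortening the restore time raises availability.
for mttr in (24.0, 4.0, 1.0):
    a = steady_state_availability(mtbf_hours=100_000.0, mttr_hours=mttr)
    print(f"MTTR {mttr:>4.0f} h -> availability {a:.6%}")
```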

In other words, we view data availability along with system durability from a total quality engineering perspective across all aspects of the array with your data environment. In doing so, Dell Technologies continuously brings you closer to 100% data availability.

When your environment is configured to maintain the highest availability, we also measure our ability to deliver on that promise. Through ongoing monitoring by Dell Technologies using SupportAssist and CloudIQ features, we can determine field reliability and availability, which are an integral part of the total customer experience. Throughout this monitoring process, all unplanned-outage events are analyzed by engineering and service personnel to identify root causes, and the findings are used for continuous improvement of product, process, and personnel.

Dell Technologies can deliver on its promise of persistent data availability.
