Important Updates in Dell’s Geographically Dispersed Disaster Restart (GDDR)
Tue, 30 Aug 2022 20:53:25 -0000
|Read Time: 0 minutes
Dell Technologies created Geographically Dispersed Disaster Restart (GDDR) to provide mainframe customers a comprehensive business continuity automation product for their Dell PowerMax Storage and Disk Library for mainframe virtual tape environments. GDDR achieves this by reacting to events within your IT environment.
The three functions of automate, react, and monitor (ARM) combine to enable continuous operations across both planned and unplanned outages. GDDR is designed to perform planned data-center site-switch operations and to restart operations following disasters. These incidents can range from the loss of compute capacity or disk-array access, to the total loss of a single data center, or a regional disaster resulting in the loss of dual data centers. GDDR also provides automation to protect data from cyberattack in a separate physical vault array. For more information about GDDR, see the document GDDR (Geographically Dispersed Disaster Restart) for PowerMax 8000 and VMAX ALL FLASH 950F.
Dell’s GDDR 5.3 enhancements
GDDR introduced an exciting new feature in GDDR 5.3 called Cyber Protection Automation (zCPA) which populates a separate physical cyber vault for your mainframe environment. zCPA automates cyber protection copy creation and preservation by using Dell’s Data Protector for z Systems (zDP). zCPA automates the creation and transmission of PowerMax snapshots to a physically separate cyber vault PowerMax array. This provides a protected copy of data that can be used for testing purposes, recovery from a cyber event, or an analytical process to better understand the extent of damage caused by a cyberattack.
The transmission of data to the cyber vault leverages SRDF/Adaptative Copy. To take advantage of zCPA, customers need GDDR 5.3 with the zCPA PTF, Mainframe Enabler 8.5, and a PowerMax at 5978.711.711 or higher.
Unique benefits of GDDR zCPA types
zCPA supports air gapped and non-air gapped physical cyber vaults. Any site in a GDDR topology can be an attached cyber vault array managed by zCPA. To provide customers choice, there are three types of methods for creating zCPA vault arrays. The three zCPA types are defined by the different configuration and operational attributes that dictate how zCPA will function.
zCPA Type 1
- Type 1 is defined as an environment that has no airgap in connectivity between a data center and the cyber vault. The data copied to the cyber vault is initiated when a newly created zDP Snapset is detected. The cyber vault in Type 1 does not have to be a dedicated physical vault and could be another storage array within the production data center. This type is the default for zCPA.
zCPA Type 2
- Type 2 is an air-gapped environment between two storage environments. The data copied to the cyber vault is triggered by SRDF link online operation. GDDR monitors the SRDF Link Operation process to know when SRDF connectivity to the vault has been established and closed it when it has populated the vault.
zCPA Type 3
- Type 3 is an environment that does not provide an airgap solution. The data copied to the cyber vault is triggered by the SCHEDULE or INTERVAL parameter in GDDR.
The airgap support between the production and vault site is optional.
For more information about GDDR’s zCPA with respect to cyber, see the white paper Dell PowerMax: Cyber Security for Mainframe Storage or contact us at mainframe@dell.com.
Resources
- Mainframe Enablers TimeFinder SnapVX and zDP 8.5 Product Guide
- Data Protector for z Systems (zDP) Essentials White Paper
- Dell PowerMax: Cyber Security for Mainframe Storage
- GDDR (Geographically Dispersed Disaster Restart) for PowerMax 8000 and VMAX ALL FLASH 950F
Author: Justin F. Bastin
Related Blog Posts
Using Snapshot Policies to Mitigate Ransomware Attacks
Tue, 10 May 2022 21:33:41 -0000
|Read Time: 0 minutes
Cyber security remains a priority for organizations. A cyber or ransomware attack occurs every 11 seconds1, causing organizations to continually implement security requirements in order to safeguard mission critical and sensitive data. There is an important need not only to protect this data but have the ability to recover and restore data in the event of a ransomware attack. PowerMax SnapVX Snapshots are a powerful tool to help protect, recover, and restore in the event of a cyber-attack.
SnapVX provides space saving and efficient local replication in PowerMax arrays. SnapVX snapshots are a pointer-based structure that preserves a point-in-time view of a source volume. Snapshots provide the ability to manage consistent point-in-time copies for storage groups. Host accessible target volumes can be linked if a point-in-time snapshot needs to be accessed without affecting the point-in-time of the source.
SnapVX snapshots can be set as secure snaps. Secure snaps are snapshots that cannot be deleted, either accidentally or intentionally. They are retained in resource-limited situations in which conventional snapshots are placed in a failed state to release resources.
SnapVX snapshot users can take advantage of automated scheduling using Snapshot Policies. Snapshot Policies are customizable with rules that specify when to take snapshots, how many to take, how long to keep them, and whether they are standard or secure snaps.
The following is an example snapshot policy dashboard:
SnapVX snapshots with Snapshot Policies allows for 1024 snapshots per source device and 65 million per PowerMax array. Users can take advantage of the frequency and large snapshot scale in policy-driven snapshots to provide enhanced data resiliency.
Because secure snaps cannot be maliciously or accidentally deleted prior to any planned expiration date, they can be leveraged for organizations to preserve multiple point in time copies that can be recovered from, in the event of a malware or ransomware attack. Snapshot policies can be automated to take secure snaps with a high frequency and a short retention duration for fine granularity, with a lower frequency and longer retention for added security, or a mixture of both. If an attack occurs, the user can review the secure snaps to determine which point in time has the most relevant and up to date copy of data without malware impact. When the precise point in time is identified, restoring critical data can be done almost instantaneously by bringing application data back to the original state prior to any attack.
Secure snaps also provide an additional layer of security in the case of multiple attacks and can be used for forensic work to help determine what happened during the attack and when it originally occurred. With the lower frequency and longer retention period, secure snaps can be used to validate data and data change rate to help identify any suspicious activity.
The following figure provides an example of creating secure snaps with snapshot policies:
Traditional snapshots can be set with a policy to take snapshots at a frequency and retention that works best for the organization. These snapshots can be used for daily business continuity, such as development, operations, and data analytics. They can also assist in any forensic analysis and can be compared against secure snaps to help determine what changed and when it started to change. Unlike secure snaps, traditional snapshots can be deleted or fail in array resource constraint situations. However, the data on an existing snapshot cannot be changed and could be used for additional recovery options.
Both secure and traditional snaps are a power tool for organizations to leverage to help protect and restore data rapidly, to minimize any impact of a malware or ransomware attack. The large scalability of snapshots can be easily managed using Snapshot policies for scheduling frequency and retention time duration to fit any size organization.
The following is an operational example of the frequency, retention, and scale out of the value of SnapVX secure snaps. The numbers are based on an average of 5000 production volumes in a PowerMax array.
- Secure snaps every 10 minutes with a 48-hour retention
- 288 per volume point-in-time copies
- Fine grain protection and recovery
- Secure snaps every 60 minutes with a 7-day retention
- 168 per volume point-in-time copies
- Extended protection and data validation
Total of 2,040,000 secure point-in-time copies
The flexible and scalable features for PowerMax SnapVX traditional and secure snapshots are powerful tools for protecting against ransomware attacks.
Resources
- Dell PowerMax: Cyber Security
- Dell PowerMax: Cyber Security for Mainframe Storage
- Dell PowerMax and VMAX All Flash: TimeFinder SnapVX Local Replication
- Dell PowerMax and VMAX All Flash: Snapshot Policies
1 "Cybercrime To Cost The World $10.5 Trillion Annually By 2025," by Cybercrime Magazine, November 2020.
Author: Richard Pace, PowerMax Engineering Technologist
Twitter: @pace_rich
Cyber Intrusion Detection for z Systems (zCID)
Tue, 12 Dec 2023 18:42:09 -0000
|Read Time: 0 minutes
Any cyber security event can have a devastating impact on a company’s financials. Stolen credit cards, identity theft, hacked emails, and so on hurt both the customer and the company’s brand, even going so far as to potentially ruin that company. Data Recovery takes time, but rebuilding customer trust may take even longer.
Dell Technologies has made major investments in a series of continuous security product enhancements to help protect companies and their end users from data loss and/or compromise in the event of an attack. Whether it’s an attack on open systems data or mainframe data, the result of any attack is the same: loss of productivity and concern over theft and exposure of sensitive information.
Ideally, technologies like storage should be able to detect a cyber threat, protect data from the threat, and, in the event of a loss or corruption of data, recover to a known good point. Eight years ago, Dell Technologies developed the first snapshot-based recovery capability for mainframe and open systems data and, as of the latest release of PowerMax in October 2023, has moved into the “intrusion detection” realm of cyber resiliency.
This blog is about a new enhancement to our Mainframe Enabler Software for PowerMax that is designed to provide advanced threat detection for PowerMax mainframe environments.
Mainframe Enabler for intrusion detection
Mainframe Enabler Software (MFE) runs on a z/OS LPAR and is designed to manage PowerMax 2500/8500 and 8000. During discussions about the most recent customer requirements for this release of MFE, it became apparent that customers urgently needed a way to determine whether a cyber event was imminent or occurring. The ask was to send the equivalent of a ‘flare in the sky’ to single-out any atypical behavior in mainframe data access. Upon learning of zCID’s capability within the larger Dell cyber solution, a large mainframe service provider commented “Dell’s innovation around detection of cyber events within PowerMax and CloudIQ is ahead of any other storage provider we talked to”.
Dell Mainframe Solutions development, Product Management, and other organizations within Dell designed a way to enhance MFE to provide awareness of atypical data access behavior. The result of that work was delivered as an enhancement in MFE 10.1.0, released 17 October 2023. This enhancement is known as ‘Cyber Intrusion Detection for z Systems’ or zCID for short.
We will jump into the technical details of zCID; but first, let’s cover the What, Why, and How of this valuable new feature.
What: zCID is a utility that detects atypical data access patterns in mainframe workloads.
Why: To warn PowerMax mainframe customers that atypical access is occurring, and which should be investigated if a cyber intrusion is suspected.
How: zCID monitors the number of unique tracks accessed for mainframe CKD devices and SMS groups within a customer specified time interval. First a baseline of “normal/typical” access is confirmed by the storage administrator. The next step is to create a set of rules for warning statements that will be generated if an anomaly was detected when data was accessed. Next, zCID is started and runs continually in the background. Finally, if an intrusion is suspected, zCID raw data can be converted to a CSV format for detailed analysis.
Technical and install requirements for zCID
The minimum technical requirements for zCID are:
- MFE 10.1.0 with available SCF address space
- PowerMax 8000, 2500, or 8500
- A list of CKD volumes or SMS groups to monitor
Customers must APF-authorize the MFE 10.1.0 LINKLIB dataset and add a STEPLIB DD statement in their zCID batch jobs. (zCID can also run as a started task.)
zCID is delivered in two programs:
- ECTRAARD is the zCID utility program
- ECTREXTR is a zCID program that converts the raw zCID data to a CSV file. This CSV file is intended to be imported into Microsoft Excel for additional analysis and reporting as determined by a storage analyst.
zCID modes of operation and high-level implementation strategy
ECTRAARD can run in “Live Run mode” or “Batch Run mode”. It is important to understand these two modes before deploying zCID:
- Live Run mode: processes data in real time and collects data from the resources you tell zCID to monitor.
- Batch Run mode: takes the output produced in Live Run mode and creates reports about the historical information.
To maximize the benefits of zCID, follow these five-steps:
- Live Run mode will vary from customer to customer. Typically, you would run zCID in Live Run mode to capture access rates for the z/OS resources you are monitoring. Typically, I would start Live Run mode for one week (seven days), then capture a month end batch processing cycle, and ideally, a quarterly and year end closing cycle. With that information, you can calibrate your WARN statements for your highest accessed rate z/OS workloads that zCID is monitoring.
- Run zCID in Live Run mode over a “long” period. View this period of time as an opportunity to collect access rate information for the z/OS resources that zCID is monitoring. In the future, you can use this information to test your warning statements for atypical access rates on monitored z/OS resources.
- Stop Live Run mode at the end of the “long period" so that the datasets zCID was building can be closed.
- Run zCID Batch Mode to create reports, then analyze the results.
- Create warning statements for the atypical access rates for which you want to be notified. To calibrate the warning statements, take the datasets created in Step 2 and run zCID in Batch Run mode. Are zCID warning messages being issued from the warning statements you created?
Calibrate the WARN statements ensures that z/OS SYSLOG, z/OS master console, and z/OS zCID started tasks are not spammed with zCID warning messages. - Restart zCID in Live Run mode with the calibrated warning control statements.
zCID will now actively monitor the z/OS resources you provided and generate an alert every time an atypical access rate occurs!
Summary
Cyber Intrusion Detection for z Systems (zCID) makes Dell PowerMax the industry’s first intrusion detection mechanism for on-array mainframe storage [1]. zCID is a layer of intelligence that detects atypical data access patterns for specified workloads by providing for first-time PowerMax customers insight into their z/OS workloads’ access rates. Customers can then automate the monitoring of those workloads with the goal of detecting cyber events within their mainframe storage infrastructure.
Check out https://infohub.delltechnologies.com/ for more information about zCID and Dell’s PowerMax mainframe solutions.
Author: Justin Bastin, Senior Principal Engineer
[1] Based on Dell's internal analysis comparing PowerMax 2500/8500 cyber detection for mainframe storage to mainstream mainframe competitors. August 2023.