Unstructured Data Quick Tips - OneFS Protection Overhead
Wed, 08 Sep 2021 20:40:29 -0000
There have been several questions from the field recently about how to calculate the OneFS storage protection overhead for different cluster sizes and protection levels. But first, a quick overview of the fundamentals…
OneFS supports several protection schemes. These include the ubiquitous +2d:1n, which protects against two drive failures or one node failure. The best practice is to use the recommended protection level for a particular cluster configuration. This recommended level of protection is clearly marked as ‘suggested’ in the OneFS WebUI storage pools configuration pages and is typically configured by default. For all current Gen6 hardware configurations, the recommended protection level is “+2d:1n”.
The hybrid protection schemes are particularly useful for Gen6 chassis high-density node configurations, where the probability of multiple drives failing far surpasses that of an entire node failure. In the unlikely event that multiple devices fail simultaneously, leaving a file ‘beyond its protection level’, OneFS re-protects everything possible and reports errors on the individual files affected in the cluster’s logs.
OneFS also provides a variety of mirroring options ranging from 2x to 8x, allowing from two to eight mirrors of the specified content. Metadata is mirrored at one level above FEC (forward error correction) protection by default: for example, if a file is protected at +2n, its associated metadata object is 3x mirrored.
The full range of OneFS protection levels is as follows:

| Protection Level | Description |
| --- | --- |
| +1n | Tolerate failure of 1 drive OR 1 node |
| +2d:1n | Tolerate failure of 2 drives OR 1 node |
| +2n | Tolerate failure of 2 drives OR 2 nodes |
| +3d:1n | Tolerate failure of 3 drives OR 1 node |
| +3d:1n1d | Tolerate failure of 3 drives OR 1 node AND 1 drive |
| +3n | Tolerate failure of 3 drives OR 3 nodes |
| +4d:1n | Tolerate failure of 4 drives OR 1 node |
| +4d:2n | Tolerate failure of 4 drives OR 2 nodes |
| +4n | Tolerate failure of 4 drives OR 4 nodes |
| 2x to 8x | Mirrored over 2 to 8 nodes, depending on configuration |
The charts below show the ‘ideal’ protection overhead across the range of node counts and OneFS protection levels (noted within brackets). For each field in this chart, the overhead percentage is calculated by dividing the sum of the two numbers by the number on the right.
x+y => y/(x+y)
So, for a 5-node cluster protected at +2d:1n, OneFS uses an 8+2 layout, hence an ‘ideal’ overhead of 20%.
8+2 => 2/(8+2) = 20%
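This calculation is easy to verify from the CLI. A minimal shell sketch (illustrative only, using bc for the arithmetic):

# 'Ideal' overhead for an x+y layout: y / (x + y).
# Example: the 8+2 layout above.
x=8; y=2
echo "scale=1; 100 * $y / ($x + $y)" | bc    # prints 20.0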
The ‘x+y’ numbers in each field of the table also represent how files are striped across a cluster for each node count and protection level.
For example, with +2n protection on a 6-node cluster, OneFS writes a stripe across all six nodes, using two of the stripe units for parity/ECC and four for data.
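The ideal layout in the +2n column follows a simple rule: one stripe unit per node, capped at the maximum stripe width of 16 data units plus 2 FEC units. A quick shell sketch of that rule (illustrative only; it reproduces the +2n column of the table below):

# Ideal +2n layout per node count: one stripe unit per node,
# capped at 16 data + 2 FEC stripe units.
for nodes in 5 6 8 10 20; do
  if [ "$nodes" -lt 18 ]; then width=$nodes; else width=18; fi
  echo "$nodes nodes: $((width - 2))+2"
done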
In general, for FEC-protected data, the OneFS protection overhead looks as shown in the table below, which gives the ‘ideal’ overhead across the range of node counts and OneFS protection levels (noted within brackets). For each field, the overhead percentage is calculated by dividing the number on the right (the FEC stripe units) by the sum of the two numbers (the total stripe width), per the x+y formula above.
Note that the protection overhead percentages (in brackets) are a rough guide and will vary across different datasets, depending on the quantity of small files, and so on.
| Number of nodes | [+1n] | [+2d:1n] | [+2n] | [+3d:1n] | [+3d:1n1d] | [+3n] | [+4d:1n] | [+4d:2n] | [+4n] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 3 | 2+1 (33%) | 4+2 (33%) | — | 6+3 (33%) | 3+3 (50%) | — | 8+4 (33%) | — | — |
| 4 | 3+1 (25%) | 6+2 (25%) | — | 9+3 (25%) | 5+3 (38%) | — | 12+4 (25%) | 4+4 (50%) | — |
| 5 | 4+1 (20%) | 8+2 (20%) | 3+2 (40%) | 12+3 (20%) | 7+3 (30%) | — | 16+4 (20%) | 6+4 (40%) | — |
| 6 | 5+1 (17%) | 10+2 (17%) | 4+2 (33%) | 15+3 (17%) | 9+3 (25%) | — | 16+4 (20%) | 8+4 (33%) | — |
| 7 | 6+1 (14%) | 12+2 (14%) | 5+2 (29%) | 15+3 (17%) | 11+3 (21%) | 4+3 (43%) | 16+4 (20%) | 10+4 (29%) | — |
| 8 | 7+1 (13%) | 14+2 (13%) | 6+2 (25%) | 15+3 (17%) | 13+3 (19%) | 5+3 (38%) | 16+4 (20%) | 12+4 (25%) | — |
| 9 | 8+1 (11%) | 16+2 (11%) | 7+2 (22%) | 15+3 (17%) | 15+3 (17%) | 6+3 (33%) | 16+4 (20%) | 14+4 (22%) | 5+4 (44%) |
| 10 | 9+1 (10%) | 16+2 (11%) | 8+2 (20%) | 15+3 (17%) | 15+3 (17%) | 7+3 (30%) | 16+4 (20%) | 16+4 (20%) | 6+4 (40%) |
| 12 | 11+1 (8%) | 16+2 (11%) | 10+2 (17%) | 15+3 (17%) | 15+3 (17%) | 9+3 (25%) | 16+4 (20%) | 16+4 (20%) | 8+4 (33%) |
| 14 | 13+1 (7%) | 16+2 (11%) | 12+2 (14%) | 15+3 (17%) | 15+3 (17%) | 11+3 (21%) | 16+4 (20%) | 16+4 (20%) | 10+4 (29%) |
| 16 | 15+1 (6%) | 16+2 (11%) | 14+2 (13%) | 15+3 (17%) | 15+3 (17%) | 13+3 (19%) | 16+4 (20%) | 16+4 (20%) | 12+4 (25%) |
| 18 | 16+1 (6%) | 16+2 (11%) | 16+2 (11%) | 15+3 (17%) | 15+3 (17%) | 15+3 (17%) | 16+4 (20%) | 16+4 (20%) | 14+4 (22%) |
| 20 | 16+1 (6%) | 16+2 (11%) | 16+2 (11%) | 16+3 (16%) | 16+3 (16%) | 16+3 (16%) | 16+4 (20%) | 16+4 (20%) | 16+4 (20%) |
| 30 | 16+1 (6%) | 16+2 (11%) | 16+2 (11%) | 16+3 (16%) | 16+3 (16%) | 16+3 (16%) | 16+4 (20%) | 16+4 (20%) | 16+4 (20%) |
A file’s protection level determines how OneFS lays the file out. A file may have multiple protection levels temporarily (because the file is being restriped) or permanently (because of a heterogeneous cluster). The protection level is specified as “n+m/b@r” in its full form. In the case where b, r, or both equal 1, the notation may be shortened to “n+m/b”, “n+m@r”, or “n+m”.
| Layout Attribute | Description |
| --- | --- |
| n | Number of data drives in a stripe. |
| +m | Number of FEC drives in a stripe. |
| /b | Number of drives per stripe allowed on one node. |
| @r | Number of drives per node to include in a file. |
The OneFS protection definition in terms of node and/or drive failures has the advantage of configuration simplicity. However, it does mask some of the subtlety of the interaction between stripe width and drive spread, as represented by the n+m/b notation displayed by the ‘isi get’ CLI command. For example:
# isi get README.txt
POLICY    LEVEL    PERFORMANCE    COAL    FILE
default   6+2/2    concurrency    on      README.txt
In particular, both +3/3 and +3/2 tolerate a single node failure or three drive failures, and so appear identical in the WebUI terminology. They do, however, have different characteristics: +3/2 additionally tolerates the failure of any one node in combination with the failure of a single drive on any other node, which +3/3 does not. +3/3, on the other hand, allows for potentially better space efficiency and performance, because up to three drives per node can be used in a stripe, rather than the two allowed under +3/2.
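One consequence of the /b term is that a stripe of width n+m, with at most b stripe units per node, requires at least ceil((n+m)/b) nodes. A quick shell sketch of that relationship (illustrative only):

# Minimum node count for an n+m/b layout: ceil((n+m)/b),
# since at most b stripe units may land on any one node.
# Example: the 6+2/2 layout reported by 'isi get' above.
n=6; m=2; b=2
echo $(( (n + m + b - 1) / b ))    # prints 4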
Another factor to keep in mind is OneFS neighborhoods. A neighborhood is a fault domain within a node pool. The purpose of neighborhoods is to improve reliability in general, and to guard against data unavailability from the accidental removal of Gen6 drive sleds. For self-contained nodes like the PowerScale F200, OneFS has an ideal neighborhood size of 20 nodes and a maximum size of 39 nodes. On the addition of the 40th node, the node pool splits into two neighborhoods of 20 nodes each.
With the Gen6 platform, the ideal neighborhood size shrinks from 20 to 10 nodes. This also means that a Gen6 node pool will never reach the largest stripe widths (for example, 16+3), since the pool will already have split.
This 10-node ideal neighborhood size helps protect the Gen6 architecture against simultaneous node-pair journal failures and full chassis failures. Partner nodes are nodes whose journals are mirrored. Rather than each node storing its journal in NVRAM, as the self-contained PowerScale platforms do, Gen6 nodes store their journals on SSDs, and every journal has a mirror copy on another node. The node that contains the mirrored journal is referred to as the partner node.
The journal changes bring several reliability benefits. For example, SSDs are more persistent and reliable than NVRAM, which requires a charged battery to retain state. Also, with the mirrored journal, both journal drives have to fail before a journal is considered lost; unless both mirrored drives fail, both partner nodes can continue to function as normal.
With partner node protection, where possible, partner nodes are placed in different neighborhoods, and hence different failure domains. Partner node protection becomes possible once the cluster reaches five full chassis (20 nodes), when, after the first neighborhood split, OneFS places partner nodes in different neighborhoods.
Partner node protection increases reliability because the partner nodes reside in different failure domains, so even if both go down, each failure domain suffers the loss of only a single node.
With chassis protection, when possible, each of the four nodes within a chassis is placed in a separate neighborhood. Chassis protection becomes possible at 40 nodes, as the neighborhood split at 40 nodes enables every node in a chassis to be placed in a different neighborhood. As such, when a 38-node Gen6 cluster is expanded to 40 nodes, the two existing neighborhoods are split into four 10-node neighborhoods.
Chassis protection ensures that if an entire chassis fails, each failure domain loses at most one node.
Author: Nick Trimbee
Related Blog Posts
PowerScale OneFS 9.7
Wed, 13 Dec 2023 13:55:00 -0000
Dell PowerScale is already powering up the holiday season with the launch of the innovative OneFS 9.7 release, which shipped today (13th December 2023). This new 9.7 release is an all-rounder, introducing PowerScale innovations in Cloud, Performance, Security, and ease of use.
After the debut of APEX File Storage for AWS earlier this year, OneFS 9.7 extends and simplifies the PowerScale in the public cloud offering, delivering more features on more instance types across more regions.
In addition to providing the same OneFS software platform on-prem and in the cloud, customer-managed for full control, APEX File Storage for AWS in OneFS 9.7 delivers a 60% capacity increase, with linear capacity and performance scaling up to six SSD nodes and 1.6 PiB per namespace/cluster, and up to 10 GB/s reads and 4 GB/s writes per cluster. This makes it a solid fit for traditional file shares and home directories, for vertical workloads like M&E, healthcare, life sciences, and finserv, and for next-gen AI, ML, and analytics applications.
Enhancements to APEX File Storage for AWS
PowerScale’s scale-out architecture can be deployed on customer-managed AWS EC2 and EBS infrastructure, providing the scale and performance needed to run a variety of unstructured workflows in the public cloud. Plus, OneFS 9.7 provides an ‘easy button’ for streamlined AWS infrastructure provisioning and deployment.
Once in the cloud, you can further leverage existing PowerScale investments by accessing and orchestrating your data through the platform's multi-protocol access and APIs.
This includes the common OneFS control plane (CLI, WebUI, and platform API), and the same enterprise features: Multi-protocol, SnapshotIQ, SmartQuotas, Identity management, and so on.
With OneFS 9.7, APEX File Storage for AWS also adds support for the HDFS and FTP protocols, alongside the existing NFS, SMB, and S3. Granular performance prioritization and throttling are also enabled with SmartQoS, allowing admins to configure limits on the maximum number of protocol operations that NFS, S3, SMB, or mixed protocol workloads can consume on an APEX File Storage for AWS cluster.
Security
With data integrity and protection being top of mind in this era of unprecedented cyber threats, OneFS 9.7 brings a bevy of new features and functionality to keep your unstructured data and workloads more secure than ever. These new OneFS 9.7 security enhancements help address US Federal and DoD mandates, such as FIPS 140-2 and DISA STIGs, in addition to general enterprise data security requirements. Included in the new OneFS 9.7 release are a simple cluster configuration backup and restore utility, address space layout randomization, and single sign-on (SSO) lookup enhancements.
Data mobility
On the data replication front, SmartSync sees the introduction of GCP as an object storage target in OneFS 9.7, in addition to ECS, AWS and Azure. The SmartSync data mover allows flexible data movement and copying, incremental resyncs, push and pull data transfer, and one-time file to object copy.
Performance improvements
Building on the streaming read performance delivered in a prior release, OneFS 9.7 also unlocks dramatic write performance enhancements, particularly for the all-flash NVMe platforms - plus infrastructure support for future node hardware platform generations. A sizable boost in throughput to a single client helps deliver performance for the most demanding GenAI workloads, particularly for the model training and inferencing phases. Additionally, the scale-out cluster architecture enables performance to scale linearly as GPUs are increased, allowing PowerScale to easily support AI workflows from small to large.
Cluster support for InsightIQ 5.0
The new InsightIQ 5.0 software expands PowerScale monitoring capabilities, including a new user interface, automated email alerts, and added security. InsightIQ 5.0 is available today for all existing and new PowerScale customers at no additional charge. These innovations are designed to simplify management, expand scale and security, and automate operations for PowerScale performance monitoring for AI, GenAI, and all other workloads.
In summary, OneFS 9.7 brings a wealth of new features and functionality to the Dell PowerScale ecosystem, spanning cloud, performance, security, data mobility, and ease of use.
We’ll be taking a deeper look at these new features and functionality in blog articles over the course of the next few weeks.
Meanwhile, the new OneFS 9.7 code is available on the Dell Support site, as both an upgrade and reimage file, allowing both installation and upgrade of this new release.
OneFS SSL Certificate Renewal – Part 1
Thu, 16 Nov 2023 04:57:00 -0000
When using either the OneFS WebUI or platform API (pAPI), all communication sessions are encrypted using SSL (Secure Sockets Layer), now formally known as Transport Layer Security (TLS). In this series, we will look at how to replace or renew the SSL certificate for the OneFS WebUI.
SSL requires a certificate that serves two principal functions: it enables encrypted communication using Public Key Infrastructure (PKI), and it authenticates the identity of the certificate’s holder.
Architecturally, SSL consists of four fundamental components:
| SSL Component | Description |
| --- | --- |
| Alert | Reports issues. |
| Change cipher spec | Implements negotiated crypto parameters. |
| Handshake | Negotiates crypto parameters for the SSL session. Can be used for many SSL/TCP connections. |
| Record | Provides encryption and MAC. |
These components sit in the stack with the record protocol directly above TCP, and the alert, change cipher spec, and handshake protocols running above the record layer.
The basic handshake process begins with a client requesting an HTTPS WebUI session to the cluster. OneFS then returns the SSL certificate and public key. The client creates a session key and encrypts it with the public key it received from OneFS; at this point, only the client knows the session key. The client then sends its encrypted session key to the cluster, which decrypts it with the private key. Now both the client and OneFS know the session key, so the session, encrypted using this symmetric session key, can be established. OneFS automatically defaults to the best supported version of SSL, based on the client request.
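To see this exchange in practice, the certificate the cluster presents can be examined from any client with OpenSSL installed. A quick sketch, assuming the WebUI is listening on its default port 8080 and using a placeholder cluster hostname:

# Retrieve and summarize the certificate presented by the cluster's WebUI.
# Replace cluster.example.com with your cluster's name or IP.
openssl s_client -connect cluster.example.com:8080 -showcerts </dev/null \
  | openssl x509 -noout -subject -issuer -dates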
A PowerScale cluster initially contains a self-signed certificate, which can be used as-is or replaced with a certificate issued by a third-party certificate authority (CA). If the self-signed certificate is used, upon expiry it must be replaced with either a third-party (public or private) CA-issued certificate or another self-signed certificate generated on the cluster. The following are the default locations for the server.crt and server.key files:
| File | Location |
| --- | --- |
| SSL certificate | /usr/local/apache2/conf/ssl.crt/server.crt |
| SSL certificate key | /usr/local/apache2/conf/ssl.key/server.key |
The ‘isi certificate settings view’ CLI command displays all of the certificate-related configuration options. For example:
# isi certificate settings view
         Certificate Monitor Enabled: Yes
Certificate Pre Expiration Threshold: 4W2D
Default HTTPS Certificate
                                  ID: default
                             Subject: C=US, ST=Washington, L=Seattle, O="Isilon", OU=Isilon, CN=Dell, emailAddress=tme@isilon.com
                              Status: valid
The ‘Certificate Monitor Enabled’ and ‘Certificate Pre Expiration Threshold’ configuration options govern a nightly cron job, which monitors the expiration of each managed certificate and fires a CELOG alert if a certificate is set to expire within the configured threshold. Note that the default threshold is 30 days (4W2D, which represents 4 weeks plus 2 days). The ‘ID: default’ configuration option indicates that this certificate is the cluster’s default TLS certificate.
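If a different warning window is preferred, these settings can be adjusted from the CLI. A sketch of the likely syntax, with the flag name assumed from the settings output above (verify against the command’s --help output for your release):

# Extend the CELOG pre-expiration alert window from 30 days to 60 days (8W4D).
# Flag name assumed - confirm with 'isi certificate settings modify --help'.
isi certificate settings modify --certificate-pre-expiration-threshold=8W4D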
The basic certificate renewal or creation flow is as follows: back up the existing certificate and key, renew or create a new certificate, have it signed, add it to the cluster, and verify it.
The steps below include options to complete a self-signed certificate replacement or renewal, or to request an SSL replacement or renewal from a Certificate Authority (CA).
Backing up the existing SSL certificate
The first task is to obtain the list of certificates by running the following CLI command, and identify the appropriate one to renew:
# isi certificate server list
ID      Name     Status  Expires
-------------------------------------------
eb0703b default  valid   2025-10-11T10:45:52
-------------------------------------------
It’s always prudent to save a backup of the original certificate and key. This can easily be accomplished using the following CLI commands, which create the ‘/ifs/data/ssl_bkup’ directory, restrict its permissions to root-only access, and copy the original key and certificate into it:
# mkdir -p /ifs/data/ssl_bkup
# chmod 700 /ifs/data/ssl_bkup
# cp /usr/local/apache2/conf/ssl.crt/server.crt /ifs/data/ssl_bkup
# cp /usr/local/apache2/conf/ssl.key/server.key /ifs/data/ssl_bkup
# cd !$
cd /ifs/data/ssl_bkup
# ls
server.crt      server.key
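Optionally, it is worth confirming that the backed-up certificate and key are actually a matching pair. A quick sketch using standard OpenSSL tooling, comparing the RSA modulus digests of the two files (the output of both commands should be identical):

# The certificate's and key's modulus digests must match.
openssl x509 -noout -modulus -in server.crt | openssl md5
openssl rsa -noout -modulus -in server.key | openssl md5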
Renewing or creating a certificate
The next step in the process involves either the renewal of an existing certificate or creation of a certificate from scratch. In either case, first, create a temporary directory, for example /ifs/tmp:
# mkdir /ifs/tmp; cd /ifs/tmp
a) Renew an existing self-signed Certificate.
The following syntax creates a renewal certificate based on the existing ssl.key. The value of the ‘-days’ parameter can be adjusted to generate a certificate with the desired expiration date. For example, the following command will create a one-year certificate:
# cp /usr/local/apache2/conf/ssl.key/server.key ./
# openssl req -new -days 365 -nodes -x509 -key server.key -out server.crt
Answer the system prompts to complete the self-signed SSL certificate generation process, entering the pertinent location and contact information. For example:
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:Washington
Locality Name (eg, city) []:Seattle
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Isilon
Organizational Unit Name (eg, section) []:TME
Common Name (e.g. server FQDN or YOUR name) []:isilon.com
Email Address []:tme@isilon.com
When all the information has been successfully entered, the server.crt and server.key files will be present in the /ifs/tmp directory.
Optionally, the attributes and integrity of the certificate can be verified with the following syntax:
# openssl x509 -text -noout -in server.crt
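The renewed certificate’s expiration date can also be checked directly:

# Display just the notAfter (expiration) date of the renewed certificate.
openssl x509 -noout -enddate -in server.crt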
Next, proceed directly to the ‘Add the certificate to the cluster’ steps in section 4 of this article.
b) Alternatively, a certificate and key can be generated from scratch, if preferred.
The following CLI command can be used to create a 2048-bit RSA private key:
# openssl genrsa -out server.key 2048
Generating RSA private key, 2048 bit long modulus
............+++++
...........................................................+++++
e is 65537 (0x10001)
Next, create a certificate signing request:
# openssl req -new -nodes -key server.key -out server.csr
For example:
# openssl req -new -nodes -key server.key -out server.csr -reqexts SAN -config <(cat /etc/ssl/openssl.cnf <(printf "[SAN]\nsubjectAltName=DNS:isilon.com"))
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [AU]:US
State or Province Name (full name) [Some-State]:WA
Locality Name (eg, city) []:Seattle
Organization Name (eg, company) [Internet Widgits Pty Ltd]:Isilon
Organizational Unit Name (eg, section) []:TME
Common Name (e.g. server FQDN or YOUR name) []:h7001
Email Address []:tme@isilon.com

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:1234
An optional company name []:
#
Answer the system prompts to complete the CSR generation process, entering the pertinent location and contact information, plus a ‘challenge password’ of at least four characters. When completed, the server.csr and server.key files will appear in the /ifs/tmp directory.
If desired, a CSR that includes Subject Alternative Names (SANs) can be generated for a Certificate Authority. Additional hostname entries are comma-separated (for example, DNS:isilon.com,DNS:www.isilon.com).
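Before submitting the CSR to the CA, it is worth confirming that the SAN entries were included. A quick check using standard OpenSSL tooling:

# Confirm the SAN entries made it into the CSR.
openssl req -text -noout -in server.csr | grep -A 1 'Subject Alternative Name'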
In the next article, we will look at the certificate signing, addition, and verification steps of the process.