OneFS Snapshot Security
Fri, 21 Apr 2023 17:11:00 -0000
|Read Time: 0 minutes
In this era of elevated cyber-crime and data security threats, there is increasing demand for immutable, tamper-proof snapshots. Often this need arises as part of a broader security mandate, ideally proactively, but oftentimes as a response to a security incident. OneFS addresses this requirement in the following ways:
On-cluster | Off-cluster |
|
|
Read-only snapshots
At its core, OneFS SnapshotIQ generates read-only, point-in-time, space efficient copies of a defined subset of a cluster’s data.
Only the changed blocks of a file are stored when updating OneFS snapshots, ensuring efficient storage utilization. They are also highly scalable and typically take less than a second to create, while generating little performance overhead. As such, the RPO (recovery point objective) and RTO (recovery time objective) of a OneFS snapshot can be very small and highly flexible, with the use of rich policies and schedules.
OneFS Snapshots are created manually, on a schedule, or automatically generated by OneFS to facilitate system operations. But whatever the generation method, when a snapshot has been taken, its contents cannot be manually altered.
Snapshot Locks
In addition to snapshot contents immutability, for an enhanced level of tamper-proofing, SnapshotIQ also provides the ability to lock snapshots with the ‘isi snapshot locks’ CLI syntax. This prevents snapshots from being accidentally or unintentionally deleted.
For example, a manual snapshot, ‘snaploc1’ is taken of /ifs/test:
# isi snapshot snapshots create /ifs/test --name snaploc1 # isi snapshot snapshots list | grep snaploc1 79188 snaploc1 /ifs/test
A lock is then placed on it (in this case lock ID=1):
# isi snapshot locks create snaplock1 # isi snapshot locks list snaploc1 ID ---- 1 ---- Total: 1
Attempts to delete the snapshot fail because the lock prevents its removal:
# isi snapshot snapshots delete snaploc1 Are you sure? (yes/[no]): yes Snapshot "snaploc1" can't be deleted because it is locked
The CLI command ‘isi snapshot locks delete <lock_ID>’ can be used to clear existing snapshot locks, if desired. For example, to remove the only lock (ID=1) from snapshot ‘snaploc1’:
# isi snapshot locks list snaploc1 ID ---- 1 ---- Total: 1 # isi snapshot locks delete snaploc1 1 Are you sure you want to delete snapshot lock 1 from snaploc1? (yes/[no]): yes # isi snap locks view snaploc1 1 No such lock
When the lock is removed, the snapshot can then be deleted:
# isi snapshot snapshots delete snaploc1 Are you sure? (yes/[no]): yes # isi snapshot snapshots list| grep -i snaploc1 | wc -l 0
Note that a snapshot can have up to a maximum of sixteen locks on it at any time. Also, lock numbers are continually incremented and not recycled upon deletion.
Like snapshot expiration, snapshot locks can also have an expiration time configured. For example, to set a lock on snapshot ‘snaploc1’ that expires at 1am on April 1st, 2024:
# isi snap lock create snaploc1 --expires '2024-04-01T01:00:00' # isi snap lock list snaploc1 ID ---- 36 ---- Total: 1 # isi snap lock view snaploc1 33 ID: 36 Comment: Expires: 2024-04-01T01:00:00 Count: 1
Note that if the duration period of a particular snapshot lock expires but others remain, OneFS will not delete that snapshot until all the locks on it have been deleted or expired.
The following table provides an example snapshot expiration schedule, with monthly locked snapshots to prevent deletion:
Snapshot Frequency | Snapshot Time | Snapshot Expiration | Max Retained Snapshots |
Every other hour | Start at 12:00AM End at 11:59AM | 1 day | 27 |
Every day | At 12:00AM | 1 week | |
Every week | Saturday at 12:00AM | 1 month | |
Every month | First Saturday of month at 12:00AM | Locked |
Roles-based Access Control
Read-only snapshots plus locks present physically secure snapshots on a cluster. However, if you can login to the cluster and have the required elevated administrator privileges to do so, you can still remove locks and/or delete snapshots.
Because data security threats come from inside an environment as well as out, such as from a disgruntled IT employee or other internal bad actor, another key to a robust security profile is to constrain the use of all-powerful ‘root’, ‘administrator’, and ‘sudo’ accounts as much as possible. Instead, of granting cluster admins full rights, a preferred security best practice is to leverage the comprehensive authentication, authorization, and accounting framework that OneFS natively provides.
OneFS role-based access control (RBAC) can be used to explicitly limit who has access to manage and delete snapshots. This granular control allows you to craft administrative roles that can create and manage snapshot schedules, but prevent their unlocking and/or deletion. Similarly, lock removal and snapshot deletion can be isolated to a specific security role (or to root only).
A cluster security administrator selects the desired access zone, creates a zone-aware role within it, assigns privileges, and then assigns members.
For example, from the WebUI under Access > Membership and roles > Roles:
When these members access the cluster through the WebUI, PlatformAPI, or CLI, they inherit their assigned privileges.
The specific privileges that can be used to segment OneFS snapshot management include:
Privilege | Description |
ISI_PRIV_SNAPSHOT_ALIAS | Aliasing for snapshots |
ISI_PRIV_SNAPSHOT_LOCKS | Locking of snapshots from deletion |
ISI_PRIV_SNAPSHOT_PENDING | Upcoming snapshot based on schedules |
ISI_PRIV_SNAPSHOT_RESTORE | Restoring directory to a particular snapshot |
ISI_PRIV_SNAPSHOT_SCHEDULES | Scheduling for periodic snapshots |
ISI_PRIV_SNAPSHOT_SETTING | Service and access settings |
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Manual snapshots and locks |
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY | Snapshot summary and usage details |
Each privilege can be assigned one of four permission levels for a role, including:
Permission Indicator | Description |
– | No permission |
R | Read-only permission |
X | Execute permission |
W | Write permission |
The ability for a user to delete a snapshot is governed by the ‘ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT’ privilege. Similarly, the ‘ISI_PRIV_SNAPSHOT_LOCKS’ privilege governs lock creation and removal.
In the following example, the ‘snap’ role has ‘read’ rights for the ‘ISI_PRIV_SNAPSHOT_LOCKS’ privilege, allowing a user associated with this role to view snapshot locks:
# isi auth roles view snap | grep -I -A 1 locks ID: ISI_PRIV_SNAPSHOT_LOCKS Permission: r -- # isi snapshot locks list snaploc1 ID ---- 1 ---- Total: 1
However, attempts to remove the lock ‘ID 1’ from the ‘snaploc1’ snapshot fail without write privileges:
# isi snapshot locks delete snaploc1 1 Privilege check failed. The following write privilege is required: Snapshot locks (ISI_PRIV_SNAPSHOT_LOCKS)
Write privileges are added to ‘ISI_PRIV_SNAPSHOT_LOCKS’ in the ‘’snaploc1’ role:
# isi auth roles modify snap –-add-priv-write ISI_PRIV_SNAPSHOT_LOCKS # isi auth roles view snap | grep -I -A 1 locks ID: ISI_PRIV_SNAPSHOT_LOCKS Permission: w --
This allows the lock ‘ID 1’ to be successfully deleted from the ‘snaploc1’ snapshot:
# isi snapshot locks delete snaploc1 1 Are you sure you want to delete snapshot lock 1 from snaploc1? (yes/[no]): yes # isi snap locks view snaploc1 1 No such lock
Using OneFS RBAC, an enhanced security approach for a site could be to create three OneFS roles on a cluster, each with an increasing realm of trust:
1. First, an IT ops/helpdesk role with ‘read’ access to the snapshot attributes would permit monitoring and troubleshooting, but no changes:
Snapshot Privilege | Description |
ISI_PRIV_SNAPSHOT_ALIAS | Read |
ISI_PRIV_SNAPSHOT_LOCKS | Read |
ISI_PRIV_SNAPSHOT_PENDING | Read |
ISI_PRIV_SNAPSHOT_RESTORE | Read |
ISI_PRIV_SNAPSHOT_SCHEDULES | Read |
ISI_PRIV_SNAPSHOT_SETTING | Read |
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Read |
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY | Read |
2. Next, a cluster admin role, with ‘read’ privileges for ‘ISI_PRIV_SNAPSHOT_LOCKS’ and ‘ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT’ would prevent snapshot and lock deletion, but provide ‘write’ access for schedule configuration, restores, and so on.
Snapshot Privilege | Description |
ISI_PRIV_SNAPSHOT_ALIAS | Write |
ISI_PRIV_SNAPSHOT_LOCKS | Read |
ISI_PRIV_SNAPSHOT_PENDING | Write |
ISI_PRIV_SNAPSHOT_RESTORE | Write |
ISI_PRIV_SNAPSHOT_SCHEDULES | Write |
ISI_PRIV_SNAPSHOT_SETTING | Write |
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Read |
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY | Write |
3. Finally, a cluster security admin role (root equivalence) would provide full snapshot configuration and management, lock control, and deletion rights:
Snapshot Privilege | Description |
ISI_PRIV_SNAPSHOT_ALIAS | Write |
ISI_PRIV_SNAPSHOT_LOCKS | Write |
ISI_PRIV_SNAPSHOT_PENDING | Write |
ISI_PRIV_SNAPSHOT_RESTORE | Write |
ISI_PRIV_SNAPSHOT_SCHEDULES | Write |
ISI_PRIV_SNAPSHOT_SETTING | Write |
ISI_PRIV_SNAPSHOT_SNAPSHOTMANAGEMENT | Write |
ISI_PRIV_SNAPSHOT_SNAPSHOT_SUMMARY | Write |
Note that when configuring OneFS RBAC, remember to remove the ‘ISI_PRIV_AUTH’ and ‘ISI_PRIV_ROLE’ privilege from all but the most trusted administrators.
Additionally, enterprise security management tools such as CyberArk can also be incorporated to manage authentication and access control holistically across an environment. These can be configured to frequently change passwords on trusted accounts (that is, every hour or so), require multi-Level approvals prior to retrieving passwords, and track and audit password requests and trends.
While this article focuses exclusively on OneFS snapshots, the expanded use of RBAC granular privileges for enhanced security is germane to most key areas of cluster management and data protection, such as SyncIQ replication, and so on.
Snapshot replication
In addition to using snapshots for its own checkpointing system, SyncIQ, the OneFS data replication engine, supports snapshot replication to a target cluster.
OneFS SyncIQ replication policies contain an option for triggering a replication policy when a snapshot of the source directory is completed. Additionally, at the onset of a new policy configuration, when the “Whenever a Snapshot of the Source Directory is Taken” option is selected, a checkbox appears to enable any existing snapshots in the source directory to be replicated. More information is available in this SyncIQ paper.
Cyber-vaulting
File data is arguably the most difficult to protect, because:
- It is the only type of data where potentially all employees have a direct connection to the storage (with the other type of storage it’s through an application)
- File data is linked (or mounted) to the operating system of the client. This means that it’s sufficient to gain file access to the OS to get access to potentially critical data.
- Users are the largest breach points.
The Cyber Security Framework (CSF) from the National Institute of Standards and Technology (NIST) categorizes the threat through recovery process:
Within the ‘Protect’ phase, there are two core aspects:
- Applying all the core protection features available on the OneFS platform, namely:
Feature | Description |
Access control | Where the core data protection functions are being executed. Assess who actually needs write access. |
Immutability | Having immutable snapshots, replica versions, and so on. Augmenting backup strategy with an archiving strategy with SmartLock WORM. |
Encryption | Encrypting both data in-flight and data at rest. |
Anti-virus | Integrating with anti-virus/anti-malware protection that does content inspection. |
Security advisories | Dell Security Advisories (DSA) inform customers about fixes to common vulnerabilities and exposures. |
- Data isolation provides a last resort copy of business critical data, and can be achieved by using an air gap to isolate the cyber vault copy of the data. The vault copy is logically separated from the production copy of the data. Data syncing happens only intermittently by closing the air gap after ensuring that there are no known issues.
The combination of OneFS snapshots and SyncIQ replication allows for granular data recovery. This means that only the affected files are recovered, while the most recent changes are preserved for the unaffected data. While an on-prem air-gapped cyber vault can still provide secure network isolation, in the event of an attack, the ability to failover to a fully operational ‘clean slate’ remote site provides additional security and peace of mind.
We’ll explore PowerScale cyber protection and recovery in more depth in a future article.
Author: Nick Trimbee
Related Blog Posts
OneFS and HTTP Security
Mon, 22 Apr 2024 20:35:30 -0000
|Read Time: 0 minutes
To enable granular HTTP security configuration, OneFS provides an option to disable nonessential HTTP components selectively. This can help reduce the overall attack surface of your infrastructure. Disabling a specific component’s service still allows other essential services on the cluster to continue to run unimpeded. In OneFS 9.4 and later, you can disable the following nonessential HTTP services:
Service | Description |
PowerScaleUI | The OneFS WebUI configuration interface. |
Platform-API-External | External access to the OneFS platform API endpoints. |
Rest Access to Namespace (RAN) | REST-ful access by HTTP to a cluster’s /ifs namespace. |
RemoteService | Remote Support and In-Product Activation. |
SWIFT (deprecated) | Deprecated object access to the cluster using the SWIFT protocol. This has been replaced by the S3 protocol in OneFS. |
You can enable or disable each of these services independently, using the CLI or platform API, if you have a user account with the ISI_PRIV_HTTP RBAC privilege.
You can use the isi http services CLI command set to view and modify the nonessential HTTP services:
# isi http services list ID Enabled ------------------------------ Platform-API-External Yes PowerScaleUI Yes RAN Yes RemoteService Yes SWIFT No ------------------------------ Total: 5
For example, you can easily disable remote HTTP access to the OneFS /ifs namespace as follows:
# isi http services modify RAN --enabled=0
You are about to modify the service RAN. Are you sure? (yes/[no]): yes
Similarly, you can also use the WebUI to view and edit a subset of the HTTP configuration settings, by navigating to Protocols > HTTP settings:
That said, the implications and impact of disabling each of the services is as follows:
Service | Disabling impacts |
WebUI | The WebUI is completely disabled, and access attempts (default TCP port 8080) are denied with the warning Service Unavailable. Please contact Administrator. If the WebUI is re-enabled, the external platform API service (Platform-API-External) is also started if it is not running. Note that disabling the WebUI does not affect the PlatformAPI service. |
Platform API | External API requests to the cluster are denied, and the WebUI is disabled, because it uses the Platform-API-External service. Note that the Platform-API-Internal service is not impacted if/when the Platform-API-External is disabled, and internal pAPI services continue to function as expected. If the Platform-API-External service is re-enabled, the WebUI will remain inactive until the PowerScaleUI service is also enabled. |
RAN | If RAN is disabled, the WebUI components for File System Explorer and File Browser are also automatically disabled. From the WebUI, attempts to access the OneFS file system explorer (File System > File System Explorer) fail with the warning message Browse is disabled as RAN service is not running. Contact your administrator to enable the service. This same warning also appears when attempting to access any other WebUI components that require directory selection. |
RemoteService | If RemoteService is disabled, the WebUI components for Remote Support and In-Product Activation are disabled. In the WebUI, going to Cluster Management > General Settings and selecting the Remote Support tab displays the message The service required for the feature is disabled. Contact your administrator to enable the service. In the WebUI, going to Cluster Management > Licensing and scrolling to the License Activation section displays the message The service required for the feature is disabled. Contact your administrator to enable the service. |
SWIFT | Deprecated object protocol and disabled by default. |
You can use the CLI command isi http settings view to display the OneFS HTTP configuration:
# isi http settings view Access Control: No Basic Authentication: No WebHDFS Ran HTTPS Port: 8443 Dav: No Enable Access Log: Yes HTTPS: No Integrated Authentication: No Server Root: /ifs Service: disabled Service Timeout: 8m20s Inactive Timeout: 15m Session Max Age: 4H Httpd Controlpath Redirect: No
Similarly, you can manage and change the HTTP configuration using the isi http settings modify CLI command.
For example, to reduce the maximum session age from four to two hours:
# isi http settings view | grep -i age Session Max Age: 4H # isi http settings modify --session-max-age=2H # isi http settings view | grep -i age Session Max Age: 2H
The full set of configuration options for isi http settings includes:
Option | Description |
--access-control <boolean> | Enable Access Control Authentication for the HTTP service. Access Control Authentication requires at least one type of authentication to be enabled. |
--basic-authentication <boolean> | Enable Basic Authentication for the HTTP service. |
--webhdfs-ran-https-port <integer> | Configure Data Services Port for the HTTP service. |
--revert-webhdfs-ran-https-port | Set value to system default for --webhdfs-ran-https-port. |
--dav <boolean> | Comply with Class 1 and 2 of the DAV specification (RFC 2518) for the HTTP service. All DAV clients must go through a single node. DAV compliance is NOT met if you go through SmartConnect, or using 2 or more node IPs. |
--enable-access-log <boolean> | Enable writing to a log when the HTTP server is accessed for the HTTP service. |
--https <boolean> | Enable the HTTPS transport protocol for the HTTP service. |
--https <boolean> | Enable the HTTPS transport protocol for the HTTP service. |
--integrated-authentication <boolean> | Enable Integrated Authentication for the HTTP service. |
--server-root <path> | Document root directory for the HTTP service. Must be within /ifs. |
--service (enabled | disabled | redirect | disabled_basicfile) | Enable/disable the HTTP Service or redirect to WebUI or disabled BasicFileAccess. |
--service-timeout <duration> | The amount of time (in seconds) that the server will wait for certain events before failing a request. A value of 0 indicates that the service timeout value is the Apache default. |
--revert-service-timeout | Set value to system default for --service-timeout. |
--inactive-timeout <duration> | Get the HTTP RequestReadTimeout directive from both the WebUI and the HTTP service. |
--revert-inactive-timeout | Set value to system default for --inactive-timeout. |
--session-max-age <duration> | Get the HTTP SessionMaxAge directive from both WebUI and HTTP service. |
--revert-session-max-age | Set value to system default for --session-max-age. |
--httpd-controlpath-redirect <boolean> | Enable or disable WebUI redirection to the HTTP service. |
Note that while the OneFS S3 service uses HTTP, it is considered a tier-1 protocol, and as such is managed using its own isi s3 CLI command set and corresponding WebUI area. For example, the following CLI command forces the cluster to only accept encrypted HTTPS/SSL traffic on TCP port 9999 (rather than the default TCP port 9021):
# isi s3 settings global modify --https-only 1 –https-port 9921 # isi s3 settings global view HTTP Port: 9020 HTTPS Port: 9999 HTTPS only: Yes S3 Service Enabled: Yes
Additionally, you can entirely disable the S3 service with the following CLI command:
# isi services s3 disable The service 's3' has been disabled.
Or from the WebUI, under Protocols > S3 > Global settings:
Author: Nick Trimbee
OneFS Key Manager Rekey Support
Mon, 24 Jul 2023 19:16:34 -0000
|Read Time: 0 minutes
The OneFS key manager is a backend service that orchestrates the storage of sensitive information for PowerScale clusters. To satisfy Dell’s Secure Infrastructure Ready requirements and other public and private sector security mandates, the manager provides the ability to replace, or rekey, cryptographic keys.
The quintessential consumer of OneFS key management is data-at-rest encryption (DARE). Protecting sensitive data stored on the cluster with cryptography ensures that it’s guarded against theft, in the event that drives or nodes are removed from a PowerScale cluster. DARE is a requirement for federal and industry regulations, ensuring data is encrypted when it is stored. OneFS has provided DARE solutions for many years through secure encrypted drives (SEDs) and the OneFS key management system.
A 256-bit key (MK) encrypts the Key Manager Database (KMDB) for SED and cluster domains. In OneFS 9.2 and later, the MK for SEDs can either be stored off-cluster on a KMIP server or locally on a node (the legacy behavior).
However, there are a variety of other consumers of the OneFS key manager, in addition to DARE. These include services and protocols such as:
Service | Description |
---|---|
CELOG | Cluster event log |
CloudPools | Cluster tier to cloud service |
Electronic mail | |
FTP | File transfer protocol |
IPMI | Intelligent platform management interface for remote cluster console access |
JWT | JSON web tokens |
NDMP | Network data management protocol for cluster backups and DR |
Pstore | Active directory and Kerberos password store |
S3 | S3 object protocol |
SyncIQ | Cluster replication service |
SmartSync | OneFS push and pull replication cluster and cloud replication service |
SNMP | Simple network monitoring protocol |
SRS | Old Dell support remote cluster connectivity |
SSO | Single sign-on |
SupportAssist | Remote cluster connectivity to Dell Support |
OneFS 9.5 introduces a number of enhancements to the venerable key manager, including:
- The ability to rekey keystores. Rekey operation will generate a new MK and re-encrypt all entries stored with the new key.
- New CLI commands and WebUI options to perform a rekey operation or schedule key rotation on a time interval.
- New commands to monitor the progress and status of a rekey operation.
As such, OneFS 9.5 now provides the ability to rekey the MK, irrespective of where it is stored.
Note that when you are upgrading from an earlier OneFS release, the new rekey functionality is only available once the OneFS 9.5 upgrade has been committed.
Under the hood, each provider store in the key manager consists of secure backend storage and an MK. Entries are kept in a SQLite database or key-value store. A provider datastore uses its MK to encrypt all its entries within the store.
During the rekey process, the old MK is only deleted after a successful re-encryption with the new MK. If for any reason the process fails, the old MK is available and remains as the current MK. The rekey daemon retries the rekey every 15 minutes if the process fails.
The OneFS rekey process is as follows:
- A new MK is generated, and internal configuration is updated.
- Any entries in the provider store are decrypted and encrypted with the new MK.
- If the prior steps are successful, the previous MK is deleted.
To support the rekey process, the MK in OneFS 9.5 now has an ID associated with it. All entries have a new field referencing the MK ID.
During the rekey operation, there are two MK values with different IDs, and all entries in the database will associate which key they are encrypted by.
In OneFS 9.5, the rekey configuration and management is split between the cluster keys and the SED keys:
Rekey component | Detail |
---|---|
SED |
|
Cluster |
|
SED keys rekey
The SED key manager rekey operation can be managed through a DARE cluster’s CLI or WebUI, and it can either be automatically scheduled or run manually on demand. The following CLI syntax can be used to manually initiate a rekey:
# isi keymanager sed rekey start
Alternatively, to schedule a rekey operation, for example, to schedule a key rotation every two months:
# isi keymanager sed rekey modify --key-rotation=2m
The key manager status for SEDs can be viewed as follows:
# isi keymanager sed status Node Status Location Remote Key ID Key Creation Date Error Info(if any) ----------------------------------------------------------------------------- 1 LOCAL Local 1970-01-01T00:00:00 ----------------------------------------------------------------------------- Total: 1
Alternatively, from the WebUI, go to Access > Key Management > SED/Cluster Rekey, select Automatic rekey for SED keys, and configure the rekey frequency:
Note that for SED rekey operations, if a migration from local cluster key management to a KMIP server is in progress, the rekey process will begin once the migration is complete.
Cluster keys rekey
As mentioned previously, OneFS 9.5 also supports the rekey of cluster keystore domains. This cluster rekey operation is available through the CLI and the WebUI and may either be scheduled or run on demand. The available cluster domains can be queried by running the following CLI syntax:
# isi keymanager cluster status Domain Status Key Creation Date Error Info(if any) ---------------------------------------------------------- CELOG ACTIVE 2023-04-06T09:19:16 CERTSTORE ACTIVE 2023-04-06T09:19:16 CLOUDPOOLS ACTIVE 2023-04-06T09:19:16 EMAIL ACTIVE 2023-04-06T09:19:16 FTP ACTIVE 2023-04-06T09:19:16 IPMI_MGMT IN_PROGRESS 2023-04-06T09:19:16 JWT ACTIVE 2023-04-06T09:19:16 LHOTSE ACTIVE 2023-04-06T09:19:11 NDMP ACTIVE 2023-04-06T09:19:16 NETWORK ACTIVE 2023-04-06T09:19:16 PSTORE ACTIVE 2023-04-06T09:19:16 RICE ACTIVE 2023-04-06T09:19:16 S3 ACTIVE 2023-04-06T09:19:16 SIQ ACTIVE 2023-04-06T09:19:16 SNMP ACTIVE 2023-04-06T09:19:16 SRS ACTIVE 2023-04-06T09:19:16 SSO ACTIVE 2023-04-06T09:19:16 ---------------------------------------------------------- Total: 17
The rekey process generates a new key and re-encrypts the entries for the domain. The old key is then deleted.
Performance-wise, the rekey process does consume cluster resources (CPU and disk) as a result of the re-encryption phase, which is fairly write-intensive. As such, a good practice is to perform rekey operations outside of core business hours or during scheduled cluster maintenance windows.
During the rekey process, the old MK is only deleted once a successful re-encryption with the new MK has been confirmed. In the event of a rekey process failure, the old MK is available and remains as the current MK.
A rekey may be requested immediately or may be scheduled with a cadence. The rekey operation is available through the CLI and the WebUI. In the WebUI, go to Access > Key Management > SED/Cluster Rekey.
To start a rekey of the cluster domains immediately, from the CLI run the following syntax:
# isi keymanager cluster rekey start Are you sure you want to rekey the master passphrase? (yes/[no]):yes
Alternatively, from the WebUI, go to Access under the SED/Cluster Rekey tab, and click Rekey Now next to Cluster keys:
A scheduled rekey of the cluster keys (excluding the SED keys) can be configured from the CLI with the following syntax:
# isi keymanager cluster rekey modify –-key-rotation [YMWDhms]
Specify the frequency of the Key Rotation field as an integer, using Y for years, M for months, W for weeks, D for days, h for hours, m for minutes, and s for seconds. For example, the following command will schedule the cluster rekey operation to run every six weeks:
# isi keymanager cluster rekey view Rekey Time: 1970-01-01T00:00:00 Key Rotation: Never # isi keymanager cluster rekey modify --key-rotation 6W # isi keymanager cluster rekey view Rekey Time: 2023-04-28T18:38:45 Key Rotation: 6W
The rekey configuration can be easily reverted back to on demand from a schedule as follows:
# isi keymanager cluster rekey modify --key-rotation Never # isi keymanager cluster rekey view Rekey Time: 2023-04-28T18:38:45 Key Rotation: Never
Alternatively, from the WebUI, under the SED/Cluster Rekey tab, select the Automatic rekey for Cluster keys checkbox and specify the rekey frequency. For example:
In an event of a rekeying failure, a CELOG KeyManagerRekeyFailed or KeyManagerSedsRekeyFailed event is created. Since SED rekey is a node-local operation, the KeyManagerSedsRekeyFailed event information will also include which node experienced the failure.
Additionally, current cluster rekey status can also be queried with the following CLI command:
# isi keymanager cluster status Domain Status Key Creation Date Error Info(if any) ---------------------------------------------------------- CELOG ACTIVE 2023-04-06T09:19:16 CERTSTORE ACTIVE 2023-04-06T09:19:16 CLOUDPOOLS ACTIVE 2023-04-06T09:19:16 EMAIL ACTIVE 2023-04-06T09:19:16 FTP ACTIVE 2023-04-06T09:19:16 IPMI_MGMT ACTIVE 2023-04-06T09:19:16 JWT ACTIVE 2023-04-06T09:19:16 LHOTSE ACTIVE 2023-04-06T09:19:11 NDMP ACTIVE 2023-04-06T09:19:16 NETWORK ACTIVE 2023-04-06T09:19:16 PSTORE ACTIVE 2023-04-06T09:19:16 RICE ACTIVE 2023-04-06T09:19:16 S3 ACTIVE 2023-04-06T09:19:16 SIQ ACTIVE 2023-04-06T09:19:16 SNMP ACTIVE 2023-04-06T09:19:16 SRS ACTIVE 2023-04-06T09:19:16 SSO ACTIVE 2023-04-06T09:19:16 ---------------------------------------------------------- Total: 17
Or, for SEDs rekey status:
# isi keymanager sed status Node Status Location Remote Key ID Key Creation Date Error Info(if any) ----------------------------------------------------------------------------- 1 LOCAL Local 1970-01-01T00:00:00 2 LOCAL Local 1970-01-01T00:00:00 3 LOCAL Local 1970-01-01T00:00:00 4 LOCAL Local 1970-01-01T00:00:00 ----------------------------------------------------------------------------- Total: 4
The rekey process also outputs to the /var/log/isi_km_d.log file, which is a useful source for additional troubleshooting.
If an error in rekey occurs, the previous MK is not deleted, so entries in the provider store can still be created and read as normal. The key manager daemon will retry the rekey operation in the background every 15 minutes until it succeeds.
Author: Nick Trimbee