Preparations for Upgrading a CloudPools Environment
Thu, 23 Jun 2022 15:51:46 -0000
Introduction
CloudPools 2.0 brings many improvements and is released along with OneFS 8.2.0. It’s valuable to be able to upgrade OneFS from 8.x to 8.2.x or later and leverage the data management benefits of CloudPools 2.0.
This blog describes the preparations for upgrading a CloudPools environment. The purpose is to avoid potential issues when upgrading OneFS from 8.x to 8.2.x or later (that is, from CloudPools 1.0 to CloudPools 2.0).
For the recommended procedure for upgrading a CloudPools environment, see the document PowerScale CloudPools: Upgrading 8.x to 8.2.2.x or later.
For the best practices and considerations for CloudPools upgrades, see the white paper Dell PowerScale: CloudPools and ECS.
This blog covers the preparations both on cloud providers and on PowerScale clusters.
Cloud providers
CloudPools is a OneFS feature that allows customers to archive or tier data from a PowerScale cluster to cloud storage, including public cloud providers such as Amazon Web Services (AWS), Microsoft Azure, Google Cloud, Alibaba Cloud, or a private cloud based on Dell ECS.
Important: Run the isi cloud account list command to verify which cloud providers are used for CloudPools. Different authentications are used with different cloud providers for CloudPools, which might cause potential issues when upgrading a CloudPools environment.
AWS signature authentication is used for AWS, Dell ECS, and Google Cloud. In OneFS releases prior to 8.2, only AWS SigV2 is supported for CloudPools. Starting with OneFS 8.2, AWS SigV4 support is added, which provides an extra level of security for authentication with an enhanced algorithm. For more information about SigV4, see Authenticating Requests: AWS Signature V4. AWS SigV4 is used automatically for CloudPools in OneFS 8.2.x or later if the CloudPools and cloud provider configurations are correct. Note that a different authentication method is used for Azure and Alibaba Cloud.
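SigV4 derives a per-request signing key from the secret key through a chain of HMAC-SHA256 operations, which is what scopes each signature to a single day, region, and service. A minimal sketch of that derivation follows; the credentials and values are placeholders, not real keys:

```python
import hashlib
import hmac

def derive_sigv4_signing_key(secret_key: str, date_stamp: str,
                             region: str, service: str) -> bytes:
    """Derive the AWS Signature Version 4 signing key via chained HMAC-SHA256."""
    k_date = hmac.new(("AWS4" + secret_key).encode(), date_stamp.encode(),
                      hashlib.sha256).digest()
    k_region = hmac.new(k_date, region.encode(), hashlib.sha256).digest()
    k_service = hmac.new(k_region, service.encode(), hashlib.sha256).digest()
    # The final "aws4_request" step scopes the key to one day/region/service.
    return hmac.new(k_service, b"aws4_request", hashlib.sha256).digest()

# Example with placeholder credentials (not a real key):
key = derive_sigv4_signing_key("EXAMPLEKEY", "20220623", "us-east-1", "s3")
print(key.hex())
```

Because the date is part of the key derivation, a captured signature cannot be replayed on a later day, which is part of the extra security SigV4 provides over SigV2.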
If public cloud providers are used in a customer's environment, there should be no issues because all required configurations are already provided by the public cloud providers.
If Dell ECS is used in a customer's environment, the ECS configurations are implemented separately, and you need to make sure that the configurations are correct on ECS, including the load balancer and Domain Name System (DNS).
This section only covers the preparations for CloudPools and Dell ECS before upgrading OneFS from 8.x to 8.2.x or later.
Dell ECS
In general, CloudPools may already have archived a lot of data from a PowerScale (Isilon) cluster to ECS before OneFS is upgraded from 8.x to 8.2.x or later. That means that most of the configurations for CloudPools should already be in place. For more information about CloudPools and ECS, see the white paper Dell PowerScale: CloudPools and ECS.
This section covers the following configurations for ECS before a OneFS upgrade from 8.x to 8.2.x or later.
- Load balancer
- DNS
- Base URL
Load balancer
A load balancer balances traffic from the PowerScale cluster across the ECS nodes and can provide much better performance and throughput for CloudPools. A load balancer is strongly recommended for CloudPools 2.0 and ECS. For information about how to implement a load balancer with ECS, see the white paper Dell PowerScale: CloudPools and ECS.
DNS
AWS always has a wildcard DNS record configured. See the document Virtual hosting of buckets, which introduces path-style access and virtual hosted-style access for a bucket. It also shows how to associate a hostname with an Amazon S3 bucket using CNAMEs for virtual hosted-style access.
Meanwhile, the path-style URL will be deprecated on September 23, 2022. Buckets created after that date must be referenced using the virtual-hosted model. For the reasons behind moving to the virtual-hosted model, see the document Amazon S3 Path Deprecation Plan – The Rest of the Story.
ECS supports Amazon S3 compatible applications that use virtual hosted-style and path-style addressing schemes (for more information, see the document Bucket and namespace addressing). To help ensure the proper DNS configuration for ECS, see the document DNS configuration.
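The two addressing schemes differ only in where the bucket name appears, which is why virtual hosted-style access needs a wildcard DNS record. A small illustration (the endpoint ecs.demo.local is the example FQDN used in this blog, not a real service):

```python
def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    """Path-style: the bucket name is the first path segment."""
    return f"https://{endpoint}/{bucket}/{key}"

def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    """Virtual hosted-style: the bucket name is a subdomain of the endpoint,
    which is why a wildcard DNS record (*.endpoint) must resolve."""
    return f"https://{bucket}.{endpoint}/{key}"

print(path_style_url("ecs.demo.local", "mybucket", "file.txt"))
# https://ecs.demo.local/mybucket/file.txt
print(virtual_hosted_url("ecs.demo.local", "mybucket", "file.txt"))
# https://mybucket.ecs.demo.local/file.txt
```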
The procedure for configuring DNS depends on your DNS server or DNS provider.
For example, suppose DNS is set up on a Windows server. The following two tables and three figures show the DNS entries created; customers must create their own DNS entries.
Name | Record Type | FQDN | IP Address | Comment |
---- | ----------- | ---- | ---------- | ------- |
ecs | A | ecs.demo.local | 192.168.1.40 | The FQDN of the load balancer will be ecs.demo.local. |
Name | Record Type | FQDN | FQDN for | Comment |
---- | ----------- | ---- | -------- | ------- |
cloudpools_uri | CNAME | cloudpools_uri.demo.local | ecs.demo.local | If you create an SSL certificate for the ECS S3 service, the certificate must include the non-wildcard version of the FQDN as a Subject Alternative Name. |
*.cloudpools_uri | CNAME | *.cloudpools_uri.demo.local | ecs.demo.local | Used for virtual host addressing for a bucket. |
Base URL
In CloudPools 2.0 and ECS, a Base URL must be created on ECS. For details about creating a Base URL on ECS, see the section Appendix A: Base URL in the white paper Dell PowerScale: CloudPools and ECS.
When creating a new Base URL, keep the default setting (No) for Use with Namespace. Make sure that the Base URL is the FQDN alias of the load balancer virtual IP.
PowerScale clusters
If SyncIQ is configured for CloudPools, run the following commands on both the source and target PowerScale clusters to check and record the CloudPools configurations, including CloudPools storage accounts, CloudPools, file pool policies, and SyncIQ policies.
# isi cloud accounts list -v
# isi cloud pools list -v
# isi filepool policies list -v
# isi sync policies list -v
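It can help to capture each listing to a file so the configuration can be compared after the upgrade. A minimal sketch follows; the backup directory and file names are arbitrary choices of mine, and the script assumes it runs on a node where the isi CLI is on the PATH (on any other host it simply records that the CLI was not found):

```python
import pathlib
import subprocess

# Arbitrary location for the pre-upgrade records.
backup_dir = pathlib.Path("/tmp/cloudpools_backup")
backup_dir.mkdir(parents=True, exist_ok=True)

commands = [
    ["isi", "cloud", "accounts", "list", "-v"],
    ["isi", "cloud", "pools", "list", "-v"],
    ["isi", "filepool", "policies", "list", "-v"],
    ["isi", "sync", "policies", "list", "-v"],
]

for cmd in commands:
    # e.g. cloud_accounts_list.txt for "isi cloud accounts list -v"
    out_file = backup_dir / ("_".join(cmd[1:-1]) + ".txt")
    try:
        result = subprocess.run(cmd, capture_output=True, text=True)
        out_file.write_text(result.stdout or result.stderr)
    except FileNotFoundError:
        # Not on a PowerScale node; record that for the operator.
        out_file.write_text("isi CLI not available on this host\n")
```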
For CloudPools and ECS, make sure that the URI is the FQDN alias of the load balancer virtual IP.
Important: It is strongly recommended that no job (such as for CloudPools/SmartPools, SyncIQ, and NDMP) be running before upgrading.
In a SyncIQ environment, upgrade the SyncIQ target cluster before upgrading the source cluster. OneFS allows SyncIQ to send CloudPools 1.0 formatted SmartLink files to the target, where they are converted into CloudPools 2.0 formatted SmartLink files. (If the source cluster is upgraded first, sync operations will fail until both clusters are upgraded; the only known resolution is to reconfigure the SyncIQ policy to "Deep Copy".)
The customer may also have active (read and write) CloudPools accounts on both the source and target PowerScale clusters, with SmartLink files of active CloudPools accounts replicated bidirectionally. That means that each source is also a target. In this case, reconfigure the Sync policy to "Deep Copy" on one of the PowerScale clusters. After that, upgrade the target with the replicated SmartLink files first.
Summary
This blog covered what you need to check, on cloud providers and PowerScale clusters, before upgrading OneFS from 8.x to 8.2.x or later (that is, from CloudPools 1.0 to CloudPools 2.0). My hope is that it can help you avoid potential CloudPools issues when upgrading a CloudPools environment.
Author: Jason He, Principal Engineering Technologist
Related Blog Posts
CloudPools Operation Workflows
Fri, 12 Jan 2024 21:01:01 -0000
The Dell PowerScale CloudPools feature of OneFS allows tiering cold or infrequently accessed data to lower-cost cloud storage. CloudPools extends the PowerScale namespace to the private or public cloud. For CloudPools supported cloud providers, see the CloudPools Supported Cloud Providers blog.
This blog focuses on the following CloudPools operation workflows:
- Archive
- Recall
- Read
- Update
Archive
The archive operation is the CloudPools process of moving file data from the local PowerScale cluster to cloud storage. Files are archived either using the SmartPools Job or from the command line. The CloudPools archive process can be paused or resumed.
The following figure shows the workflow of the CloudPools archive.
Figure 1. Archive workflow
More workflow details include:
- The file pool policy in Step 1 specifies a cloud target and cloud-specific parameters. Policy examples include:
- Encryption: CloudPools provides an option to encrypt data before the data is sent to the cloud storage. It uses the PowerScale key management module for data encryption and uses AES-256 as the encryption algorithm. The benefit of encryption is that only encrypted data is being sent over the network.
- Compression: CloudPools provides an option to compress data before the data is sent to the cloud storage. It implements block-level compression using the zlib compression library. CloudPools does not compress data that is already compressed.
- Local data cache: Caching supports local reading and writing of SmartLink files. It optimizes performance and reduces bandwidth costs by eliminating repeated fetching of file data for repeated reads and writes. The data cache temporarily caches file data from the cloud storage on PowerScale disk storage for files that have been moved off cluster by CloudPools.
- Data retention: Data retention determines how long to keep cloud objects in the cloud storage.
- When chunks are sent from the PowerScale cluster to the cloud in Step 3, a checksum is applied to each chunk to ensure data integrity.
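The per-chunk handling described above (zlib compression, then a checksum per chunk) can be sketched as follows. The 1 MiB chunk size, SHA-256 checksum choice, and function names are illustrative assumptions, not the actual CloudPools internals:

```python
import hashlib
import zlib

CHUNK_SIZE = 1024 * 1024  # illustrative chunk size, not the real CloudPools value

def archive_chunks(data: bytes):
    """Split data into chunks, compress each with zlib, and attach a checksum."""
    chunks = []
    for offset in range(0, len(data), CHUNK_SIZE):
        raw = data[offset:offset + CHUNK_SIZE]
        compressed = zlib.compress(raw)
        # The checksum lets the receiver verify integrity after transfer.
        checksum = hashlib.sha256(compressed).hexdigest()
        chunks.append({"data": compressed, "sha256": checksum})
    return chunks

def verify_and_restore(chunks) -> bytes:
    """Verify each chunk's checksum, then decompress and reassemble."""
    restored = b""
    for chunk in chunks:
        assert hashlib.sha256(chunk["data"]).hexdigest() == chunk["sha256"]
        restored += zlib.decompress(chunk["data"])
    return restored

original = b"cold data " * 500_000  # ~5 MB of repetitive (compressible) data
assert verify_and_restore(archive_chunks(original)) == original
```

Repetitive data like this compresses well, which is why CloudPools skips data that is already compressed: recompressing it yields little or no saving.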
Recall
The recall operation is the CloudPools process of reversing the archive process. It replaces the SmartLink file by restoring the original file data on the PowerScale cluster and removes the cloud objects from the cloud. The recall process can only be performed using the command line. The CloudPools recall process can be paused or resumed.
The following figure shows the workflow of CloudPools recall.
Figure 2. Recall workflow
Read
The read operation is the CloudPools process of client data access, known as inline access. When a client opens a file for read, the blocks are added to the cache in the associated SmartLink file by default. The cache can be disabled by setting the accessibility in the file pool policy for CloudPools. The accessibility setting is used to specify how data is cached in SmartLink files when a user or application accesses a SmartLink file on the PowerScale cluster. Values are cached (default) and no cache.
The following figure shows the workflow of CloudPools read by default.
Figure 3. Read workflow
Starting from OneFS 9.1.0.0, a cloud object cache is introduced to enhance how CloudPools communicates with the cloud. In Step 1, OneFS looks for data in the object cache first, and retrieves it from the object cache if the data is already there. The cloud object cache reduces the number of requests to the cloud when reading a file.
Prior to OneFS 9.1.0.0, in Step 1, OneFS looks for data in the local data cache first, and moves to Step 3 if the data is already in the local data cache.
Note: Cloud object cache is per node. Each node maintains its own object cache on the cluster.
Update
The update operation is the CloudPools process that occurs when clients update data. When clients make changes to a SmartLink file, CloudPools first writes the changes to the local data cache and then periodically sends the updated file data to the cloud. The space used by the cache is temporary and configurable.
The following figure shows the workflow of CloudPools update.
Figure 4. Update workflow
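The write-back behavior described above can be sketched as a cache that collects dirty blocks locally and flushes them to the cloud in one batch. The class below illustrates the pattern only; it is not CloudPools code, and the names are my own:

```python
class WriteBackCache:
    """Minimal write-back cache: buffer local writes, flush periodically."""

    def __init__(self, flush_to_cloud):
        self.dirty = {}                    # block offset -> latest data
        self.flush_to_cloud = flush_to_cloud

    def write(self, offset: int, data: bytes) -> None:
        # Writes land in the local cache first; repeated writes to the
        # same block are coalesced, so only the latest version is sent.
        self.dirty[offset] = data

    def flush(self) -> int:
        """Send all dirty blocks to the cloud in one batch; return the count."""
        sent = len(self.dirty)
        for offset, data in sorted(self.dirty.items()):
            self.flush_to_cloud(offset, data)
        self.dirty.clear()
        return sent

uploaded = []
cache = WriteBackCache(lambda off, data: uploaded.append((off, data)))
cache.write(0, b"v1")
cache.write(0, b"v2")     # overwrites v1 in the cache; only v2 is uploaded
cache.write(4096, b"x")
assert cache.flush() == 2 and uploaded == [(0, b"v2"), (4096, b"x")]
```

Coalescing repeated writes before the periodic flush is what keeps the cloud traffic (and cost) low compared with sending every client write immediately.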
Thank you for taking the time to read this blog, and congratulations on gaining a clear understanding of how the OneFS CloudPools operation works!
Author: Jason He, Principal Engineering Technologist
CloudPools Reporting
Fri, 12 Jan 2024 20:33:21 -0000
This blog focuses on CloudPools reporting, specifically:
- CloudPools network stats
- The isi_fsa_pools_usage feature
CloudPools network stats
Dell PowerScale CloudPools network stats collect every network transaction and provide network activity statistics from CloudPools connections to the cloud storage.
Displaying network activity statistics
The network activity statistics include bytes In, bytes Out, and the number of GET, PUT, and DELETE operations. CloudPools network stats are available in two categories:
- Per CloudPools account
- Per file pool policy
Note: CloudPools network stats do not provide file statistics, such as the file list being archived or recalled.
Run the following command to check the CloudPools network stats by CloudPools account:
isi_test_cpool_stats -Q --accounts <account_name>
For example, the following command shows the current CloudPools network stats by CloudPools account:
isi_test_cpool_stats -Q --accounts testaccount

Account Name   Bytes In     Bytes Out    Num Reads   Num Writes   Num Deletes
testaccount    4194896000   4194905034   4000        2001         8001
Similarly, you can run the following command to check the CloudPools network stats by file pool policy:
isi_test_cpool_stats -Q --policies <policy_name>
And here is an example of current CloudPools network stats by file pool policy:
isi_test_cpool_stats -Q --policies testpolicy

Policy Name   Bytes In     Bytes Out    Num Reads   Num Writes
testpolicy    4154896000   4154905034   4000        2001
Note: The command output does not include the number of deletes by file pool policy.
Run the following command to check the history for CloudPools network stats:
isi_test_cpool_stats -q -s <number of seconds in the past to start stat query>
Use the -s parameter to define the number of seconds in the past. For example, set it to 86400 to query the CloudPools network stats over the last day, as in the following example:
isi_test_cpool_stats -q -s 86400

Account       bytes-in     bytes-out    gets   puts   deletes
testaccount | 4194896000 | 4194905034 | 4000 | 2001 | 8001
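Output like this can be post-processed for monitoring. Below is a small parser for the pipe-separated rows; the field names follow the sample output above, and the function is my own helper, not part of OneFS:

```python
def parse_stats_row(row: str) -> dict:
    """Parse a pipe-separated stats row such as
    'testaccount | 4194896000 | 4194905034 | 4000 | 2001 | 8001'."""
    fields = [f.strip() for f in row.split("|")]
    keys = ["bytes_in", "bytes_out", "gets", "puts", "deletes"]
    return {"account": fields[0], **dict(zip(keys, (int(v) for v in fields[1:])))}

stats = parse_stats_row("testaccount | 4194896000 | 4194905034 | 4000 | 2001 | 8001")
print(stats["bytes_in"])   # 4194896000
```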
You can also run the following command to flush the stats from memory to the database and get real-time CloudPools network stats:
isi_test_cpool_stats -f
Displaying stats for CloudPools activities
The cloud statistics namespace with CloudPools is added in OneFS 9.4.0.0. This feature leverages existing OneFS daemons and systems to track statistics about CloudPools activities. The statistics include bytes In, bytes Out, and the number of Reads, Writes, and Deletions. CloudPools statistics are available in two categories:
- Per CloudPools account
- Per file pool policy
Note: The cloud statistics namespace with CloudPools does not provide file statistics, such as the file list being archived or recalled.
You can run these isi statistics cloud commands to view statistics about CloudPools activities:
isi statistics cloud --account <account_name> isi statistics cloud --policy <policy_name>
The following command shows an example of current CloudPools statistics by CloudPools account:
isi statistics cloud --account s3

Account   Policy   In        Out       Reads   Writes   Deletions   Cloud   Node
s3                 218.5KB   218.7KB   1       2        0           AWS     3
s3                 0.0B      0.0B      0       0        0           AWS     1
s3                 0.0B      0.0B      0       0        0           AWS     2
The following command shows an example of current CloudPools statistics by file pool policy:
isi statistics cloud --policy s3policy

Account   Policy     In        Out       Reads   Writes   Deletions   Cloud   Node
s3        s3policy   218.5KB   218.7KB   1       2        0           AWS     3
s3        s3policy   0.0B      0.0B      0       0        0           AWS     1
s3        s3policy   0.0B      0.0B      0       0        0           AWS     2
The isi_fsa_pools_usage feature
Starting from OneFS 8.2.2, you can run the isi_fsa_pools_usage command to list the logical size and physical size of stubs in a directory. This feature leverages the IndexUpdate and FSA (File System Analytics) jobs. To enable it, the following is required:
- Schedule the IndexUpdate job. It's recommended to run it every four hours.
- Schedule the FSA job. It's recommended to run it every day, but not more often than the IndexUpdate job.
isi_fsa_pools_usage /ifs

Node Pool                   Dirs   Files    Streams   Logical Size   Physical Size
Cloud                       0      1        0         338.91k        24.00k
h500_30tb_3.2tb-ssd_128gb   42     300671   0         879.23G        1.20T
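The sizes in this output use short unit suffixes (k, G, T). For scripting, a small helper can convert them to bytes; this is my own helper written against the sample output, and it assumes binary (1024-based) units:

```python
# Assumed binary (1024-based) unit multipliers for the short suffixes.
_UNITS = {"": 1, "k": 1024, "K": 1024, "M": 1024**2,
          "G": 1024**3, "T": 1024**4}

def size_to_bytes(size: str) -> int:
    """Convert a size string such as '879.23G' or '24.00k' to bytes."""
    if size and size[-1] in _UNITS and not size[-1].isdigit():
        value, unit = float(size[:-1]), size[-1]
    else:
        value, unit = float(size), ""
    return int(value * _UNITS[unit])

print(size_to_bytes("24.00k"))   # 24576
```

A helper like this makes it easy to, for example, total the physical size of stubs across node pools or alert when cloud-tiered usage crosses a threshold.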
Now you know how to use the commands for CloudPools reporting. It's simple and straightforward. Thanks for reading!
Author: Jason He, Principal Engineering Technologist