Will More Disks Lead to Better Performance in APEX File Storage in AWS?

Mon, 08 Jan 2024 18:02:59 -0000

Yunlong Zhang

Dell Technologies has developed a range of PowerScale platforms, including all-flash, hybrid, and archive models. In each of these appliances, the disk system and the compute system are designed to be well matched, so neither one holds the other back.

In the cloud environment, customers have the flexibility to control the number of CPU cores and memory sizes by selecting different instance types. APEX File Storage for AWS uses EBS volumes as its node disks. Customers can also select a different number of EBS volumes in each node, and for gp3 volumes, customers are able to customize the performance of each volume by specifying the throughput or IOPS capability.

With this level of flexibility, how should we configure the disk system to get the most out of the entire OneFS system? Typically, in an on-prem appliance, the more disks a PowerScale node contains, the better the performance the disk system can provide, because a greater number of devices contribute to the delivery of throughput or IOPS.

In a OneFS cloud environment, does it hold true that more EBS volumes mean better performance? In short, it depends. When the aggregated EBS volume performance is lower than the instance EBS bandwidth limit, test results show that adding EBS volumes can improve performance. When the aggregated EBS volume performance is already at or above the instance EBS bandwidth limit, adding more EBS volumes will not improve performance.
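The rule above can be sketched as a simple ceiling model. This is an illustration only, not a measurement: the function name is hypothetical, the 1,700 MB/sec limit is the m5dn.16xlarge figure quoted later in this post, and real throughput also depends on the workload, the network, and the CPU.

```python
def disk_throughput_ceiling(num_volumes, per_volume_mb_s, instance_limit_mb_s):
    """Upper bound on a node's disk throughput in MB/sec.

    Assumption: the node can never exceed the smaller of the aggregated
    volume throughput and the instance EBS bandwidth limit.
    """
    aggregated = num_volumes * per_volume_mb_s
    return min(aggregated, instance_limit_mb_s)


INSTANCE_LIMIT_MB_S = 1700  # m5dn.16xlarge EBS bandwidth, in MB/sec

for n in (5, 10, 12):
    print(n, disk_throughput_ceiling(n, 170, INSTANCE_LIMIT_MB_S))
# 5 850
# 10 1700
# 12 1700
```

Going from 5 to 10 volumes raises the ceiling, while going from 10 to 12 volumes at the same per-volume throughput does not, because the instance limit has already been reached.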

What is the best practice for setting the number of EBS volumes in each node?

1. Make the aggregated EBS volume bandwidth limit match the instance type EBS bandwidth limit.

For example, suppose we want to use m5dn.16xlarge as the instance type of our OneFS cloud system. According to AWS, the EBS bandwidth of m5dn.16xlarge is 13,600 Mbps, which is 1,700 MB/sec. If we choose to use 10 EBS volumes in each node, then we should configure each gp3 EBS volume to deliver 170 MB/sec of throughput. This makes the aggregated EBS volume throughput equal to the m5dn.16xlarge EBS bandwidth limit.

Note that each gp3 EBS volume includes 125 MB/sec of throughput and 3,000 IOPS at no additional charge. As a cost-saving measure, we can configure each node with 12 EBS volumes to make better use of this free per-volume throughput.
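The sizing arithmetic can be sketched as follows. The function names are hypothetical, and rounding up to a whole MB/sec is an assumption made here so that the aggregated throughput never falls below the instance limit.

```python
import math

FREE_GP3_THROUGHPUT_MB_S = 125  # throughput included with every gp3 volume


def per_volume_throughput_mb_s(instance_ebs_mbps, num_volumes):
    """Per-volume throughput (MB/sec) needed to match the instance EBS limit."""
    instance_mb_s = instance_ebs_mbps / 8  # megabits -> megabytes
    return math.ceil(instance_mb_s / num_volumes)


def paid_throughput_mb_s(per_volume_mb_s):
    """Portion of the per-volume throughput above the free gp3 baseline."""
    return max(0, per_volume_mb_s - FREE_GP3_THROUGHPUT_MB_S)


for n in (10, 12):
    needed = per_volume_throughput_mb_s(13_600, n)  # m5dn.16xlarge
    print(n, needed, paid_throughput_mb_s(needed))
# 10 170 45
# 12 142 17
```

With 12 volumes, a larger share of the required throughput is covered by the free 125 MB/sec per volume, so less throughput has to be provisioned on top of it.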

For example, for an m5dn.16xlarge instance type with 12 TB of raw capacity per node, the disk costs with 10 volumes and with 12 volumes are as follows:

For 10 volumes, each EBS volume should support 170 MB/sec throughput, and the EBS storage cost per node is 1,001.20 USD a month.
For 12 volumes, each EBS volume should support 142 MB/sec throughput, and the EBS storage cost per node is 991.20 USD a month.

Using 12 EBS volumes can save $10 per node per month.
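The post does not state the pricing behind these figures. The sketch below is one way to arrive at them, under assumptions that are not from the post: a gp3 storage rate of 0.08 USD per GB-month, a throughput rate of 0.04 USD per MB/sec-month above the free 125 MB/sec, IOPS left at the free 3,000 baseline, 12 TB treated as 12,288 GB, and each volume size rounded up to a whole GB. Rates vary by region and over time, so check current AWS pricing before relying on the output.

```python
import math
from decimal import Decimal

# Assumed gp3 rates (not stated in the post); verify against current AWS pricing.
STORAGE_USD_PER_GB_MONTH = Decimal("0.08")
THROUGHPUT_USD_PER_MB_S_MONTH = Decimal("0.04")
FREE_THROUGHPUT_MB_S = 125
RAW_CAPACITY_GB = 12 * 1024  # 12 TB per node, assumed to mean 12,288 GB


def node_ebs_monthly_cost(num_volumes, per_volume_mb_s):
    """Estimated monthly EBS cost (USD) for one node's gp3 volumes."""
    volume_size_gb = math.ceil(RAW_CAPACITY_GB / num_volumes)
    storage = num_volumes * volume_size_gb * STORAGE_USD_PER_GB_MONTH
    paid_mb_s = max(0, per_volume_mb_s - FREE_THROUGHPUT_MB_S)
    throughput = num_volumes * paid_mb_s * THROUGHPUT_USD_PER_MB_S_MONTH
    return storage + throughput


cost_10 = node_ebs_monthly_cost(10, 170)
cost_12 = node_ebs_monthly_cost(12, 142)
print(cost_10, cost_12, cost_10 - cost_12)
# 1001.20 991.20 10.00
```

Under these assumptions the capacity charge is nearly identical in both layouts, and the saving comes almost entirely from provisioning less throughput above the free baseline.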

2. Do not set up more than 12 EBS volumes in each node.

Although APEX File Storage for AWS also supports 15, 18, and 20 gp3 volumes in each node, we do not recommend configuring more than 12 EBS volumes in each node for OneFS 9.7. Keeping the volume count at 12 or fewer prevents the software journal space available to each disk from becoming too small, which benefits write performance.

Author: Yunlong Zhang
