Optimizing SyncIQ performance
The recommended approach for measuring and optimizing performance is as follows:
For releases prior to OneFS 8.0, the number of primary and secondary workers is calculated from two factors across both clusters: the lower node count of the two clusters, multiplied by the number of workers per node, which is a configurable value that defaults to three. SyncIQ randomly distributes workers across the cluster, with each node having at least one worker. If the number of workers is less than the number of nodes, not all nodes participate in the replication. The following figure shows a calculation example:
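The pre-OneFS 8.0 rule above can be sketched as a short calculation. The helper name below is illustrative only and is not part of any OneFS API:

```python
def pre_onefs8_worker_count(source_nodes, target_nodes, workers_per_node=3):
    """Pre-OneFS 8.0 rule (illustrative helper, not an OneFS API):
    take the lower node count of the two clusters and multiply it by
    the configurable workers-per-node value, which defaults to three."""
    return min(source_nodes, target_nodes) * workers_per_node

# A 5-node source replicating to a 3-node target at the default setting:
print(pre_onefs8_worker_count(5, 3))  # 9 workers
```

Because the smaller cluster bounds the result, adding nodes to only one side of the replication pair does not increase the worker count under this rule.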
In OneFS 8.0, the limits have increased to provide additional scalability in line with larger cluster sizes and the higher-performing nodes that are available. The maximum number of workers and the maximum number of workers per policy both scale as the number of nodes in the cluster increases. The defaults should be changed only with the guidance of PowerScale Technical Support.
Note: The source and target cluster must have the same number of workers, as each pair of source and target workers creates a TCP session. Any inconsistency in the number of workers results in failed sessions. As stated above, the maximum number of target workers is 100 per node, implying that the total number of source workers is also 100 per node.
Note: The following example shows how a node’s CPU type affects worker count, how workers are distributed across policies, and how SyncIQ works at a higher level. The actual number of workers is calculated dynamically by OneFS based on the node type. The calculations in the example are not a tuning recommendation and are merely for illustration. If the worker counts require adjustment, contact PowerScale Technical Support, as the number of physical cores, nodes, and other factors are considered before making changes.
As an example, consider a 4-node cluster, with 4 cores per node. Therefore, there are 16 total cores in the cluster. Following the previous rules:
When the first policy starts, it is assigned 32 workers, the per-policy maximum. A second policy starting is also assigned 32 workers, bringing the total to 64 workers, the maximum for this cluster. When a third policy starts while the first two are still running, the 64 workers are redistributed as evenly as possible: the third policy is assigned 21 workers, and the first two policies are reduced from 32 workers to 21 and 22 respectively, because 64 does not split into three evenly. Three policies are then running with 21 or 22 workers each, keeping the cluster at its maximum of 64 workers. Similarly, a fourth policy starting would result in all four policies having 16 workers. When one of the policies completes, reallocation again distributes the workers evenly among the remaining running policies.
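The redistribution described above can be sketched as an even split of the cluster-wide worker maximum, capped by the per-policy maximum. This is an illustrative model of the arithmetic only, not OneFS internals (which reallocate gradually, as the note below explains):

```python
def distribute_workers(cluster_max, per_policy_max, num_policies):
    """Illustrative model: split cluster_max workers as evenly as
    possible across num_policies running policies, capping each
    policy at per_policy_max. Earlier policies receive the remainder
    when the split is uneven."""
    base, extra = divmod(cluster_max, num_policies)
    return [min(per_policy_max, base + (1 if i < extra else 0))
            for i in range(num_policies)]

# The 4-node example: cluster maximum 64, per-policy maximum 32
print(distribute_workers(64, 32, 1))  # [32]
print(distribute_workers(64, 32, 3))  # [22, 21, 21]
print(distribute_workers(64, 32, 4))  # [16, 16, 16, 16]
```

With one or two policies the per-policy cap of 32 is the binding limit; from three policies onward the cluster-wide cap of 64 dominates, which is why adding policies shrinks each policy's share.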
Note: Any reallocation of workers on a policy occurs gradually to reduce thrashing when policies are starting and stopping frequently.
Administrators may want to limit the number of concurrent SyncIQ jobs. Limiting the number is particularly useful during peak cluster usage and client activity. Capping the cluster resources available to SyncIQ helps ensure that clients do not experience performance degradation.
Note: Consider all factors prior to limiting the number of concurrent SyncIQ jobs, as policies may take more time to complete, affecting RPO and RTO times. As with any significant cluster update, testing in a lab environment is recommended before a production cluster is updated. Also, a production cluster should be updated gradually to minimize impact and allow measurement of the impact.
To limit the maximum number of concurrent SyncIQ jobs, perform the following steps from the OneFS CLI:
OneFS 8.0 introduced an updated SyncIQ algorithm that takes advantage of all available cluster resources, significantly improving overall job run times. SyncIQ scales efficiently across the network and uses 2 MB TCP windows to account for WAN latency while delivering maximum performance.
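The significance of the 2 MB TCP window over a WAN can be illustrated with the standard bandwidth-delay product: a single connection's throughput is bounded by the window size divided by the round-trip time. The helper below is a generic sketch of that calculation, not a SyncIQ sizing tool:

```python
def max_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound on a single TCP connection's throughput, in
    megabits per second, given a window size in bytes and a
    round-trip time in milliseconds (bandwidth-delay product)."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6

# A 2 MB window over a 50 ms round-trip WAN link:
print(round(max_throughput_mbps(2 * 1024 * 1024, 50)))  # ~336 Mb/s
```

A larger window lets each worker's TCP session keep more data in flight on high-latency links, which is why the window size matters for WAN replication throughput.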
Note: The steps and processes mentioned in this section may significantly affect RPO times and client workflow. Prior to updating a production cluster, test all updates in a lab environment that mimics the production environment. Only after successful lab trials should the production cluster be considered for an update. As a best practice, implement changes gradually and closely monitor the production cluster after any significant updates.
SyncIQ achieves maximum performance by using all available cluster resources. SyncIQ consumes the following resources if they are available:
Because SyncIQ consumes cluster resources, it may affect existing workflows, depending on the environment and available resources. If data replication is affecting other workflows, consider tuning SyncIQ to a baseline as follows:
For information about updating the variables above, see SyncIQ performance rules. After the baseline is configured, gradually increase each parameter and collect measurements to ensure that workflows are not affected. Also consider modifying the maximum number of SyncIQ jobs, as described in Specifying a maximum number of concurrent SyncIQ jobs.
Note: The baseline variables provided here are only for guidance and are not a one-size-fits-all metric. Every environment is different. Carefully consider cluster resources and workflows wherever client workflow activity and SyncIQ replication compete for the same resources.