After a policy is configured, an Initial Replication is performed the first time it runs. During policy configuration, a user can select either a synchronization or a copy policy.
The synchronization policy ensures that the target cluster holds an exact duplicate of the source directory. As the source directory is modified through additions and deletions, those updates are propagated to the target cluster the next time the policy runs. In disaster recovery use cases, the synchronization policy supports failover to the target cluster, allowing users to continue working with the same dataset as the source directory.
By contrast, a copy policy is intended for archive and backup use cases. A copy policy maintains current versions of the files stored on the source cluster.
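The practical difference between the two policy types can be sketched as follows. This is a minimal, hypothetical illustration of the semantics (flat directories, whole-file copies), not SyncIQ's actual implementation; the `replicate` function and its `mode` parameter are assumptions for demonstration only.

```python
import os
import shutil

def replicate(source_dir: str, target_dir: str, mode: str = "sync") -> None:
    """Illustrative sketch of sync vs. copy policy semantics.

    mode="sync" -> the target becomes an exact duplicate: files deleted
                   on the source are also deleted on the target.
    mode="copy" -> new and changed files are copied over, but files that
                   no longer exist on the source are retained on the
                   target (archive/backup behavior).
    """
    os.makedirs(target_dir, exist_ok=True)
    source_names = set(os.listdir(source_dir))
    # Copy current versions of source files to the target.
    for name in source_names:
        shutil.copy2(os.path.join(source_dir, name),
                     os.path.join(target_dir, name))
    # Only a sync policy removes target files that are absent on the source.
    if mode == "sync":
        for name in set(os.listdir(target_dir)) - source_names:
            os.remove(os.path.join(target_dir, name))
```

Under this sketch, a failover in a disaster recovery scenario works with a sync target precisely because the target is guaranteed to mirror the source, while a copy target may retain files the source has since deleted.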
The first stage of the Initial Replication is the job start. A scheduler process is responsible for starting a data replication job, with the start time determined either by the configured schedule or by a manual job start. When the time arrives, the scheduler sets the policy to a pending status on the source record and creates a directory containing information specific to the job.
After the initial directory is created with the SyncIQ policy ID, a scheduler process on a node takes control of the job and renames the directory to reflect that node's device ID. Next, one of the scheduler processes creates the coordinator process, and the directory structure is renamed again.
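The successive handoffs above can be sketched as a sequence of directory renames. The naming convention here (policy ID, then node device ID, then coordinator PID appended with hyphens) is an assumption for illustration; the document does not specify the actual on-disk format.

```python
import os

def job_dir_stages(policy_id: str, node_id: int, coordinator_pid: int):
    """Return the successive job-directory names as control of the job
    is handed off. The hyphenated format is illustrative only."""
    return [
        f"{policy_id}",                              # scheduler creates the job directory
        f"{policy_id}-{node_id}",                    # a node's scheduler takes control
        f"{policy_id}-{node_id}-{coordinator_pid}",  # coordinator process is created
    ]

def advance_job_dir(base: str, stages) -> str:
    """Create the first-stage directory, then rename it through each
    subsequent stage, returning the final path."""
    current = os.path.join(base, stages[0])
    os.makedirs(current)
    for stage in stages[1:]:
        nxt = os.path.join(base, stage)
        os.rename(current, nxt)
        current = nxt
    return current
```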
After the directory structure is renamed to reflect the SyncIQ policy ID, node ID, and coordinator PID, the data transfer stage commences. The coordinator has a primary worker process start a treewalk of the current SyncIQ snapshot. This snapshot is named snapshot-&lt;SyncIQ Policy ID&gt;-new. On the target cluster, the secondary workers receive the treewalk information and map out the LINs (logical inode numbers) accordingly.
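A treewalk of this kind can be sketched with a plain filesystem walk. This is a loose analogy, not OneFS internals: a POSIX inode number stands in for a LIN, and `build_lin_map` is a hypothetical stand-in for what a secondary worker might construct from the treewalk information it receives.

```python
import os

def treewalk(snapshot_root: str):
    """Walk a snapshot tree and yield (relative_path, inode) pairs,
    standing in for the primary worker's treewalk output."""
    for dirpath, dirnames, filenames in os.walk(snapshot_root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, snapshot_root)
            # st_ino is a stand-in for the OneFS LIN.
            yield rel, os.stat(full).st_ino

def build_lin_map(snapshot_root: str) -> dict:
    """What a secondary worker might build on the target:
    an identifier -> path mapping derived from the treewalk."""
    return {ino: rel for rel, ino in treewalk(snapshot_root)}
```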
During the treewalk and exchange of LIN information, a list of target node IP addresses is gathered through the target monitor process. At this point, the primary workers set up TCP connections with the secondary workers of target nodes for the remainder of the job. If a worker on either cluster crashes, its corresponding worker on the other cluster exits as well. In this event, the coordinator launches a new primary worker process, which establishes a new TCP connection with a secondary worker. If the coordinator crashes, the scheduler restarts the coordinator, and all workers must establish TCP connections again. The number of workers is calculated based on many factors. See Worker and performance scalability for more information about calculating workers.
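The primary/secondary pairing can be sketched with a minimal TCP exchange over loopback. These two functions are assumptions for illustration: a real secondary would apply replicated data rather than echo an acknowledgement, and reconnection after a crash would be driven by the coordinator as described above.

```python
import socket

def secondary_worker(server_sock: socket.socket) -> None:
    """Target-side stand-in: accept one primary connection and
    acknowledge the chunk of work it receives."""
    conn, _ = server_sock.accept()
    with conn:
        data = conn.recv(4096)
        conn.sendall(b"ack:" + data)

def primary_worker(target_ip: str, target_port: int, payload: bytes) -> bytes:
    """Source-side stand-in: connect to a secondary worker (at an IP
    gathered via the target monitor), send one work item, and wait for
    the acknowledgement. If this connection drops, the coordinator would
    launch a replacement primary that reconnects."""
    with socket.create_connection((target_ip, target_port)) as s:
        s.sendall(payload)
        return s.recv(4096)
```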
Now that the primary and secondary workers are created, with TCP connections between each pair, data transfer starts between each set of workers.
As each set of workers completes its data transfer, it goes into an idle state. When all workers are idle and the restart queue contains no work items, the data replication is complete. At this point, the coordinator renames the snapshot taken at the onset to snapshot-&lt;SyncIQ Policy ID&gt;-latest. Next, the coordinator files a job report. If the SyncIQ policy is configured to create a target-side snapshot, that snapshot is taken at this time. Finally, the coordinator removes the job directory that was created at the onset, and the job is complete.
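The completion condition and wrap-up steps above can be sketched as follows. Both functions are hypothetical illustrations: worker states and the restart queue are modeled as plain Python values, and the snapshot rename is shown as a dictionary key change rather than a filesystem operation.

```python
def replication_complete(worker_states, restart_queue) -> bool:
    """The job is finished only when every worker is idle AND the
    restart queue holds no remaining work items."""
    return all(state == "idle" for state in worker_states) and not restart_queue

def finalize_job(policy_id: str, snapshots: dict, job_dirs: set, job_dir: str) -> None:
    """Illustrative wrap-up: promote the '-new' snapshot taken at job
    start to '-latest', then remove the job directory."""
    snapshots[f"snapshot-{policy_id}-latest"] = snapshots.pop(
        f"snapshot-{policy_id}-new")
    job_dirs.discard(job_dir)
```

Note that both halves of the completion condition matter: idle workers alone are not sufficient, because items on the restart queue still represent unfinished work.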