With OneFS 9.3 and later, you can exclude one or more nodes from participating in a job. This lets you temporarily exclude nodes that are under high load or have other issues, so that jobs do not become stuck. The exclusion list is configured through the OneFS CLI using isi_gconfig. It is applied to all jobs when they start, and it can include the node acting as the Job Engine coordinator. However, the configuration is not dynamic: once a job has started with its final set of nodes, it cannot be reconfigured.
The CLI syntax for configuring an excluded nodes list on a cluster is as follows (in this example, excluding nodes one through three):
# isi_gconfig -t job-config core.excluded_participants="{1,2,3}"
The excluded_participants value is a comma-separated list of node devids with no spaces, enclosed in curly braces and double quotes. All excluded nodes must be specified in full, because there is no aggregation. To reset the configuration so that no nodes are excluded, assign the empty value {}:
# isi_gconfig -t job-config core.excluded_participants="{}"
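As a rough illustration of the value format described above, the following POSIX sh sketch builds the brace-enclosed, comma-separated devid list and prints the matching isi_gconfig command without running it. The functions excluded_value and excluded_cmd are hypothetical helpers, not part of OneFS, and the sketch assumes each devid is a plain non-negative integer. In line with the requirement that all excluded nodes be specified in full, the helper always emits the complete list from every devid it is given.

```shell
#!/bin/sh
# Hypothetical helper (not part of OneFS): build the excluded_participants
# value from devids given as arguments, e.g. "1 2 3" -> {1,2,3}.
# With no arguments it returns {}, which clears the exclusion list.
excluded_value() {
    list=""
    for devid in "$@"; do
        # Assumption: a devid is a plain non-negative integer; reject anything else.
        case "$devid" in
            ''|*[!0-9]*) echo "invalid devid: $devid" >&2; return 1 ;;
        esac
        # Join with commas and no spaces, as the configuration requires.
        if [ -z "$list" ]; then list="$devid"; else list="$list,$devid"; fi
    done
    printf '{%s}\n' "$list"
}

# Hypothetical helper: print (but do not run) the isi_gconfig command.
excluded_cmd() {
    value=$(excluded_value "$@") || return 1
    printf 'isi_gconfig -t job-config core.excluded_participants="%s"\n' "$value"
}
```

For example, `excluded_cmd 1 2 3` prints the command used to exclude nodes 1 through 3, and `excluded_cmd` with no arguments prints the reset command with {}. Printing rather than executing keeps the sketch safe to try off-cluster; review the output before running it on a node.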
The core.excluded_participant_percent_warn parameter defines a warning threshold for the percentage of excluded nodes. Its current value can be displayed as follows:
# isi_gconfig -t job-config core.excluded_participant_percent_warn
core.excluded_participant_percent_warn (uint) = 10
This parameter defaults to 10 percent, above which a CELOG event warning is generated. CELOG events also provide reminders to remove any exclusions when they are no longer required.
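To see how the 10 percent default plays out on clusters of different sizes, the following POSIX sh sketch compares an exclusion count against the threshold. It assumes the percentage is simply the number of excluded nodes divided by the total number of cluster nodes, and that a warning applies only when that value is strictly above the threshold; the exact calculation OneFS performs is not described here. The functions exceeds_warn_threshold and report are hypothetical helper names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if excluding EXCLUDED of TOTAL nodes would be
# above the core.excluded_participant_percent_warn threshold (default 10).
# Assumption: percentage = excluded / total * 100, compared strictly (>).
exceeds_warn_threshold() {
    excluded=$1
    total=$2
    threshold=${3:-10}
    # Integer-only form of excluded/total*100 > threshold,
    # rewritten as excluded*100 > threshold*total to avoid rounding.
    [ $((excluded * 100)) -gt $((threshold * total)) ]
}

# Hypothetical helper: print the (truncated) percentage and the outcome.
report() {
    excluded=$1; total=$2; threshold=${3:-10}
    pct=$((excluded * 100 / total))   # whole percent, rounded down
    if exceeds_warn_threshold "$excluded" "$total" "$threshold"; then
        echo "$excluded of $total nodes (~$pct%) exceeds $threshold%: warning expected"
    else
        echo "$excluded of $total nodes (~$pct%) is within $threshold%"
    fi
}
```

With the default threshold, `report 3 20` reports 15% and a warning, while `report 2 20` sits exactly at 10% and does not exceed it. On smaller clusters the threshold is crossed quickly: under these assumptions, excluding a single node from an 8-node cluster (12.5%) is already above 10 percent.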