Before getting started with the firmware and driver updates, contact either Dell Technologies Support or Microsoft Support to unlock or “break the glass” within the PEP session. Doing so allows you to monitor the update process in real time while each scale unit node is placed into Maintenance mode. The basic process for unlocking the PEP session is shown below:
Invoke-Command -Session $session -ScriptBlock { Get-SupportSessionToken }
Invoke-Command -Session $session -ScriptBlock { Unlock-SupportSession }
Figure 12. Unlocking the PEP session
Note: It is recommended that an Azure Stack Hub Operator continue to have Dell Technologies Support or Microsoft Support monitor the progress after “breaking the glass.”
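For reference, the commands above assume that $session is an open PEP session. A minimal sketch of establishing one follows; the IP address shown is a placeholder for one of your environment’s privileged endpoints:

# Establish a PowerShell session to the privileged endpoint (PEP).
# Supply the CloudAdmin credentials when prompted.
$pepCred = Get-Credential
$session = New-PSSession -ComputerName "192.168.200.224" -ConfigurationName PrivilegedEndpoint -Credential $pepCred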
It is important to take an accurate inventory of the environment to record the “before” and “after” results of each scale unit node’s firmware update process.
Figure 13. Load Compliance Tools
Invoke-CheckFirmwareBaseline -IPAddresses <single IP address or comma-separated list of scale unit node iDRAC IP addresses> -iDRACCredential (Get-Credential) -oemExtensionPath "DUPS" -ManagementServerAddress <OME VM OS IP> -ManagementServerCredential (Get-Credential)
Note: All characters contained within “< >” are for illustration purposes only. Replace each <xxxx> placeholder with the values for your environment.
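As an illustration only, a filled-in invocation might look like the following; every IP address shown is hypothetical:

Invoke-CheckFirmwareBaseline -IPAddresses "192.168.1.11,192.168.1.12,192.168.1.13,192.168.1.14" -iDRACCredential (Get-Credential) -oemExtensionPath "DUPS" -ManagementServerAddress "192.168.1.50" -ManagementServerCredential (Get-Credential)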
The following screen shot is a sample output. Items in red are the Current Version and the Available Version. If the Available Version is greater than the Current Version, that component will need to be remediated. Pay close attention to the -IPAddresses switch, as it will change to match each scale unit node as the nodes are updated.
Figure 14. Sample output – Current and Available versions
The firmware updates should be applied in a round-robin or N+1 fashion, starting with scale unit node #1.
Note: Prior to running any firmware updates on a scale unit node, the node must be placed into Maintenance mode. This is initiated using options in the Azure Stack Hub Administration Portal, and the process normally takes 10 to 15 minutes to complete.
Note: If the stack is under high utilization, the drain process may take longer to complete.
Note: Before attempting to drain any node, first ensure that there is enough free space available on the tenant shares. This information is available through the Administration Portal under the Storage pane. This is especially important if any large database servers are running on the scale unit. A typical rule of thumb is to keep at least 20% free space on any volume, though this amount may vary. If unsure, contact Dell Technologies Support or Microsoft Support.
Figure 15. Draining a node
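If the Azure Stack Hub admin PowerShell modules are available, the drain can also be initiated from PowerShell rather than the portal. A minimal sketch, assuming the Azs.Fabric.Admin module is installed and an operator context is already set; the region and node names are placeholders:

# Place a scale unit node into Maintenance mode (drain) from PowerShell.
Disable-AzsScaleUnitNode -Location "local" -Name "sac02-s1-n01"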
Invoke-Command -Session $session -ScriptBlock { (Get-ClusterNode -Cluster s-cluster -Name sac02-s1-n01).State }
Invoke-Command -Session $session -ScriptBlock { (Get-ClusterNode -Cluster s-cluster -Name sac02-s1-n01).StatusInformation }
Invoke-Command -Session $session -ScriptBlock { (Get-ClusterNode -Cluster s-cluster -Name sac02-s1-n01).DrainStatus }
Figure 16. Monitoring the drain status
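Rather than re-running these commands by hand, the drain can be watched with a simple polling loop; a minimal sketch, using the same node name as above:

# Poll the node every 60 seconds until the cluster reports the drain complete.
Invoke-Command -Session $session -ScriptBlock {
    do {
        $node = Get-ClusterNode -Cluster s-cluster -Name sac02-s1-n01
        "{0}  State={1}  DrainStatus={2}" -f (Get-Date), $node.State, $node.DrainStatus
        Start-Sleep -Seconds 60
    } until ($node.DrainStatus -eq "Completed")
}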
Note: It is assumed that these updates are being applied in a production environment, and some update cycles will require multiple reboots. Once the scale unit node is in Maintenance mode, the required files are loaded to each iDRAC individually, so the updates proceed one node at a time. When one node completes, the Azure Stack Hub Operator resumes that node, waits for the cluster to rebalance, and then drains the next node for maintenance, continuing until all nodes have been completed. The firmware updates take about 45 minutes to run on each node, and the rebalance process takes on average 60 to 120 minutes per node, depending on the activity and load of the cluster.
Figure 17. Shutting down a node
The following screen shot is a sample of the Job Queue on a 13G server.
Figure 18. Job Queue screen – 13G
The following screen shot is a sample of the Job Queue on a 14G server.
Figure 19. Job Queue screen – 14G
For 13G systems, navigate to iDRAC Settings, and then to Update and Rollback.
Figure 20. Firmware Update screen – 13G
For 14G systems, select Maintenance, and then System Update.
Figure 21. System Update tab
Note: Make sure to upload the updates in the order they were listed in the Catalog.xml file.
The following list of firmware and the accompanying screen shot show the updates to be applied to a 13G scale unit node.
Once loaded, the updates are automatically prioritized by criticality.
Figure 22. Firmware Update screen
Figure 23. Job Queue screen
Figure 24. iDRAC virtual console
Figure 25. Job Queue
Note: When all the jobs have been reported as completed, the jobs may be selected and deleted from the queue.
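If the remote racadm utility is available, the job queue can also be viewed and cleaned up from the command line; a sketch, with the iDRAC address and credentials as placeholders:

racadm -r <iDRAC IP> -u <username> -p <password> jobqueue view
racadm -r <iDRAC IP> -u <username> -p <password> jobqueue delete --all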
Invoke-CheckFirmwareBaseline -IPAddresses <single IP address of current scale unit node’s iDRAC> -iDRACCredential (Get-Credential) -oemExtensionPath "DUPS" -ManagementServerAddress <OME VM OS IP> -ManagementServerCredential (Get-Credential)
Note: When running the Invoke-CheckFirmwareBaseline command at this step in the update process, it is important that you DO NOT USE the “-remediate” parameter.
Figure 26. Verifying changes
Invoke-CheckBIOSSettings -IPAddresses <single IP address of current scale unit node’s iDRAC> -iDRACCredential (Get-Credential)
Figure 27. Validating BIOS settings
Note: If any of the BIOS settings come back with changes that need to be applied, simply run the same command with the “-remediate” parameter. If required, it will automatically reboot the node.
Note: The scripts will return an error message if the “-remediate” parameter is used while the node being worked on is not in a powered-down state.
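For reference, the remediation form is the same command with the parameter noted above appended:

Invoke-CheckBIOSSettings -IPAddresses <single IP address of current scale unit node’s iDRAC> -iDRACCredential (Get-Credential) -remediate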
Invoke-CheckIDRACSettings -IPAddresses <single IP address of current scale unit node’s iDRAC> -iDRACCredential (Get-Credential)
Figure 28. Invoke-CheckIDRACSettings
Note: The iDRAC settings function validates and remediates automatically, since a change to the iDRAC settings does not reboot the server.
Figure 29. Start/Resume
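If the drain was initiated from PowerShell as sketched earlier, the resume is the counterpart cmdlet; the same placeholder region and node names apply:

# Resume (Start) a scale unit node after its updates have completed.
Enable-AzsScaleUnitNode -Location "local" -Name "sac02-s1-n01"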
Invoke-Command -Session $session -ScriptBlock { (Get-ClusterNode -Cluster s-cluster -Name sac02-s1-n01).State }
Invoke-Command -Session $session -ScriptBlock { (Get-ClusterNode -Cluster s-cluster -Name sac02-s1-n01).StatusInformation }
Invoke-Command -Session $session -ScriptBlock { (Get-ClusterNode -Cluster s-cluster -Name sac02-s1-n01).DrainStatus }
Figure 30. Monitoring the status
Invoke-Command -Session $session -ScriptBlock { Get-StorageJob }
Invoke-Command -Session $session -ScriptBlock { Get-VirtualDisk -CimSession s-cluster | Get-StorageJob }
Figure 31. Monitoring the cluster storage jobs
Note: When all the storage jobs have completed and the cluster has been rebalanced, wait an additional 10 to 15 minutes before initiating a Drain on the next node.
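The wait can be scripted as well; a minimal sketch that polls the storage jobs and then pauses for the recommended interval:

# Poll every two minutes until no storage jobs remain, then wait a further
# 15 minutes before draining the next node.
Invoke-Command -Session $session -ScriptBlock {
    while (Get-VirtualDisk -CimSession s-cluster | Get-StorageJob) {
        Start-Sleep -Seconds 120
    }
}
Start-Sleep -Seconds 900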
Note: It is also necessary to ensure that the node’s core count and memory are correctly appearing in the Administration Portal. Sometimes those columns will show a “0” for the core count or a “-” for the memory on the node that has just been resumed. Wait until both values appear correctly before moving forward.
Figure 32. Node cores and memory
Repeat all of the steps for each scale unit node until every node in the cluster has been updated and validated.
Note: When all the Scale Unit Nodes have been updated, exit the unlocked or “Broken Glass” PEP session.