Kubernetes Node Non-Graceful Shutdown and Remediation: Insights from Dell Technologies
Tue, 12 Dec 2023 18:16:57 -0000
Introduction
Kubernetes has become a pivotal technology for managing containerized applications, but it is not without its challenges, particularly when dealing with stateful applications and non-graceful shutdown scenarios. This article delves into the intricacies of handling such situations, drawing on Dell Technologies' expertise, and, more importantly, shows how to enable automated remediation.
Understanding Graceful vs. Non-Graceful Node Shutdowns in Kubernetes
A 'graceful' node shutdown in Kubernetes is an orchestrated process. When the kubelet detects a node shutdown event, it terminates the pods on that node cleanly, releasing their resources before the actual shutdown. This ordered process lets critical pods terminate after regular pods, so an application keeps operating as long as possible, which is vital for maintaining high availability and resilience.
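For reference, the grace periods that drive this orchestration live in the kubelet configuration. A minimal sketch, with illustrative values:
# KubeletConfiguration excerpt enabling graceful node shutdown
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
shutdownGracePeriod: 30s              # total time reserved to terminate pods on shutdown
shutdownGracePeriodCriticalPods: 10s  # tail end of that window reserved for critical pods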
However, issues arise with a non-graceful shutdown, such as a hard power-off or a node crash. In those cases, the kubelet cannot detect a clean shutdown event, Kubernetes marks the node 'NotReady', and pods in a StatefulSet can remain stuck in the 'Terminating' state indefinitely!
Kubernetes adopts a cautious approach in these scenarios because it cannot tell whether the issue is a total node failure, a kubelet problem, or a network glitch. The distinction is critical for stateful applications, where rescheduling a pod while data is still being written could lead to severe data corruption.
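Upstream Kubernetes offers a manual escape hatch for this situation: once an operator has confirmed that the node is truly down, applying the out-of-service taint tells the control plane it is safe to force-delete the stuck pods and detach their volumes:
kubectl taint nodes <node-name> node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
The hard part is making that "is it truly down?" decision safely, and that is precisely what Dell's CSM for Resiliency automates.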
Role of Dell's Container Storage Module (CSM) for Resiliency
Dell's CSM for Resiliency plays a crucial role in automating decision-making in these complex scenarios, aiming to minimize manual intervention and maximize uptime. The module's functionality is highlighted through a typical workflow:
- Consider a pod with two mounted volumes, annotated for protection with CSM resiliency.
- Upon an abrupt node power-off, the Kubernetes control plane detects the failure and marks the node 'NotReady'.
- The podmon controller of CSM Resiliency then interrogates the storage array, querying its status regarding the node and volumes.
- Depending on its findings and a set heuristic, the module determines whether it's safe to reschedule the pod.
- If rescheduling is deemed safe, the module quickly fences off access for the failed node, removes the volume attachment, and force-deletes the pod, enabling Kubernetes to reschedule it efficiently.
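The force-delete in that last step is the automated equivalent of what an operator would otherwise run by hand once certain the node is dead (pod and namespace names are illustrative):
kubectl delete pod my-statefulset-0 -n my-app --grace-period=0 --force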
The following interactive tutorial lets you test the functionality live: https://dell.github.io/csm-docs/docs/interactive-tutorials/
How to enable the module?
To take advantage of CSM Resiliency, you need two things:
- Enable the module for your driver, for example with PowerFlex:
  - With the CSM wizard, just check the resiliency box
  - With the Operator, set enabled: true in the resiliency entry of .spec.modules
  - With the Helm chart, set enabled: true in the podmon section of the csi-vxflexos values
- Then protect your application by adding the magic label podmon.dellemc.com/driver: csi-vxflexos, as shown in the sketch below
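A minimal sketch of both pieces, assuming a PowerFlex (csi-vxflexos) deployment via Helm; the exact values layout can vary between chart versions:
# values.yaml excerpt enabling the resiliency module (podmon)
podmon:
  enabled: true

# Workload excerpt: the pod template label that tells podmon to protect this application
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                # illustrative name
spec:
  template:
    metadata:
      labels:
        podmon.dellemc.com/driver: csi-vxflexos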
Conclusion
Managing non-graceful shutdowns in Kubernetes, particularly for stateful applications, is a complex but essential aspect of ensuring system resilience and data integrity.
Tools like Dell's CSM for Resiliency are instrumental in navigating these challenges, offering automated, intelligent solutions that keep applications running smoothly even in the face of unexpected failures.
Sources
Stay informed of the latest updates of Dell CSM eco-system by subscribing to:
* The Dell CSM GitHub repository
* Our DevOps & Automation YouTube playlist
* Slack (under the Dell Infrastructure namespace)
Related Blog Posts
Kubernetes on Z with PowerMax: Modern Software Running on Mainframe
Mon, 02 Oct 2023 13:21:45 -0000
Benefits of Kubernetes on System Z and LinuxONE
When I was a customer, I was always evaluating how to grow the technical influence of the mainframe platform. When talking about the platform's financials, I would evaluate the total cost of ownership (TCO) alongside various IT solutions and the value derived from them. When discussing existing technical pain points, I would evaluate technical solutions that might alleviate the issue.
For example, when challenged with finding a solution for a client organization aiming to refresh various x86 servers, I searched online presentations, YouTube videos, and technical websites for a spark. The client organization had already identified the pain point. The hard part was how.
Over time, I found the ability to run Linux on a mainframe (called Linux on Z), using an Integrated Facility for Linux (IFL) engine. Once the idea was formed, I started baking the cake. I created a proof-of-concept environment installing Linux and a couple of applications and began testing.
The light-bulb moment came not in resolving the original pain point, but in discovering new opportunities I had not originally thought of. More specifically:
- Physical server consolidation – I’ll create a plethora of virtual servers when needed
- License consolidation – Certain x86 applications were licensed on a per-engine basis: a quad-core x86 server might need four application licenses, whereas my Linux on Z environment needed only one (at the time of testing)
- Scalability – I could scale horizontally by adding more virtual machines, and vertically by adding memory/storage and more network ports to a server
- Reliability – Mainframe technology has been known to be reliable, utilizing fault tolerant mechanisms within the software and hardware to continue business operations
With the 2023 addition of Kubernetes on LinuxONE (a mainframe that runs only Linux), you can scale, reduce TCO, and build the hybrid cloud your IT management requires. With Kubernetes providing container orchestration regardless of the underlying hardware and architecture, you can leverage the benefits of LinuxONE to deploy your applications in a structured fashion.
Benefits when deploying Kubernetes to Linux on Z may include:
- Enablement of DevOps processes
- Container scalability – running hundreds (if not thousands) of containers on a single LinuxONE box
- Hybrid cloud strategy – where LinuxONE services the compute and storage needs of various internal business organizations
With Dell providing storage to mainframe environments through the PowerMax 8500/2500, a Container Storage Interface (CSI) driver was created to simplify allocating storage to Kubernetes environments running on Linux on Z.
The remaining content will focus on the CSI for PowerMax. Continue reading to explore what’s possible.
Deploy Kubernetes
Linux on IBM Z runs on s390x architecture. This means that all the software we use needs to be compiled with that architecture in mind.
Luckily, Kubernetes, the CSI sidecars, and the Dell CSI drivers are all built in Go. Since the early days of Go, portability across operating systems and architectures has been one of the project's goals. You can list the OS/architecture pairs supported by your Go version with the command:
go tool dist list
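For example, filtering the list for the mainframe target shows it is supported out of the box:
go tool dist list | grep s390x
linux/s390x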
The easiest and most straightforward way to try Kubernetes on LinuxONE is with the k3s distro. It installs with the following one-liner:
curl -sfL https://get.k3s.io | sh -
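Once k3s is up, a quick way to confirm that the node reports the expected architecture (a sketch; k3s bundles kubectl):
kubectl get node -o jsonpath='{.items[0].status.nodeInfo.architecture}'
s390x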
Build Dell CSI driver
The Dell CSI driver for PowerMax is composed of a container that runs all actions against Unisphere and mounts LUNs to pods, together with a set of official CSI sidecars that handle the interaction with Kubernetes.
The official Kubernetes sidecars are published for multiple architectures, including s390x, while Dell publishes images only for x86_64.
To build the driver, we will first build the binary and then the image.
Binary
First, let's clone the driver from https://github.com/dell/csi-powermax into your GOPATH. To build the driver, go into the directory and execute:
CGO_ENABLED=0 GOOS=linux GOARCH=s390x GO111MODULE=on go build
At the end of the build, you should have a single binary with static libs, compiled for s390x:
file csi-powermax
csi-powermax: ELF 64-bit MSB executable, IBM S/390, version 1 (SYSV), statically linked, Go BuildID=…, with debug_info, not stripped
Container
The distributed driver uses a minimal Red Hat Universal Base Image (UBI). There is no s390x-compatible UBI image, so we need to rebuild the container image from a Fedora base image.
The following is the Dockerfile:
# Dockerfile to build PowerMax CSI Driver
FROM docker.io/fedora:37
# Install dependencies, then clean the cache
RUN yum install -y \
util-linux \
e2fsprogs \
which \
xfsprogs \
device-mapper-multipath \
&& \
yum clean all \
&& \
rm -rf /var/cache/run
# validate some cli utilities are found
RUN which mkfs.ext4
RUN which mkfs.xfs
COPY "csi-powermax" .
COPY "csi-powermax.sh" .
ENTRYPOINT ["/csi-powermax.sh"]
We can now build our container image with the help of docker buildx, which makes cross-architecture builds a breeze:
docker buildx build -o type=registry -t coulof/csi-powermax:v2.8.0 --platform=linux/s390x -f Dockerfile.s390x .
The last step is to change the image in the helm chart to point to the new one: https://github.com/dell/helm-charts/blob/main/charts/csi-powermax/values.yaml
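A sketch of that override follows; the exact key names depend on the chart version, so check the values.yaml linked above:
# values.yaml excerpt (key names are illustrative)
images:
  driver: docker.io/coulof/csi-powermax:v2.8.0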
Et voilà! Everything else is the same as with a regular CSI driver.
Wrap-up, limitations, and disclaimer
Thanks to the open-source model of Kubernetes and Dell CSM, it is easy to build and run them on many different architectures.
The CSI driver for PowerMax supports FBA devices via Fibre Channel and iSCSI. There is no support for CKD devices, which would require code changes.
The CSI driver for PowerMax supports all CSI-compliant calls.
Note: Dell officially supports (through GitHub tickets, Service Requests, and Slack) the official image and binary, but not a custom build like this one.
Useful links
Stay informed of the latest updates of the Dell CSM eco-system by subscribing to:
- The Dell CSM GitHub repository
- Our DevOps & Automation YouTube playlist
- Slack (under the Dell Infrastructure namespace)
Authors: Justin Bastin & Florian Coulombel
CSM 1.8 Release is Here!
Fri, 22 Sep 2023 21:29:12 -0000
Introduction
This is already the third Dell Container Storage Modules (CSM) release of the year!
The official changelog is available in the CHANGELOG directory of the CSM repository.
CSI Features
Supported Kubernetes distributions
The newly supported Kubernetes distributions are:
- Kubernetes 1.28
- OpenShift 4.13
SD-NAS support for PowerMax and PowerFlex
Historically, PowerMax has been Dell's high-end block array and PowerFlex its software-defined block storage (SDS). Both backends recently introduced support for software-defined NAS.
This means that the respective CSI drivers can now provision PVCs with the ReadWriteMany access mode for the volume type file. In other words, thanks to the NFS protocol, different nodes of the Kubernetes cluster can access the same volume concurrently. This feature is particularly useful for applications that need to process logs coming from multiple Pods, such as log management tools like Splunk or Elasticsearch.
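A minimal sketch of such a claim, assuming an NFS-backed storage class named powerflex-nfs (the name is hypothetical):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-logs
spec:
  accessModes:
    - ReadWriteMany           # multiple nodes can mount the volume concurrently
  resources:
    requests:
      storage: 10Gi
  storageClassName: powerflex-nfs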
CSI Specification compliance
Storage capacity tracking
Like PowerScale since v1.7.0, PowerMax and Dell Unity now let you check the storage capacity available to a node before scheduling workloads to it. This isn't that relevant for shared storage, which generally shows the same capacity to every node in the cluster, but it can prove useful when the array is running short of available storage.
With this feature, the CSI driver creates one object of type CSIStorageCapacity per storageClass, in the same namespace as the driver.
An example:
kubectl get csistoragecapacities -n unity # This shows one object per storageClass.
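The output looks something like this (names and capacity are illustrative):
NAME          STORAGECLASSNAME   CAPACITY
csisc-abc12   unity-iscsi        1Ti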
Volume Limits
The Volume Limits feature has been added to both PowerStore and PowerFlex; all Dell storage platforms now implement it.
This option caps the number of volumes a Kubernetes worker node can connect to. It can be configured per node or cluster-wide, and setting the value to zero disables the limit.
Here are some PowerStore examples.
Per node:
kubectl label node <node name> max-powerstore-volumes-per-node=<volume_limit>
For the entire cluster (all worker nodes):
Specify maxPowerstoreVolumesPerNode or maxVxflexVolumesPerNode in the values.yaml file upon Helm installation.
If you opted for the CSM Operator deployment, you can control it by specifying X_CSI_MAX_VOLUMES_PER_NODES in the CRD.
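A sketch of both options using the names quoted above; the exact placement in the files can differ between versions:
# Helm: values.yaml excerpt (PowerStore)
maxPowerstoreVolumesPerNode: 20

# Operator: env entry in the driver CRD
env:
  - name: X_CSI_MAX_VOLUMES_PER_NODES
    value: "20"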
Useful links
Stay informed of the latest updates of the Dell CSM eco-system by subscribing to:
- The Dell CSM GitHub repository
- Our DevOps & Automation YouTube playlist
- Slack (under the Dell Infrastructure namespace)
- Live streaming on Twitch
Author: Florian Coulombel