Dell Technologies’ Ansible-based deployment installs the Kubernetes Web UI (Dashboard) for graphical management of your cluster, as shown in the following figure:
Figure 4. Kubernetes management using Dashboard
Dashboard enables you to manage and monitor nodes, pods, services, persistent volumes, and more from a single interface. As with the Bright deployment, the Ansible-deployed cluster has both Slurm and Kubernetes personalities: scripts are provided that convert HPC nodes to Kubernetes nodes and back again, enabling flexible workload scheduling on a single pool of resources. To switch a node between partitions, run the partition-switching script from the CLI.
Ansible playbooks provide a convenient way to install the software for the HPC Ready Architecture for AI and Data Analytics on PowerEdge servers with factory-installed CentOS images. The playbooks are available on GitHub (https://github.com/dellhpc/omnia) and can be used to prepare the server building blocks.
The following tables provide details about the Ansible playbook environment:
Table 6. Ansible playbook environment capabilities

| Capability | Technology |
|---|---|
| Container runtime with accelerator support | Docker/containerd |
| Container orchestration | Kubernetes |
| System monitoring | Prometheus |
| CNI-compliant software-defined network (SDN) | Flannel and Calico |
| Service discovery | CoreDNS |
| Ingress and proxy | Nginx |
Table 7. Ansible playbook environment components

| Component | Version |
|---|---|
| Operating system | CentOS 7.6 |
| Kubernetes | 1.16 |
| Docker | 1.13 |
| Helm | 3.0.1 |
To prepare systems for software deployment by using Ansible, ensure that each server is racked, powered, and networked so that it can download software packages from the Internet or from a full CentOS mirror.
The HPC Ready Architecture for AI and Data Analytics assumes that you have two networks: a management network and a high-speed fabric.
Hosting the management network and the high-speed fabric on two separate, private IP spaces is a best practice. As an example, the management network might use 192.168.x.x, while the high-speed fabric uses 10.1.x.x. Also, assign hostnames to systems. You can assign both names and IP addresses manually or by using Dynamic Host Configuration Protocol (DHCP).
The Ansible playbooks assume that, at a minimum, each node has SSH access on the high-speed fabric. Ansible uses the IP addresses of the high-speed fabric to establish the SDN for the Kubernetes installation.
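As an illustration of this addressing scheme, static /etc/hosts entries on each node might look like the following. The hostnames and addresses here are examples only, matching the sample IP spaces above; DHCP-based assignment is equally valid.

```
# Management network (192.168.x.x) — example addresses
192.168.1.1    master-mgmt
192.168.1.100  compute000-mgmt
192.168.1.101  compute001-mgmt

# High-speed fabric (10.1.x.x) — used for SSH by Ansible and for the Kubernetes SDN
10.1.0.1       master
10.1.0.100     compute000
10.1.0.101     compute001
```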
Ansible uses roles to customize installation on different servers. Each server is given a specific role using the Ansible inventory file. Servers can be master nodes, unaccelerated compute nodes, or accelerated compute nodes.
The master role must list a single node to be used for Slurm and Kubernetes scheduling and orchestration, as well as for managing and monitoring the system. It does not require any accelerators and is not used for compute work.
List unaccelerated (that is, CPU-only) compute nodes in the inventory file. Because GPU enablement is an add-on to the basic compute node provisioning process, list GPU-accelerated compute nodes in the unaccelerated [compute] section of the inventory file as well as in the [gpus] section.
Here is a sample inventory file:
[master]
master
[compute]
compute[000:005]
[gpus]
compute001
compute002
### DO NOT EDIT BELOW THIS LINE ###
[workers:children]
compute
gpus
[cluster:children]
master
workers
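Before running the playbook, it can be worth confirming that the inventory defines every expected group. The snippet below is a minimal sketch: it writes the sample inventory shown above to a file named host_inventory_file (the filename is an assumption, matching the ansible-playbook invocation in this section) and checks the group headers with grep.

```shell
# Write the sample inventory to host_inventory_file.
cat > host_inventory_file <<'EOF'
[master]
master

[compute]
compute[000:005]

[gpus]
compute001
compute002

### DO NOT EDIT BELOW THIS LINE ###
[workers:children]
compute
gpus

[cluster:children]
master
workers
EOF

# Confirm that each required group header is present;
# print a warning for any that are missing.
for group in master compute gpus workers:children cluster:children; do
  grep -q "^\[$group\]" host_inventory_file || echo "missing group: $group"
done
echo "inventory check complete"
```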
You can install Ansible on the master node by using the yum package manager (as root):
yum install ansible
To download Ansible playbooks, go to the following GitHub page: https://github.com/dellhpc/omnia
With networking set up, Ansible installed on the master node, and the inventory file created, run the build-cluster.yml playbook to deploy the cluster:
ansible-playbook -i host_inventory_file build-cluster.yml
The playbook installs all necessary dependencies on the master and compute nodes, and it ensures that nodes are joined to the Kubernetes cluster. The process takes approximately 30 minutes, depending on your Internet connection speed.
When the cluster is set up, you can install additional applications on the Kubernetes partition by using Helm and on the Slurm partition by using yum.
The following examples show validation that the services are running on the cluster when you use this architecture:
Figure 5. Container runtime with accelerator support
Figure 6. SDN with Flannel/Calico
Figure 7. Flannel SDN running on an Ansible-deployed system
Figure 8. Helm chart configuration