Infrastructure servers provide non-compute services in the cluster, including administration and user access. These servers typically fall into two roles: management nodes and login nodes. The exact configuration and number of infrastructure servers depend on the cluster size and requirements. Although login servers are not required, separating users from critical management systems simplifies administration and minimizes unplanned downtime. As a rule of thumb, a typical system has one login server for every 30 to 100 users. Infrastructure nodes can also provide additional services, such as NFS storage.
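The login server ratio above can be turned into a quick sizing estimate. The following is a minimal sketch, not a sizing tool: the function name `login_node_range` and the 250-user example are illustrative assumptions, and only the 30 to 100 users-per-server range comes from the text.

```python
import math


def login_node_range(users, users_per_node_low=30, users_per_node_high=100):
    """Return (fewest, most) login servers suggested for a given user count.

    The 30 and 100 defaults are the users-per-login-server bounds quoted in
    the text; the function itself is a hypothetical helper for illustration.
    """
    if users <= 0:
        return (0, 0)
    # Lightly loaded servers (100 users each) give the lower bound.
    fewest = math.ceil(users / users_per_node_high)
    # Heavily used servers (30 users each) give the upper bound.
    most = math.ceil(users / users_per_node_low)
    return (fewest, most)


# Example: a hypothetical cluster with 250 users.
print(login_node_range(250))  # (3, 9)
```

Where a cluster falls within that range depends on how heavily users work on the login servers, for example compiling or staging data, which the ratio alone does not capture.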
The baseline configurations for head and control plane nodes are based on the PowerEdge R760. Density is not a concern because management nodes constitute a small fraction of the overall cluster, which allows the use of regular 1U or 2U systems. Clusters typically use the same platform architecture for infrastructure and compute servers to ease administration. Login nodes in particular benefit from having the same architecture as the application nodes.
The following table provides the recommended minimum configuration for the management head node and the control plane node:
Component | Head node and control plane nodes |
Server model | 2x PowerEdge R760 |
CPU | 2x Intel Xeon Platinum 8468 Processor |
Memory | 512 GB (16x 32 GB, 4800 MT/s) |
Operating system | Ubuntu 22.04 |
RAID controller | PERC H755, RAID 6 |
Storage | Local: 10x 960 GB SATA |
Network | Broadcom BCM57414 NetXtreme-E 10 GbE |
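As a sanity check on the table, the memory total and an approximate usable local capacity can be worked out from the listed parts. This is a rough sketch under stated assumptions: it assumes all ten drives sit in a single RAID 6 virtual disk (the table does not say how the drives are grouped), uses vendor-rated decimal gigabytes, and ignores file system and controller overhead. The helper names are illustrative.

```python
def total_memory_gb(dimm_count, dimm_size_gb):
    """Total installed memory from DIMM count and per-DIMM size."""
    return dimm_count * dimm_size_gb


def raid6_usable_gb(drive_count, drive_size_gb):
    """Approximate usable capacity of one RAID 6 group.

    RAID 6 reserves the equivalent of two drives for parity, so usable
    capacity is (n - 2) drives. Assumes a single group of equal-size
    drives and ignores formatting overhead.
    """
    if drive_count < 4:
        raise ValueError("RAID 6 needs at least four drives")
    return (drive_count - 2) * drive_size_gb


print(total_memory_gb(16, 32))    # 512
print(raid6_usable_gb(10, 960))   # 7680
```

Under these assumptions, the ten 960 GB drives yield about 7.68 TB of usable space while tolerating the loss of any two drives.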
Consider the following recommendations for configuring the Omnia control plane:
- Omnia deployment requires a control plane, so a dedicated PowerEdge server for Omnia is recommended.
- Because the Omnia node does not require heavy computing, a single-processor server is sufficient, although a dual-processor server was used for this study.
- Omnia generally uses NFS, which minimizes the need for local storage. However, you can customize the server configuration for additional local user storage space by increasing the number of drives and changing the RAID type. NVMe drives are available for higher performance.
- This release of Omnia does not support installing and configuring Slurm as part of the deployment process. For this project, PMIx and Slurm were compiled and installed from source.