For production deployments, consider the following factors so that the design supports your intended use cases:
- Failure tolerance of workers and storage elements
- Storage tiering
- Network and virtual infrastructure availability
Because of these factors, resource dimensioning is a range rather than a single fixed figure. For example:
- Worker nodes can range in number from two to seven, depending on S5248F-ON port saturation.
- Storage can range up to three times the calculated use case needs, depending on the required Kubernetes replication factor.
- External object storage (for example, Dell ECS) can be added to provide economical access to storage services.
- Switches can be added for high availability (HA).
- Worker or ECS nodes can be upgraded to denser SSDs.
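The ranges above can be turned into a quick sizing check. The following Python sketch is illustrative only and is not part of the validated design: the function names are hypothetical, and it assumes that raw capacity is simply the calculated usable need multiplied by a replication factor of 1 to 3. The worker-node limits of two and seven come from the list above.

```python
# Illustrative sizing helpers; names and the simple multiplication model
# are assumptions, not part of the Dell design.

MIN_WORKERS = 2  # lower bound on worker nodes stated in the text
MAX_WORKERS = 7  # upper bound, limited by switch port saturation
MAX_REPLICATION_FACTOR = 3  # storage can reach three times the calculated need


def raw_storage_tb(usable_tb: float, replication_factor: int) -> float:
    """Return the raw capacity needed to hold usable_tb at the given replication factor."""
    if usable_tb < 0:
        raise ValueError("usable_tb must be non-negative")
    if not 1 <= replication_factor <= MAX_REPLICATION_FACTOR:
        raise ValueError("replication_factor must be between 1 and 3")
    return usable_tb * replication_factor


def workers_in_range(count: int) -> bool:
    """Return True if the worker-node count falls inside the supported range."""
    return MIN_WORKERS <= count <= MAX_WORKERS


if __name__ == "__main__":
    # Hypothetical use case: 40 TB of calculated need, replicated three times.
    print(raw_storage_tb(40, 3))
    print(workers_in_range(4))
```

With these assumptions, a calculated need of 40 TB (an arbitrary example figure) at a replication factor of 3 requires 120 TB of raw capacity, which is the upper end of the storage range described above.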
Note: Scale-up options, such as adding a GPU (for example, an NVIDIA Ampere A100) per worker node to improve training performance, will be available in a future release of Predictive Maintenance for IT Operations.
Dell recommends that you start with four worker nodes and then scale to meet your use case needs by hot-plugging additional elements as necessary. The Dell Validated Design for Analytics — Data Lakehouse platform supports hot-plugging of storage and compute elements. Dell calls this configuration a medium package.