Domino Data Science Platform uses pools to provide self-service access to the heterogeneous blend of hardware configurations that are used in data science. Generally, there are at least two pools: one for CPU workloads and another for GPU workloads. Additional pools can be created in the compute grid as necessary to provide specialized execution hardware. In the Design for Domino Data Science Platform, we created three pools: a default pool, a GPU pool, and a CPU pool.
To size the compute grid for this design, we first established the criteria that determine which pool is best suited to a specific workload (a minimal sketch of this mapping follows the list):
- The default pool is best suited for hosting simple machine learning models behind an API, processing for scheduled reports, and launching interactive workspaces for development environments such as Visual Studio Code or a Jupyter Notebook.
- GPU pools are primarily used to train model architectures that benefit from GPU acceleration. They can also be used to host accelerated inferencing applications in the form of Published Apps and Model APIs.
- The CPU pool is used to satisfy requests for large amounts of CPU and memory resources. It has higher-end processors with more cores and memory available than the default pool.
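To make these criteria concrete, the pool assignment can be expressed as a simple lookup. The following Python sketch is illustrative only: the workload categories and pool names are assumptions based on the three pools in this design, not identifiers from the Domino platform.

```python
# Illustrative workload-to-pool mapping for the three pools in this design.
# Category and pool names are assumptions for this sketch, not Domino identifiers.
POOL_BY_WORKLOAD = {
    "interactive_workspace": "default",  # VS Code, Jupyter Notebook sessions
    "scheduled_report":      "default",  # processing for scheduled reports
    "simple_model_api":      "default",  # simple ML models behind an API
    "gpu_training":          "gpu",      # architectures that benefit from GPU acceleration
    "accelerated_inference": "gpu",      # Published Apps and Model APIs on a GPU
    "large_cpu_memory_job":  "cpu",      # large CPU and memory requests
}

def assign_pool(workload_type: str) -> str:
    """Return the pool best suited to a workload; fall back to the default pool."""
    return POOL_BY_WORKLOAD.get(workload_type, "default")
```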
After determining the pool assignment for each type of workload, we then determined the expected number of instances of each workload and how long they are expected to run. To size the Design for Domino Data Science Platform, we assumed that our data scientists work in five separate teams of four data scientists each. We then estimated the average blend of workloads that a typical enterprise data science team runs. To finalize the workload definition, we also incorporated these assumptions:
- Half of all data scientists are logged in to development environments running in the default pool at any time. Average resource consumption is four cores and 8 GB of memory for each data scientist.
- Each of the five data science teams has a dashboard, model API, and launcher in place that require an average of four cores and 16 GB of memory each.
- Each of the five data science teams needs to run three model training jobs on GPU hardware each week for hyperparameter tuning and candidate selection.
- Traditional machine learning techniques, along with data processing, exploration, and visualization, are common to every data science project and require a large amount of memory. Sizing these workloads is difficult, however. For example, CPU oversubscription is desirable because it reduces cost and can increase overall CPU utilization, but memory oversubscription results in latency and in spilling memory contents to disk.
Our default pool needs 100 cores and 320 GB of memory to support user workloads. Development environments and web applications are not processing 100 percent of the time, so oversubscription might be possible after monitoring the utilization metrics of the hosts in the pool with tools like perf, which is included with Red Hat Enterprise Linux. Although the host operating system and services running on the host consume some resources, our evaluation determined that this minor overhead was unlikely to affect these workloads.
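The arithmetic behind the 100-core, 320 GB figure follows directly from the assumptions above. A minimal sketch (the variable names are ours, not platform identifiers):

```python
# Default pool sizing from the stated assumptions.
teams = 5
scientists = teams * 4                   # 20 data scientists in total

# Half of all data scientists are logged in at any time,
# each consuming 4 cores and 8 GB of memory.
concurrent = scientists // 2             # 10 concurrent users
ws_cores = concurrent * 4                # 40 cores
ws_mem_gb = concurrent * 8               # 80 GB

# Each team runs a dashboard, a model API, and a launcher,
# each averaging 4 cores and 16 GB of memory.
apps = teams * 3                         # 15 always-on applications
app_cores = apps * 4                     # 60 cores
app_mem_gb = apps * 16                   # 240 GB

print(ws_cores + app_cores, "cores")     # 100 cores
print(ws_mem_gb + app_mem_gb, "GB")      # 320 GB
```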
Our GPU workload requirement is to support three separate training jobs for each team, each week. By having three NVIDIA V100 GPUs available in the compute grid, we enable approximately 100 hours of GPU runtime for each team per week, or approximately 33 hours per model candidate.
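A quick check of the GPU-hour budget, assuming each V100 can be scheduled around the clock:

```python
# GPU-hour budget, assuming each V100 is schedulable 24 x 7.
gpus = 3
hours_per_week = 24 * 7                        # 168 hours
teams = 5
jobs_per_team = 3                              # training jobs per team per week

total_gpu_hours = gpus * hours_per_week        # 504 GPU-hours per week
per_team = total_gpu_hours / teams             # 100.8, approximately 100 hours
per_candidate = per_team / jobs_per_team       # 33.6, approximately 33 hours

print(f"{per_team:.1f} h/team/week, {per_candidate:.1f} h/candidate")
```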
The CPU pool is characterized by its high core count and the large amount of available memory. Because CPU resources can be assigned per core, we first tried to size the CPU pool by determining the number of cores and the amount of memory each team might need. This determination proved difficult to model for several reasons:
- The exact ratio of cores to memory is not static and varies widely.
- The relationship between memory, cores, and latency is complicated by NUMA, PCI topology, and other factors that are difficult to manage when granularly partitioning a system with 56 cores across two sockets and 12 memory channels.
We opted to use a single PowerEdge C6420 chassis. Unlike the GPU pool, in which resource contention is binary (a GPU is either available or unavailable), CPU resource contention involves tradeoffs across several metrics and requires a more in-depth evaluation of the specific blend of workloads that is appropriate for the default pool and the CPU pool.
In practice, the requirements of each team are unlikely to be uniform. For example, full model training might take 90 hours to converge but occur only once a month. Domino Data Science Platform provides resource utilization reporting, which can help determine whether you need to increase the size of a compute pool to meet user demand.