Deep learning workloads generally fall into two types:
- Interactive "build" sessions: The data scientist opens an interactive session using bash, Jupyter Notebook, remote PyCharm, or a similar tool to access GPU resources directly.
- Unattended "training" sessions: The data scientist prepares a self-running workload and submits it for execution. While the workload runs, the data scientist can examine the results.
To run an unattended training session, we used the runai submit command to launch a job from the runai-quickstart image with one GPU requested:
runai submit qf-a2 -i gcr.io/run-ai-demo/quickstart -g 1
Figure 4. GPU allocation and utilization for a training job
The figure shows that one GPU is allocated to the training job and that the job fully utilizes it.
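When several training jobs are submitted in sequence, it can be convenient to assemble the submit command programmatically. The following is a minimal sketch, not part of the tested setup: the helper names build_submit_command and submit are hypothetical, and only the -i (image) and -g (GPU count) flags that appear in the command above are used. By default the helper performs a dry run and only returns the command string; it runs the command only when dry_run is False and the runai CLI is found on the PATH.

```python
from __future__ import annotations

import shlex
import shutil
import subprocess


def build_submit_command(job_name: str, image: str, gpus: int) -> list[str]:
    """Build the argv list for `runai submit` (hypothetical helper).

    Only the -i and -g flags shown in the command above are used.
    """
    if gpus < 0:
        raise ValueError("gpus must be non-negative")
    return ["runai", "submit", job_name, "-i", image, "-g", str(gpus)]


def submit(job_name: str, image: str, gpus: int, dry_run: bool = True) -> str:
    """Return the command line as a string, running it only when asked to.

    The command is executed only if dry_run is False and the runai
    executable is available on the PATH (an assumption about the host).
    """
    cmd = build_submit_command(job_name, image, gpus)
    if not dry_run and shutil.which("runai") is not None:
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(cmd, check=True)
    return shlex.join(cmd)


if __name__ == "__main__":
    # Dry run that reproduces the command used in this section.
    print(submit("qf-a2", "gcr.io/run-ai-demo/quickstart", 1))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a job name or image path ever contains unusual characters.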