Run:ai Atlas provides a fractional GPU sharing system for containerized workloads on Kubernetes. Fractional GPUs are well suited to lightweight AI tasks such as inference and model development. The fractional GPU system transparently enables data science and AI engineering teams to run multiple workloads simultaneously on a single GPU. As a result, companies can run more workloads, such as computer vision, voice recognition, and natural language processing, on the same hardware and lower costs.
The fractional GPU system from Run:ai Atlas effectively creates logical GPUs, each with its own memory and compute space, that containers can use and access as if each were a self-contained processor. This method enables several containerized workloads to run side by side on the same GPU without interfering with each other. The solution is transparent, simple, and portable; it requires no changes to the containers themselves.
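Conceptually, the runai CLI submits an ordinary pod on your behalf and expresses the fraction through pod metadata rather than through the container image. The following is a rough sketch only, assuming the `gpu-fraction` pod annotation and the `runai-scheduler` scheduler name; the exact manifest that Run:ai generates varies by version and should be confirmed against the Run:ai documentation for your release:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frac-example          # hypothetical pod name for illustration
  annotations:
    gpu-fraction: "0.5"       # assumed annotation: request half of one GPU
spec:
  schedulerName: runai-scheduler   # assumed: hand scheduling to Run:ai
  containers:
  - name: worker
    image: gcr.io/run-ai-demo/quickstart   # demo image from this document
```

Because the fraction is carried in metadata, the container image itself is unchanged, which is what makes the approach transparent and portable.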
A typical use case might run two to eight jobs on the same GPU, meaning that you can perform up to eight times the work with the same hardware.
We use the following commands to validate fractional GPUs:
runai config project team-a
runai submit frac05 -i gcr.io/run-ai-demo/quickstart -g 0.5 --interactive
runai submit frac03 -i gcr.io/run-ai-demo/quickstart -g 0.3
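After submitting the two jobs, their status and GPU allocations can be inspected from the same CLI. These commands are illustrative; flags and output format depend on the installed runai CLI version:

runai list jobs -p team-a
runai describe job frac05 -p team-a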
Figure 5. GPU allocation for GPU fractions
The figure shows that the GPU is running two jobs simultaneously and that overall GPU utilization is 80 percent (50 percent for the frac05 job and 30 percent for the frac03 job).