This section provides step-by-step guidance on implementing the outlined hardware and software solution, from initial setup to full-scale deployment, so that organizations can use NVIDIA RAG on Dell Technologies hardware for their AI initiatives.
Summary of deployment steps
- Base Infrastructure and operating system
- Kubernetes
- Dell CSI Driver for PowerScale
- RAG Sample pipeline
Initial Setup and Configuration
- Infrastructure deployment
- Deploy Ubuntu and relevant drivers
- Ensure the PowerEdge R760xa can communicate with PowerScale
- Deploy Kubernetes using NVIDIA's Cloud Native Stack GitHub repository or your preferred method.
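One way to confirm that the server can reach PowerScale before installing anything on Kubernetes is a simple TCP probe run from the PowerEdge host. The sketch below is illustrative only: the helper names `can_reach` and `check_ports` are hypothetical, and the ports (8080 for the OneFS API endpoint used by the CSI driver, 2049 for NFS) are assumed defaults that may differ in your environment.

```python
import socket
from typing import Callable, Dict, Iterable


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(
    host: str,
    ports: Iterable[int],
    probe: Callable[[str, int], bool] = can_reach,
) -> Dict[int, bool]:
    """Probe each port on host and return a {port: reachable} mapping."""
    return {port: probe(host, port) for port in ports}


def report(host: str, results: Dict[int, bool]) -> str:
    """Format the probe results as one line per port."""
    lines = []
    for port, ok in sorted(results.items()):
        state = "reachable" if ok else "NOT reachable"
        lines.append(f"{host}:{port} is {state}")
    return "\n".join(lines)
```

For example, `print(report(host, check_ports(host, [8080, 2049])))` with `host` set to your PowerScale address lists which of the assumed ports respond. A successful probe only shows that the port is open; it does not validate credentials or NFS exports.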
Software Installation and Deployment
- Dell CSI Driver for PowerScale
- Verify that the Dell CSI Driver for PowerScale, the component unique to this solution, is installed in Kubernetes. PowerScale stores the RAG model persistently. Installation instructions are at the following link.
- https://dell.github.io/csm-docs/docs/csidriver/installation/helm/isilon/
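- Confirm that a Kubernetes storage class called "Isilon" exists. Kubernetes object names must be lowercase, so the name is written as isilon in manifests.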
- Verify that you can deploy a workload that uses this storage class.
- RAG Sample pipeline
- https://docs.NVIDIA.com/ai-enterprise/rag-llm-operator/0.4.1/pipelines.html
- In this sample RAG pipeline, change StorageClass from "local-path" to "Isilon" in the following three files. See Appendix.
- pvc-embedding.yaml
- pvc-inferencing.yaml
- pvc-pgvector.yaml
- In the helmpipeline_app.yaml file, change the StorageClass from "local-path" to "Isilon" and change accessMode to ReadWriteMany. You must also add your NGC API key as the password and as the secret apiKey. See Appendix.
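The storage class change in the sample pipeline files can also be scripted rather than edited by hand. The following is a minimal sketch, not part of the NVIDIA or Dell tooling: `set_storage_class` and `update_files` are hypothetical helpers, and the regular expression assumes the value sits on a single line whose key contains `storageClass` (for example `storageClassName: local-path`). Pass the exact class name reported by `kubectl get storageclass`. The sketch does not change the access mode or insert the NGC API key; those edits remain manual.

```python
import re
from pathlib import Path
from typing import Iterable, Tuple

# Matches lines such as "storageClassName: local-path" or
# 'storageClass: "local-path"  # comment'. The key name is an assumption.
_PATTERN = re.compile(
    r"""^(?P<key>[ \t]*(?:-[ \t]*)?\w*storageClass\w*[ \t]*:[ \t]*)"""
    r"""(?P<q>["']?)local-path(?P=q)(?P<rest>[ \t]*(?:\#.*)?)$""",
    re.IGNORECASE | re.MULTILINE,
)


def set_storage_class(text: str, new_class: str) -> Tuple[str, int]:
    """Replace the local-path storage class; return (new_text, replacements)."""
    def _swap(match: "re.Match[str]") -> str:
        # Keep the original key, quoting style, and trailing comment.
        return f"{match['key']}{match['q']}{new_class}{match['q']}{match['rest']}"

    return _PATTERN.subn(_swap, text)


def update_files(paths: Iterable[str], new_class: str) -> None:
    """Rewrite each file in place and print how many lines changed."""
    for name in paths:
        path = Path(name)
        updated, count = set_storage_class(path.read_text(), new_class)
        if count:
            path.write_text(updated)
        print(f"{name}: {count} replacement(s)")
```

A call such as `update_files(["pvc-embedding.yaml", "pvc-inferencing.yaml", "pvc-pgvector.yaml", "helmpipeline_app.yaml"], "isilon")` would cover the four files named above. Review the resulting files against the Appendix before applying them, since a file reporting zero replacements probably uses a different key layout than this sketch assumes.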
Scaling and Management
- Monitoring: Regularly check the status of your Kubernetes environment and the CSI driver.
- Upgrading/Updating: Keep your Kubernetes environment and the CSI driver up to date to ensure optimal performance and security.
- Backup and Disaster Recovery Planning: Regularly back up your data and have a disaster recovery plan to protect against data loss.
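A routine status check can be as simple as listing pods that are not running and ready. The sketch below shows one way to do this by parsing the JSON that `kubectl get pods -o json` returns. The function names are hypothetical, and the namespace you pass in (for example, the one where the CSI driver was installed) depends on your deployment.

```python
import json
import subprocess
from typing import List


def unready_pods(pod_list: dict) -> List[str]:
    """Return 'namespace/name (phase)' for pods that are not running and ready."""
    problems = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        phase = status.get("phase", "Unknown")
        if phase == "Succeeded":
            # Completed pods (for example, finished jobs) are not a problem.
            continue
        containers = status.get("containerStatuses", [])
        all_ready = bool(containers) and all(c.get("ready", False) for c in containers)
        if phase != "Running" or not all_ready:
            problems.append(f"{meta.get('namespace', '?')}/{meta.get('name', '?')} ({phase})")
    return problems


def fetch_pods(namespace: str) -> dict:
    """Run kubectl for one namespace and return the parsed pod list."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `unready_pods(fetch_pods(namespace))` on a schedule and alerting on a non-empty result gives a basic health signal. It complements, rather than replaces, a full monitoring stack.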
Redeployments and Expansions
- As your needs evolve, you may need to redeploy or expand your cluster. This could involve adding more nodes to your Kubernetes environment, expanding your storage with PowerScale, or scaling up your use of the RAG model.
- All components of this solution are designed with ease of scaling in mind. With Kubernetes, hardware resources can be added with little or no disruption to the running deployment.
- GPU resources are primarily consumed as the number of concurrent or active inferences increases. Adding servers with GPUs allows the cluster to expand as demand grows. For specific scaling guidance, consult your Dell Technologies representatives.
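As a rough planning aid, the relationship between concurrent inferences and GPU servers can be expressed as simple ceiling arithmetic. This is only a sketch: `servers_needed` is a hypothetical helper, and every input (the peak number of concurrent inferences, the number a single GPU can sustain, and the GPUs per server) is an assumption you would have to measure or obtain for your own model and hardware.

```python
import math


def servers_needed(peak_concurrent: int, per_gpu_capacity: int, gpus_per_server: int) -> int:
    """Estimate the GPU servers required for a peak concurrent inference load.

    All three inputs are planning assumptions, not measured values.
    """
    if per_gpu_capacity <= 0 or gpus_per_server <= 0:
        raise ValueError("per_gpu_capacity and gpus_per_server must be positive")
    if peak_concurrent <= 0:
        return 0
    # Round up twice: first to whole GPUs, then to whole servers.
    gpus = math.ceil(peak_concurrent / per_gpu_capacity)
    return math.ceil(gpus / gpus_per_server)
```

For instance, with a hypothetical peak of 100 concurrent inferences, 8 per GPU, and 4 GPUs per server, the estimate is 13 GPUs and therefore 4 servers. The calculation ignores headroom, model sharding, and storage throughput, so treat it as a starting point for a sizing conversation, not as a sizing result.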