Protecting Dell AI Factory RAG with Dell PowerProtect Data Manager and Data Domain
Tue, 13 Aug 2024 20:23:28 -0000
Summary
Artificial Intelligence (AI) is transforming industries by enhancing efficiency and providing a strategic advantage. Yet realizing AI's potential comes with its own set of challenges, such as bridging the skill gap, managing data sovereignty, and controlling the expense of deploying and scaling AI enterprise-wide.
As we enter the AI era, the limitations of conventional IT infrastructures become apparent, necessitating a specialized approach to satisfy the intensive needs of AI. Enter the concept of the AI factory.
Mirroring the way physical factories were the cornerstone of the industrial revolution, AI factories are set to be the driving force behind the AI revolution, generating not tangible products but actionable intelligence, innovative content, and novel insights. The future of business hinges on the AI factory, with those proficient in establishing and utilizing these factories to yield swift, consistent results poised to thrive in the AI age.
Dell envisions the AI Factory as a means for organizations to consistently produce transformative results at scale, thereby securing a competitive edge. The Dell AI Factory represents our commitment to fostering AI innovation across businesses of all sizes, offering a suite of products, solutions, and services optimized for AI tasks, ensuring quick, consistent outcomes across diverse environments, including cloud, data centers, workstations, AI PCs, and edge locations.
AI workloads will present important considerations for data protection that should be addressed early in the planning and design phase of these solutions. Data protection use cases include, but are not limited to, the following.
- Cyber resiliency and system protection in the event of an attack or other disruptive event
- Compliance for long-term retention of sensitive data
- Legal protection covering IP infringement, use of PII, proof of prompt and response, and copyright violation for model and training data
- Workload state capture, so that a workload can be restored to a previous state for performance or re-vectoring
- Dataset reconstruction for consolidation of training data from multiple sources
RAG (Retrieval-Augmented Generation) is a sophisticated process in generative AI that optimizes the output of a large language model. It does this by referencing an authoritative knowledge base outside of its training data sources before generating a response. This technique enhances the performance of the model by integrating retrieval-based methods with language generation, thereby improving the quality and relevance of the generated content. Within the Dell AI Factory, RAG is instrumental in enhancing AI applications across a wide range of environments by ensuring the responses are not only contextually accurate but also validated with external authoritative sources.
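The retrieve-then-generate flow described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not how the Dell AI Factory implements RAG: the token-overlap scoring, the function names, and the two-entry knowledge base are all invented for illustration, and the final call to a language model is left out.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, knowledge_base, top_k=1):
    """Rank documents by how many query tokens they share (toy scoring)."""
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(doc)), doc) for doc in knowledge_base]
    # Keep only documents with at least one shared token, best first.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [doc for _, doc in ranked[:top_k]]

def build_prompt(query, knowledge_base, top_k=1):
    """Prepend retrieved context to the question before it goes to a model."""
    context = retrieve(query, knowledge_base, top_k)
    context_block = "\n".join(f"- {doc}" for doc in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}"

# Hypothetical knowledge base; a real deployment would use a vector store.
KNOWLEDGE_BASE = [
    "Retention Lock Compliance cannot be reverted once retention is set.",
    "Quick Recovery sends metadata from the source system to the destination system.",
]
```

Calling `build_prompt("How does Quick Recovery work?", KNOWLEDGE_BASE)` produces a prompt whose context section holds the Quick Recovery entry. The knowledge base is the part that lives outside the model, which is why it is a natural target for backup.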
Dell's PowerProtect Data Manager (PPDM) stands at the forefront of data protection solutions. It offers a robust and agile platform for safeguarding critical applications, including the RAG AI application operating within an AI factory environment. PPDM integrates seamlessly with Kubernetes, including Red Hat OpenShift, ensuring that both the stateful and stateless components of Kubernetes workloads are comprehensively protected. This Kubernetes integration facilitates the management and recovery of data across physical, virtual, and cloud environments, addressing the complexities of IT growth. PPDM's capabilities extend to multicloud data protection, offering backup to cloud, backup in-cloud, cyber recovery, and cloud disaster recovery, thus ensuring compliance and meeting stringent service level objectives. The software-defined data protection platform is designed for operational simplicity, agility, and flexibility, enabling businesses to rapidly adapt to future IT demands while safeguarding their data's value.
PowerProtect Data Manager offers robust cyber resilience features, ensuring that data remains secure and recoverable in the event of a cyber incident. Its compliance capabilities are designed to meet stringent regulatory requirements, providing peace of mind that data governance policies are being followed. The immutability features, including Retention Lock Governance (RLG) and Retention Lock Compliance (RLC), prevent data from being altered or deleted, so that it remains intact and unchangeable for a specified period. RLC is the more rigid mode, offering no option to revert once retention is set, while RLG allows for flexible retention policies; together they provide an additional layer of data protection. These features collectively contribute to a comprehensive data protection strategy, safeguarding critical data assets against a variety of threats.
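The difference between the two retention lock modes can be pictured with a small conceptual model. This is a sketch only: the `LockedBackup` class and its methods are hypothetical and are not a Data Domain or PPDM API, and it assumes that the flexibility of RLG means an administrator can release a lock early while RLC never allows it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class LockedBackup:
    """Hypothetical stand-in for a backup copy under a retention lock."""
    name: str
    mode: str               # "governance" (RLG) or "compliance" (RLC)
    retain_until: datetime

    def can_delete(self, now):
        # In either mode the copy is immutable until the lock expires.
        return now >= self.retain_until

    def revert_lock(self, now):
        """Release the lock early; only the governance mode permits this."""
        if self.mode == "compliance":
            raise PermissionError("compliance lock cannot be reverted once set")
        self.retain_until = now

# Example: two copies locked for 30 days from the same starting point.
start = datetime(2024, 8, 13)
rlg_copy = LockedBackup("copy-a", "governance", start + timedelta(days=30))
rlc_copy = LockedBackup("copy-b", "compliance", start + timedelta(days=30))
```

In this model, `rlg_copy.revert_lock(start)` succeeds and makes the copy deletable, whereas `rlc_copy.revert_lock(start)` raises `PermissionError` and the copy stays locked until its retention date.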
RAG Protection using PowerProtect Data Manager
Figure 1. RAG AI Data Protection using PowerProtect Data Manager – Overview
Data Protection Operations for RAG
Figure 2. Configuration Flow for RAG AI Data Protection in PowerProtect Data Manager
Protection
PowerProtect Data Manager features a comprehensive data protection solution for Kubernetes environments, covering Kubernetes resources as well as persistent data (PVCs provisioned through a Container Storage Interface (CSI) driver). Any RAG-based application comprises stateless and stateful components, both of which PPDM can protect.
PowerProtect Data Manager (PPDM) offers a robust data protection solution tailored for Kubernetes environments, ensuring the safety of both Kubernetes resources and persistent volume claims (PVCs). This comprehensive coverage is crucial for applications utilizing Retrieval-Augmented Generation (RAG), which typically consist of both stateless and stateful components. Stateless components, often responsible for the retrieval aspect, do not retain data between sessions, whereas stateful components, crucial for the generation process, require persistent storage to maintain information over time. Because PPDM safeguards both kinds of component, it can provide backup and recovery for the entire RAG-based application stack, from the ephemeral to the enduring elements. Both the dynamic data and the more static, persistent data are therefore protected against loss or corruption. Furthermore, PPDM's integration with Kubernetes allows for a seamless management experience, enabling IT operations and backup administrators to define and enforce data protection policies directly through Kubernetes APIs. This integration simplifies the protection of complex, distributed applications and aligns with modern DevOps practices, offering a streamlined approach to data governance, monitoring, and recovery in Kubernetes-centric infrastructures.
PowerProtect discovers all resources and CSI PVCs in the Kubernetes cluster. Protection is then applied per namespace, and individual PVCs can be excluded from it. In our case, the two relevant namespaces rag-sample and rag operator were protected by the same protection policy with no exclusions. PowerProtect Data Manager keeps the backup process lightweight and efficient by dynamically deploying the data mover pod (cProxy) for each protection run and tearing it down afterward.
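The namespace-based selection with optional PVC exclusions can be sketched as a plain function. Everything here is illustrative: `select_pvcs`, the inventory layout, and the PVC names are hypothetical, the second namespace is written as `rag-operator` on the assumption that this is its exact spelling, and none of this is PPDM's own API.

```python
def select_pvcs(inventory, policy_namespaces, excluded=None):
    """Return sorted (namespace, pvc) pairs covered by a namespace-based policy.

    inventory: dict mapping namespace name -> list of discovered PVC names
    policy_namespaces: namespaces assigned to the protection policy
    excluded: optional set of (namespace, pvc) pairs to leave out
    """
    excluded = excluded or set()
    selected = []
    for namespace in policy_namespaces:
        for pvc in inventory.get(namespace, []):
            if (namespace, pvc) not in excluded:
                selected.append((namespace, pvc))
    return sorted(selected)

# Hypothetical discovery result; PVC names are invented for illustration.
INVENTORY = {
    "rag-sample": ["vector-db-data", "model-cache"],
    "rag-operator": ["operator-state"],
    "kube-system": [],
}

# One policy for both RAG namespaces, with no exclusions, as in the text.
protected = select_pvcs(INVENTORY, ["rag-sample", "rag-operator"])
```

With no exclusions, `protected` holds all three PVCs from the two RAG namespaces. Passing `excluded={("rag-sample", "model-cache")}` would drop that one volume while leaving the rest of the namespace covered.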
Recovery
PowerProtect Data Manager offers an extensive array of restore options to ensure data resilience and recovery across diverse environments, including on-premises and cloud deployments. The RAG data protection solution has undergone validation for all Kubernetes (k8s)-related restore options, as follows:
RAG Recovery Option # | Restore Options | Details |
Option 1 | Restore to original cluster | Production namespace |
Option 2 | Restore to original cluster | Existing namespace |
Option 3 | Restore to original cluster | New namespace |
Option 4 | Alternate cluster | Existing namespace. Same or different k8s distribution |
Option 5 | Alternate cluster | New namespace. Same or different k8s distribution |
The Quick Recovery feature is particularly noteworthy; it enables rapid restoration of critical data across environments and sites, minimizing recovery time objectives (RTOs). This is achieved by sending metadata from the source system to the destination system, which allows for a swift recovery view and workload restoration at a remote site even before the source system is fully restored. RAG data protection has been validated for Quick Recovery, enabling the following options:
RAG Recovery Option # | Restore Options | Details |
Option 6 | Alternate cluster (Quick Recovery) | Existing namespace. Same or different k8s distribution |
Option 7 | Alternate cluster (Quick Recovery) | New namespace. Same or different k8s distribution |
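For runbooks or automation, the seven validated options in the two tables can be captured as a simple lookup. This is a sketch: the function name and the string keys are hypothetical conventions chosen here, and only the option numbers and scenarios come from the tables.

```python
def restore_option(cluster, namespace, quick_recovery=False):
    """Map a restore scenario to the validated option number from the tables.

    cluster: "original" or "alternate"
    namespace: "production", "existing", or "new"
    quick_recovery: True when restoring through Quick Recovery
    """
    options = {
        ("original", "production", False): 1,
        ("original", "existing", False): 2,
        ("original", "new", False): 3,
        ("alternate", "existing", False): 4,
        ("alternate", "new", False): 5,
        ("alternate", "existing", True): 6,
        ("alternate", "new", True): 7,
    }
    key = (cluster, namespace, quick_recovery)
    if key not in options:
        # Combinations absent from the tables are treated as not validated.
        raise ValueError(f"no validated restore option for {key}")
    return options[key]
```

For example, `restore_option("alternate", "new", quick_recovery=True)` returns 7, while a combination that the tables do not list, such as Quick Recovery to the original cluster, raises `ValueError`.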
As Generative AI solutions proliferate, data protection will continue to grow in importance for keeping both systems and organizations secure. Dell’s ability to include data protection as an integrated component of the AI Factory infrastructure stack ensures organizations can deploy their solutions confidently and securely.