Like most distributed applications, our digital assistant solution follows the Model-View-Controller (MVC) design pattern. Beyond the LLM and information retrieval components, several other components of the solution are also based on generative AI.
The following figure shows the key components of the digital assistant solution and how they map to the MVC pattern:
Figure 2. Digital assistant conceptual architecture
While the solution consists of four core components (digital assistant rendering platform, orchestration and conversation management, information retrieval, and large language model), you draw the most benefit from the system when high-quality data sources are available and the existing systems of record are integrated. All of these components are described in the following sections.
A digital assistant rendering platform (also known as a digital human platform) provides the user-facing frontend of the overall solution, specifically the avatar, voice, and chat capabilities. While the avatar with voice (and additional text chat) capabilities provides the most immersive experience, customers can enable or disable individual modalities as needed. A text-only experience, effectively emulating a chatbot, might be desirable if the available network bandwidth between the client device and the digital assistant solution is low.
The rendering platform interacts with the orchestration and conversation management component: a speech-to-text component converts the user's spoken requests into text and submits them to the overall solution, while a text-to-speech component delivers the solution's responses back to the user as speech.
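This round trip can be illustrated with a minimal sketch. The class and method names here (SpeechToText-style transcribe, Orchestrator-style ask, TextToSpeech-style synthesize) are hypothetical placeholders and do not correspond to any specific product API:

```python
# Minimal sketch of the rendering platform's round trip through the
# orchestration component. All names are hypothetical placeholders.

class DigitalAssistantFrontend:
    def __init__(self, stt, orchestrator, tts):
        self.stt = stt                    # speech-to-text component
        self.orchestrator = orchestrator  # orchestration and conversation management
        self.tts = tts                    # text-to-speech component

    def handle_user_utterance(self, audio: bytes) -> bytes:
        text_request = self.stt.transcribe(audio)            # user speech -> text
        text_response = self.orchestrator.ask(text_request)  # solution response
        return self.tts.synthesize(text_response)            # text -> avatar speech
```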
Orchestration and conversation management functionality is at the core of the solution and ties the different software subsystems together. It consists of the following key components:
Translation—The digital assistant solution can be equipped with the ability to translate text from one language to another. The task involves understanding the semantics and context of the source language and accurately reproducing the meaning in the target language. This process is crucially important in multinational enterprises, where users interact with the solution in different languages.
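As a rough illustration, translation can be delegated to an LLM with an instruction-style prompt. The sketch below assumes a hypothetical llm_complete wrapper around whichever model endpoint the deployment uses; the prompt wording is illustrative only:

```python
# Sketch of LLM-based translation inside the orchestration layer.
# `llm_complete` is a hypothetical stand-in for the model endpoint.

def translate(text: str, source_lang: str, target_lang: str, llm_complete) -> str:
    prompt = (
        f"Translate the following {source_lang} text to {target_lang}, "
        f"preserving meaning, tone, and domain terminology.\n\n{text}"
    )
    return llm_complete(prompt)
```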
Information retrieval (IR) systems, sometimes referred to as knowledge bases, are designed to manage, process, and retrieve information from a vast array of data sources. They ingest data, maintain either a copy or reference to the original documents, and store this information for efficient and accurate retrieval. These systems are fundamental to many applications, including search engines, recommendation systems, and digital libraries, among others.
The primary goal of an IR system is to provide authoritative information, minimizing or eliminating hallucination. Hallucination is the generation of information that is not grounded in the source data. By ensuring that the retrieved information is accurate, relevant, and based on the original documents, IR systems enhance the reliability and credibility of the results.
IR systems employ sophisticated algorithms and techniques to index, search, and retrieve data. For the digital assistant solution, this means quickly finding the most relevant information for a given query among the thousands of documents ingested from various data sources.
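The retrieval step is commonly implemented with embedding-based similarity search. The following toy sketch assumes a hypothetical embed function that maps text to a vector, and uses brute-force cosine similarity; production deployments typically rely on a purpose-built vector database rather than scanning every document:

```python
import numpy as np

# Toy illustration of embedding-based retrieval.
# `embed` is a hypothetical sentence-embedding function.

def top_k(query: str, documents: list[str], embed, k: int = 3) -> list[str]:
    doc_vecs = np.array([embed(d) for d in documents])
    q = np.array(embed(query))
    # Cosine similarity between the query and every ingested document.
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    # Return the k most similar documents, best match first.
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]
```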
Data sources represent the authoritative, domain-specific data that an enterprise owns and that must be ingested into the IR system. Typical examples of data sources are:
The original content continues to exist alongside the digital assistant solution. For example, a website remains available even as the digital assistant solution makes its content more accessible to a broad user base.
Large language models (LLMs) are advanced natural language processing models that use deep learning techniques to understand and generate human language. LLMs can include a range of architectures and approaches, such as recurrent neural networks and transformers. In particular, Generative Pre-trained Transformer (GPT) is a popular and influential example of an LLM that is based on the transformer architecture, which is a deep neural network architecture designed to handle sequential data efficiently. Transformers use self-attention mechanisms to process input sequences and learn contextual relationships between words, enabling them to generate coherent and contextually relevant language.
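The scaled dot-product attention at the heart of the transformer architecture can be summarized by the standard formula:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
```

Here Q, K, and V are the query, key, and value projections of the input sequence, and d_k is the dimension of the keys; scaling by the square root of d_k keeps the softmax in a numerically stable range.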
In the digital assistant solution, we follow a RAG approach to improve accuracy and relevance. Hence, the LLM plays a somewhat reduced role compared to pure LLM-based approaches: it is prompted with factual information obtained from the IR system, and its purpose is to provide succinct summaries of that information so that responses remain grounded in the retrieved content.
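A minimal sketch of this RAG step might look as follows; retrieve and llm_complete are hypothetical stand-ins for the IR system and the model endpoint, and the prompt is illustrative only:

```python
# Sketch of the RAG step: retrieved passages are injected into the prompt
# so the LLM summarizes grounded facts rather than answering from memory.

def answer(question: str, retrieve, llm_complete) -> str:
    passages = retrieve(question)  # authoritative chunks from the IR system
    context = "\n\n".join(passages)
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm_complete(prompt)
```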
Multiple LLMs are used across the solution: some are embedded in other components, while another provides the dialog management. The following considerations apply to each LLM.
A foundation or pretrained model is a machine learning model that has been trained on a large, broad dataset before it is fine-tuned or adapted for a more specialized task. Foundation models are crucial because they provide a starting point that already understands a broad range of concepts and language patterns. Beginning with a pretrained foundation model makes the process of customizing and fine-tuning for specific tasks more effective and efficient.
Parameters in LLMs refer to the learnable components, or weights, of the neural network that makes up the model. These parameters determine how the model processes input data and makes predictions or generates output. Typically, GPTs are measured in billions of parameters. The parameters are learned during the training process, in which the model is exposed to vast amounts of data and adjusts its weights to generate language. Assuming that the model architecture and training data are comparable, the greater the number of parameters, generally the greater the accuracy and capability of the model, although a smaller model trained for a specific outcome can sometimes be more accurate. Models with more parameters also require more compute resources, especially GPU resources.
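As a back-of-the-envelope illustration of the resource implication, the memory needed just to hold the model weights scales with parameter count and numeric precision. The sketch below ignores activations, KV cache, and framework overhead, so treat it as a rough lower bound rather than sizing guidance:

```python
# Rough GPU memory estimate for model weights alone.

def weight_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    # bytes_per_param = 2 corresponds to FP16/BF16 weights.
    return params_billions * 1e9 * bytes_per_param / 1024**3

print(weight_memory_gb(7))   # ~13 GB for a 7B-parameter model at FP16
print(weight_memory_gb(70))  # ~130 GB for a 70B-parameter model at FP16
```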
The accuracy of LLMs is typically measured by their performance on specific NLP tasks; the evaluation metrics used depend on the nature of the task. Publicly available models such as GPT and Llama 2 are foundation models. While these models offer a strong starting point with general capabilities, they are often customized for specific use cases. This design guide does not address model customization because we use the LLM with RAG.
Systems of record play a pivotal role in shaping the user experience and streamlining operations. They are domain-specific backend systems that establish context from user-supplied information, enabling personalized and efficient service delivery. They serve as the backbone of many business operations, providing critical data and functionality. Some examples of these systems include:
Integrating systems of record into the digital assistant experience represents a significant stride towards a more data-driven and customer-centric approach to business operations.
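For example, a conversation turn might be grounded in backend data along these lines; crm_lookup and llm_complete are hypothetical stand-ins for a system-of-record client (such as a CRM API) and the model endpoint, not references to a specific product:

```python
# Sketch of grounding a conversation turn in a system of record.
# The orchestration layer merges backend data into the LLM prompt.

def personalized_answer(user_id: str, question: str, crm_lookup, llm_complete) -> str:
    profile = crm_lookup(user_id)  # e.g., open orders, entitlements, account tier
    prompt = (
        f"Customer profile: {profile}\n\n"
        "Using this profile and the question below, draft a helpful reply.\n"
        f"Question: {question}"
    )
    return llm_complete(prompt)
```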