There are several specific types of generative AI workloads; each has different requirements. The system configurations described later in this white paper reflect these requirements.
Inferencing is the process of using a generative AI model to generate new predictive content from input. A pretrained model has already been trained on a large dataset; when new data is fed into it, the model makes predictions based on what it learned during training. Inferencing involves feeding an input sequence or image into the model and receiving an output sequence or image as the result. It is typically faster and less computationally intensive than training because it does not involve updating the model parameters.
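For illustration only, the following sketch shows how simple inferencing can be when the model parameters are never updated. It assumes the open-source Hugging Face Transformers library and the public GPT-2 checkpoint; neither is mandated by this solution design.

```python
# Minimal inferencing sketch (Hugging Face "transformers" and the public
# GPT-2 checkpoint are illustrative choices, not part of this design).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# Feed an input sequence into the pretrained model and read back the
# generated output; no weights are updated, which is why inferencing is
# far cheaper than training.
result = generator(
    "Generative AI workloads can be grouped into",
    max_new_tokens=40,
    num_return_sequences=1,
)
print(result[0]["generated_text"])
```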
Pretrained model customization is the process of retraining an existing generative AI model for task-specific or domain-specific use cases. For large models, customizing an existing model is more efficient than training a new one from scratch on a new dataset. Customization techniques in use today include fine-tuning, instruction tuning, prompt learning (including prompt tuning and P-tuning), reinforcement learning from human feedback (RLHF), transfer learning, and the use of adapters (or adaptable transformers).
The most widely used customization techniques are fine-tuning, prompt learning, and transfer learning.
Fine-tuning retrains a pretrained model on a specific task or dataset, adapting its parameters to improve performance and make it more specialized. This traditional method of customization either freezes all but one layer and adjusts that layer's weights and biases on a new dataset, or adds another layer to the neural network and calculates its weights and biases on the new dataset.
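The sketch below illustrates the frozen-layer variant of fine-tuning. It assumes PyTorch and Hugging Face Transformers with the public "bert-base-uncased" checkpoint; these are illustrative choices rather than components of this design.

```python
# Fine-tuning sketch: freeze the pretrained layers and retrain only the
# task head on a new dataset (PyTorch + Hugging Face transformers are
# assumed; the BERT checkpoint and hyperparameters are illustrative).
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Freeze every parameter except the newly added classification head.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("classifier")

# Only the unfrozen parameters are passed to the optimizer, so only the
# task head's weights and biases are adjusted on the new dataset.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=2e-5)

# train_loader is a placeholder for the task-specific dataset:
# for batch in train_loader:
#     loss = model(**batch).loss
#     loss.backward()
#     optimizer.step()
#     optimizer.zero_grad()
```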
Prompt learning is a strategy that allows pretrained language models to be repurposed for different tasks without modifying the model's existing parameters or fully fine-tuning it with labeled data. These techniques can also be applied to large generative AI image models.
Prompt learning can be further divided into two techniques: prompt tuning and P-tuning.
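The following conceptual sketch, written in PyTorch, shows the idea behind prompt tuning: the pretrained model stays frozen and only a small set of learned "virtual token" embeddings is trained and prepended to the input. The sizes and names are illustrative assumptions, not values from this design. P-tuning is similar, except that the virtual embeddings are produced by a small trainable prompt encoder (such as an LSTM or MLP) rather than being optimized directly.

```python
# Conceptual prompt-tuning sketch (PyTorch assumed; sizes are illustrative).
import torch
import torch.nn as nn

hidden_size = 768          # embedding width of the frozen base model
num_virtual_tokens = 20    # length of the learned soft prompt

# Trainable soft prompt; these embeddings are the only parameters learned,
# while the pretrained model's weights remain untouched.
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, hidden_size) * 0.02)

def build_inputs(token_embeddings: torch.Tensor) -> torch.Tensor:
    """Prepend the learned soft prompt to the frozen token embeddings.

    token_embeddings: (batch, seq_len, hidden_size)
    returns:          (batch, num_virtual_tokens + seq_len, hidden_size)
    """
    batch = token_embeddings.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(batch, -1, -1)
    return torch.cat([prompt, token_embeddings], dim=1)

# Only the soft prompt receives gradient updates.
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)
```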
Transfer learning is a traditional technique for using pretrained generative AI models to accelerate training on new datasets. This technique starts with a pretrained model that has already learned useful features from a large dataset, and then adapts it to a new dataset with a smaller amount of training data. It can be much faster and more effective than training a model from scratch on the new dataset because the pretrained model already understands the underlying features of the data. Transfer learning is useful when there is limited training data available for a new task or domain. Transfer learning is not typically used for generative AI LLMs but is effective with general AI models.
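As a non-LLM illustration, the sketch below adapts an image classifier pretrained on a large dataset to a new, smaller dataset. It assumes PyTorch and torchvision; the ResNet-18 backbone and the 10-class head are illustrative assumptions rather than part of this design.

```python
# Transfer-learning sketch: reuse features learned on a large dataset and
# adapt the model to a small new dataset (PyTorch/torchvision assumed).
import torch
import torch.nn as nn
from torchvision import models

# Start from a backbone pretrained on ImageNet.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained backbone so its learned features are reused as-is.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer to match the classes of the new task.
model.fc = nn.Linear(model.fc.in_features, 10)

# Only the new layer is trained, so far less data and compute are needed
# than training the whole network from scratch.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)
```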
In this solution design, the configurations related to customization are optimized for fine-tuning and P-tuning. However, the scalability and overall architecture design considerations still apply to other customization techniques and for datasets other than text.
Training is the process of training a generative AI model from scratch on a dataset. Training feeds the model examples from the dataset and adjusts the model parameters to improve its performance on the task. Training can be a computationally intensive process, particularly for large-scale models like GPT-3.
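The following sketch shows why training is the most compute-intensive workload: every step feeds examples to the model and updates all of its parameters. It assumes PyTorch and Hugging Face Transformers, and the deliberately tiny GPT-2 configuration is an illustrative assumption, not a recommended model size.

```python
# Training sketch: initialize a model from scratch (random weights) and
# update its parameters on batches of data (PyTorch + transformers assumed;
# the tiny configuration is illustrative only).
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(n_layer=4, n_head=4, n_embd=256)  # deliberately small
model = GPT2LMHeadModel(config)                       # random init, no pretrained weights
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

def train_step(input_ids: torch.Tensor) -> float:
    """One optimization step on a batch of token IDs of shape (batch, seq_len)."""
    outputs = model(input_ids=input_ids, labels=input_ids)  # causal LM loss
    outputs.loss.backward()      # gradient computation and the parameter updates
    optimizer.step()             # below are what make training far costlier
    optimizer.zero_grad()        # than inferencing
    return outputs.loss.item()
```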
In an end-to-end workflow for generative AI, the exact sequence of these steps depends on the specific application and requirements. For example, a common workflow for LLMs might involve training (or selecting) a pretrained foundation model, customizing it for the target task or domain (for example, through fine-tuning or P-tuning), and then deploying it for inferencing.
Transfer learning can also be used at various points in this workflow to accelerate the training process or improve the performance of the model. Overall, the key is to select the appropriate techniques and tools for each step of the workflow and to optimize the process for the specific requirements and constraints of the application.