The following figure shows a typical workflow for generative AI development, depicting where inferencing fits into the overall work stream. While this process can vary from organization to organization, the basic flow is generally consistent.
Figure 1. Generative AI workflow
In step 1, the business must establish its strategy for generative AI by considering its goals and objectives, identifying the problems it wants to solve or opportunities to create, and defining the use case or cases to address.
Step 2 consists of data preparation and curation. It may include data cleansing and labeling, data aggregation, data anonymization or generation of synthetic data where necessary, and generally ensuring that the dataset is well managed, high quality, and readily available for model training and model customization. Software tools such as Machine Learning Operations (MLOps) platforms can help in the data preparation phase.
In step 3, the real work begins, especially if we are training a model from scratch, which requires a substantial amount of labeled data relevant to the use case, heavy computational resources, and potentially significant training time. This step is where validated, high-performance, accelerated infrastructure can significantly reduce the time and cost of the training phase. Alternatively, we can evaluate existing models and select a pretrained model that fits the business, or use a pretrained model as the basis for the next step of model customization.
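To make the training step concrete, here is a deliberately tiny stand-in for "training from scratch": a gradient-descent loop fitting a linear model to synthetic labeled data. The model, data, and hyperparameters are toy assumptions; at production scale this loop is what consumes the accelerated infrastructure described above.

```python
import random

# Synthetic labeled dataset for the toy task y = 3x + 1.
random.seed(0)
data = [(x, 3.0 * x + 1.0) for x in (random.uniform(-1, 1) for _ in range(200))]

w, b, lr = 0.0, 0.0, 0.1  # model parameters and learning rate

def mse(w: float, b: float) -> float:
    """Mean squared error of the current model over the dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

loss_before = mse(w, b)
for _ in range(500):  # the compute-heavy part at real scale
    gw = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    gb = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w, b = w - lr * gw, b - lr * gb
loss_after = mse(w, b)
```

Real generative AI training replaces the two scalar parameters with billions of weights and the loop body with distributed, accelerator-backed computation, but the iterate-and-update structure is the same.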
Step 4 consists of customization of a trained model, whether it is one that you have trained from scratch or acquired as a pretrained model. Customization methods include fine-tuning, prompt learning (which includes both prompt tuning and p-tuning), transfer learning, and reinforcement learning. These methods are discussed in more detail in the white paper.
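The following toy sketch illustrates the idea shared by parameter-efficient customization methods such as prompt learning: the pretrained weights stay frozen, and only a small add-on parameter is trained on the new task's data. The frozen weight, the task, and the adapter-style offset are illustrative assumptions, not any specific method from the white paper.

```python
# "Pretrained" parameter: frozen, never updated during customization.
base_w = 2.0

# New task's data: the customized model should behave like y = 2.5x.
task_data = [(x / 10.0, 2.5 * (x / 10.0)) for x in range(-10, 11)]

delta, lr = 0.0, 0.1  # delta is the ONLY trainable parameter
for _ in range(200):
    grad = sum(
        2 * ((base_w + delta) * x - y) * x for x, y in task_data
    ) / len(task_data)
    delta -= lr * grad  # base_w is untouched; only the small add-on learns

effective_w = base_w + delta  # frozen base plus learned customization
```

Full fine-tuning would instead update `base_w` itself; methods like prompt tuning and p-tuning keep the large pretrained model fixed and learn a comparatively tiny set of new parameters, which is far cheaper at scale.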
Step 5 consists of inferencing, the subject of this validated design. This step is where you deploy and operate the trained model to generate business outcomes on an ongoing basis, scaling up or scaling out the computing resources as necessary to match demand. The inferencing step may also be iterative, as new data and new model customization and fine-tuning opportunities are identified to optimize the outcomes of the inferencing operations in practice.
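As one hedged sketch of the scale-out decision mentioned above, the function below picks a replica count for an inference service from the observed request rate. The capacity, utilization target, and replica bounds are illustrative assumptions; a real deployment would typically delegate this to an orchestration platform's autoscaler.

```python
import math

def replicas_needed(requests_per_sec: float,
                    per_replica_capacity: float = 50.0,
                    target_utilization: float = 0.7,
                    min_replicas: int = 1,
                    max_replicas: int = 16) -> int:
    """Smallest replica count that keeps each replica below the target utilization.

    All parameter values are hypothetical; tune them from measured model
    throughput and latency objectives.
    """
    needed = math.ceil(requests_per_sec / (per_replica_capacity * target_utilization))
    return max(min_replicas, min(max_replicas, needed))
```

For example, a light load is served by the minimum footprint, a heavier load scales out proportionally, and an extreme spike is capped at the configured maximum rather than scaling without bound.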