Generative AI workloads can be broadly categorized into two types: training and inference. Training uses a large dataset of examples to train a generative AI model, while inference uses a trained model to generate new content based on an input. Data preparation before training can also be a significant task in creating custom models. All these workloads have characteristics that must be considered in the design of solutions and their infrastructure.
The characteristics of a generative AI workload can vary depending on the specific application and the type of model being used. However, some common characteristics include:
- Compute intensity—Generative AI workloads can be computationally intensive, requiring significant processing power to train models or generate new content. This is especially true for large-scale models such as GPT-3, which require specialized hardware such as GPUs to train efficiently.
- Memory requirements—Generative AI models require significant amounts of memory to store model parameters and intermediate representations. This is especially true for transformer-based models such as GPT-3, which have many layers and hundreds of millions or even billions of parameters. Sufficient GPU memory capacity is therefore key.
- Data dependencies—Generative AI models are highly dependent on the quality and quantity of training data, which greatly affect model performance. Data preparation and cleaning are important parts of a solution, as tapping into large, high-quality datasets is key to creating custom models.
- Latency requirements—Inference workloads might have strict latency requirements, particularly in real-time applications such as chatbots or voice assistants. Models must be optimized for inference speed, which can involve techniques such as model quantization or pruning. Latency considerations can also favor on-premises or hybrid solutions over purely cloud-based ones, so that models are trained and served closest to the source of the data.
- Model accuracy—The accuracy and quality of the generated content are a critical outcome for many generative AI applications, and are typically evaluated using metrics such as perplexity, bilingual evaluation understudy (BLEU) score, or human evaluation.
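To make the memory requirement concrete, a back-of-the-envelope sketch helps: just holding the weights of a model takes roughly the parameter count times the bytes per parameter. The function below is our own illustration (not from any particular framework); it assumes 2 bytes per parameter, as with fp16 or bf16 weights, and note that training needs several times more memory again for gradients, optimizer states, and activations.

```python
# Rough GPU memory needed just to hold model weights.
# Assumes 2 bytes per parameter (fp16/bf16); training requires
# several times more for gradients, optimizer states, and activations.
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    return num_params * bytes_per_param / 1024**3

# GPT-3-scale model: 175 billion parameters.
print(f"{weight_memory_gb(175e9):.0f} GB")  # → 326 GB
```

Even at half precision, a GPT-3-scale model's weights alone exceed the memory of any single GPU, which is why such models are sharded across many accelerators.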
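Quantization, mentioned above as an inference-speed technique, maps floating-point weights to small integers so they take less memory and compute faster. A minimal sketch of symmetric int8 quantization follows; the weight values are illustrative, not taken from a real model, and production frameworks use considerably more sophisticated calibration.

```python
# Minimal sketch of symmetric int8 quantization: scale weights so the
# largest magnitude maps to 127, then round to integers.
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floating-point values from the int8 codes.
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.05, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q)  # → [42, -127, 5, 90]
```

Each dequantized value differs from the original by at most half the scale, a small loss in precision traded for a 4x reduction in weight storage versus fp32.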
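Of the metrics listed above, perplexity is the simplest to compute: it is the exponential of the average per-token negative log-likelihood, with lower values indicating a better language model. The sketch below uses hypothetical per-token values purely to show the arithmetic.

```python
import math

# Perplexity = exp(mean negative log-likelihood per token).
# Lower perplexity means the model assigns higher probability
# to the observed text.
def perplexity(token_nlls):
    return math.exp(sum(token_nlls) / len(token_nlls))

# Hypothetical per-token negative log-likelihoods from an evaluation run.
nlls = [2.0, 1.5, 2.5]
print(round(perplexity(nlls), 2))  # → 7.39
```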
Overall, generative AI workloads can be highly complex and challenging, requiring specialized hardware, software, and expertise to achieve optimal outcomes. However, with the right tools and techniques, they can enable a wide range of exciting and innovative applications in fields such as NLP, computer vision, and creative arts.