Solution Validation MosaicML LLM model training
This section walks through validating the training of a model from the MosaicML open-source library. It also validates image recognition and classification using the ResNet architecture, a deep convolutional neural network (CNN). The validation uses ResNet-56, where 56 is the number of weight layers. For more information, read about ResNet and CNNs.
After the solution and connectivity are established, a Databricks compute cluster is started with the resources needed for training. Create a new notebook and install the required libraries, such as Composer from MosaicML, along with other Python libraries. PyTorch (torch) is installed for use in training.
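Before training, it can help to confirm that the libraries are visible to the notebook's Python environment. The following is a minimal sketch using only the standard library. The package list is an assumption: Composer is published on PyPI under the distribution name mosaicml, and the helper names are hypothetical.

```python
from importlib import metadata

# Assumed PyPI distribution names; adjust to match what the notebook installs.
REQUIRED = ("mosaicml", "torch", "torchvision")


def installed_versions(packages=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing_packages(packages=REQUIRED):
    """Return the distribution names that are not installed."""
    return [name for name, ver in installed_versions(packages).items() if ver is None]


for name, ver in installed_versions().items():
    print(f"{name}: {ver if ver is not None else 'NOT INSTALLED'}")
```

If missing_packages() returns a non-empty list, install those packages on the cluster or in the notebook before continuing.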
Fine-tuning the dataset prepares the data to meet the model's requirements. Use TorchVision's dataset libraries for validation and to create synthetic datasets for training and testing. These datasets are stored in the Dell APEX File Storage clusters.
The dataset is placed in Dell APEX File Storage for Azure; the validated data access pattern is S3A. Azure provides the access setup and exposes the storage cluster's endpoint to the compute cluster. Ensure that user authentication and authorization are handled appropriately.
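The S3A access described above is typically configured through Hadoop properties passed to Spark. The following is a minimal sketch of that idea, not the configuration used in this validation. The property names are standard Hadoop S3A settings, but the endpoint URL, the environment variable names, and the helper function are hypothetical placeholders.

```python
import os


def s3a_spark_conf(endpoint, access_key, secret_key, use_ssl=True):
    """Build Spark configuration entries for S3A access to an S3-compatible endpoint.

    Keys are Hadoop S3A properties prefixed with 'spark.hadoop.' so that
    Spark forwards them to the Hadoop file system layer.
    """
    hadoop_props = {
        "fs.s3a.endpoint": endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Path-style addressing is commonly needed for non-AWS S3 endpoints.
        "fs.s3a.path.style.access": "true",
        "fs.s3a.connection.ssl.enabled": "true" if use_ssl else "false",
    }
    return {f"spark.hadoop.{key}": value for key, value in hadoop_props.items()}


# Hypothetical endpoint and environment variable names; never hard-code credentials.
conf = s3a_spark_conf(
    endpoint=os.environ.get("APEX_S3_ENDPOINT", "https://storage.example.internal"),
    access_key=os.environ.get("APEX_S3_ACCESS_KEY", ""),
    secret_key=os.environ.get("APEX_S3_SECRET_KEY", ""),
)

# Print only the keys so the secret never lands in notebook output.
for key in sorted(conf):
    print(key)
```

These entries can be supplied in the cluster's Spark configuration or passed to a SparkSession builder with its config(key, value) method, after which paths of the form s3a://bucket/path resolve against the storage endpoint. In practice, credentials would come from a secret store rather than plain environment variables.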
Set up the resnet_56 model from the Composer library and modify the required parameters. The fine-tuned data is stored in Dell APEX File Storage. The Spark cluster accesses this data, and all input/output operations between the compute and storage clusters are managed through the Spark distributed computation framework using the S3A protocol.
After the setup is complete, run the training job.