Step 2: Prepare data
Fine-tuning the dataset prepares the data to suit the requirements of the model. Here we use torchvision's dataset libraries to create a synthetic dataset for training and testing; the datasets are stored in Dell APEX File Storage for AWS and accessed using the S3A protocol.
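As a rough illustration of this step, the sketch below generates synthetic training and test sets with torchvision and builds the S3A locations where they could be stored. It assumes `torchvision.datasets.FakeData` as the synthetic source, which the paper does not name; the `SyntheticSpec` defaults, the bucket name, and the prefix are likewise hypothetical placeholders rather than values from the actual setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticSpec:
    """Hypothetical dataset settings (assumed, not taken from the paper)."""
    train_size: int = 1000
    test_size: int = 200
    image_size: tuple = (3, 32, 32)  # channels, height, width
    num_classes: int = 10


def s3a_uri(bucket: str, prefix: str, split: str) -> str:
    """Build the s3a:// location for one split, for example "train" or "test"."""
    return f"s3a://{bucket}/{prefix.strip('/')}/{split}"


def make_synthetic_datasets(spec: SyntheticSpec):
    """Return (train, test) datasets of randomly generated images."""
    # Imported here so the helpers above work without torchvision installed.
    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()
    train = datasets.FakeData(
        size=spec.train_size,
        image_size=spec.image_size,
        num_classes=spec.num_classes,
        transform=to_tensor,
    )
    # FakeData seeds each image from its index plus random_offset, so
    # offsetting by train_size keeps the test images distinct from training.
    test = datasets.FakeData(
        size=spec.test_size,
        image_size=spec.image_size,
        num_classes=spec.num_classes,
        transform=to_tensor,
        random_offset=spec.train_size,
    )
    return train, test
```

With these assumptions, `s3a_uri("my-bucket", "datasets/synthetic", "train")` yields `s3a://my-bucket/datasets/synthetic/train`, and `make_synthetic_datasets(SyntheticSpec())` returns the two datasets ready to be written to that location by whatever writer the pipeline uses.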
AWS provides the access setup and exposes the storage cluster endpoint to the compute cluster. Make sure the necessary user authentication and authorization are in place.
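One way the compute cluster might be pointed at the storage endpoint is through the standard Hadoop S3A properties, passed to Spark with the `spark.hadoop.` prefix. The sketch below only assembles those properties; the helper name `s3a_spark_conf` is hypothetical, and whether path-style access is required for this particular storage cluster is an assumption to verify against your environment.

```python
def s3a_spark_conf(endpoint: str, access_key: str, secret_key: str,
                   use_ssl: bool = True) -> dict:
    """Return Spark properties that direct the S3A connector to an endpoint."""
    if not endpoint:
        raise ValueError("endpoint must not be empty")
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Assumption: path-style requests, which custom S3 endpoints often need.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
    }
```

Each key and value can then be applied with `SparkSession.builder.config(key, value)` or set in the cluster's Spark configuration. In practice the credentials would come from a secret store or environment variables rather than being written into a notebook.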