Step 3: Create the Training, Testing, and Validation sets
Amazon SageMaker lets you bring your own logic in Python scripts and use it for training. In this step we create the training, validation, and test sets.
The IMDB dataset is already divided into training and test sets, but it lacks a validation set. Create one by taking an 80:20 split of the training data with the validation_split argument.
Note that each set is read from the Dell APEX File Storage share.
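The split described above can be sketched as follows. This is a minimal sketch, not necessarily the code used in the original notebook: it assumes the data is loaded with `tf.keras.utils.text_dataset_from_directory` (the log lines that follow match that utility's output format), that the share is mounted at the hypothetical path `/mnt/apex-share/aclImdb`, and that the unlabeled `train/unsup` folder has been removed so that only two classes remain. The helper `split_sizes` and the function name `load_imdb_splits` are illustrative only.

```python
import os

# Hypothetical mount point of the Dell APEX File Storage share; adjust to your setup.
DATA_ROOT = "/mnt/apex-share/aclImdb"


def split_sizes(n_files, validation_split):
    """Return (n_train, n_val) for a fractional hold-out of n_files."""
    n_val = int(validation_split * n_files)
    return n_files - n_val, n_val


def load_imdb_splits(data_root=DATA_ROOT, batch_size=32,
                     validation_split=0.2, seed=42):
    """Build train/validation/test datasets from directories on the share."""
    import tensorflow as tf  # imported here so the helper above has no TF dependency

    train_dir = os.path.join(data_root, "train")
    test_dir = os.path.join(data_root, "test")

    # The same seed must be used for both subsets so they do not overlap.
    train_ds = tf.keras.utils.text_dataset_from_directory(
        train_dir, batch_size=batch_size,
        validation_split=validation_split, subset="training", seed=seed)
    val_ds = tf.keras.utils.text_dataset_from_directory(
        train_dir, batch_size=batch_size,
        validation_split=validation_split, subset="validation", seed=seed)
    # The test set is used whole, so no split arguments are passed.
    test_ds = tf.keras.utils.text_dataset_from_directory(
        test_dir, batch_size=batch_size)
    return train_ds, val_ds, test_ds
```

With 25,000 training files and a 0.2 split, `split_sizes(25000, 0.2)` gives 20,000 files for training and 5,000 for validation, which agrees with the counts reported in the output.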
Found 25000 files belonging to 2 classes.
Using 20000 files for training.
Found 25000 files belonging to 2 classes.
Using 5000 files for validation.
Found 25000 files belonging to 2 classes.
Here are some samples along with their target values.
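One way to print a few samples with their targets is sketched below. It assumes the dataset is a `tf.data.Dataset` of `(text, label)` batches that still carries the `class_names` attribute set by `text_dataset_from_directory` (the attribute is lost after transformations such as `map` or `prefetch`). The function names `describe_samples` and `show_first_batch` are hypothetical and not taken from the original notebook.

```python
def describe_samples(texts, labels, class_names, limit=3, width=80):
    """Format up to `limit` (text, label) pairs as 'class (label): text'."""
    lines = []
    for text, label in list(zip(texts, labels))[:limit]:
        # Text tensors come back from TensorFlow as bytes; decode for display.
        if isinstance(text, bytes):
            text = text.decode("utf-8", errors="replace")
        lines.append(f"{class_names[int(label)]} ({int(label)}): {text[:width]}")
    return lines


def show_first_batch(dataset, limit=3):
    """Print a few reviews and their targets from the first batch."""
    for text_batch, label_batch in dataset.take(1):
        for line in describe_samples(text_batch.numpy(), label_batch.numpy(),
                                     dataset.class_names, limit=limit):
            print(line)
```

Calling `show_first_batch(train_ds)` would print the class name, the integer target, and the first 80 characters of each review.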