Appendix B: Benchmark setup and execution
The procedure for downloading the ImageNet dataset and building the TFRecord files is documented at https://github.com/tensorflow/models/tree/master/research/inception#getting-started. The benchmark was run from the repository at https://github.com/alsrgv/benchmarks, which contains optimizations and fixes required when using Horovod to distribute TensorFlow training. The commit used has hash 3b90c14 and is dated May 31, 2018. The CNN benchmark scripts in scripts/tf_cnn_benchmarks were used.
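A minimal sketch of fetching the benchmark code at the pinned commit, assuming git and network access on the machine that will run it. The function name fetch_benchmarks and its default directory name are hypothetical. It clones the repository named above, checks out commit 3b90c14, and prints the path of the scripts/tf_cnn_benchmarks directory.

```shell
# Hypothetical helper: clone the Horovod-enabled benchmark fork and pin it
# to the commit cited in this appendix. Requires git and network access.
fetch_benchmarks() {
  dest=${1:-benchmarks}                     # hypothetical checkout directory
  git clone https://github.com/alsrgv/benchmarks "$dest" || return 1
  git -C "$dest" checkout 3b90c14 || return 1   # commit dated May 31, 2018
  # git writes progress to stderr, so stdout carries only this path.
  echo "$dest/scripts/tf_cnn_benchmarks"
}
```

Usage: `cd "$(fetch_benchmarks)"` leaves the shell in the directory containing tf_cnn_benchmarks.py.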
Before each benchmark run, the page cache on every compute node was cleared with the command "sync; echo 3 > /proc/sys/vm/drop_caches". In addition, all caches on all Isilon nodes were cleared with "isi_for_array isi_flush".
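The two cache-clearing steps can be wrapped in one helper so neither is skipped between runs. This is a sketch under assumptions not stated in the original: compute node names are passed as arguments, the caller has root SSH access to each node, and the hypothetical variable ISILON_HOST names any one Isilon node from which isi_for_array is run.

```shell
# Hypothetical helper: clear caches on the listed compute nodes and on the
# whole Isilon cluster before a benchmark run. Assumes root SSH access.
clear_caches() {
  # Refuse to run without an Isilon node to reach (hypothetical variable).
  if [ -z "${ISILON_HOST:-}" ]; then
    echo "clear_caches: set ISILON_HOST to any Isilon node" >&2
    return 1
  fi
  for node in "$@"; do
    # Write dirty pages to disk, then drop page cache, dentries and inodes.
    ssh "root@$node" 'sync; echo 3 > /proc/sys/vm/drop_caches' || return 1
  done
  # isi_for_array runs isi_flush on every node of the Isilon cluster.
  ssh "root@$ISILON_HOST" 'isi_for_array isi_flush'
}
```

Usage (all hostnames hypothetical): `ISILON_HOST=isilon1 clear_caches node1 node2 node3 node4`.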
Since we use Horovod to distribute training across multiple GPUs on multiple nodes, the training benchmark is launched with mpirun, with one MPI process per GPU. Refer to https://github.com/uber/horovod/blob/8318a898a2f03fb22664ca8f5353361974f8a693/docs/benchmarks.md for details.
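The example command in this appendix specifies -np 16 but no host list. As a hedged sketch of one way to place the 16 processes (one per GPU) on several nodes, the snippet below writes an Open MPI hostfile and derives the process count from it. The node names and the figure of 4 GPUs per node are hypothetical assumptions, not taken from this appendix.

```shell
GPUS_PER_NODE=4                     # hypothetical: GPUs in each compute node
NODES="node1 node2 node3 node4"     # hypothetical compute node hostnames

# Write one "<host> slots=<n>" line per node (Open MPI hostfile syntax)
# and print the total slot count, i.e. the value to pass to mpirun -np.
make_hostfile() {
  file=$1
  total=0
  : > "$file"                       # create or truncate the hostfile
  for node in $NODES; do
    echo "$node slots=$GPUS_PER_NODE" >> "$file"
    total=$((total + GPUS_PER_NODE))
  done
  echo "$total"
}

HOSTFILE=$(mktemp)
NP=$(make_hostfile "$HOSTFILE")     # 4 nodes x 4 GPUs = 16 processes
```

The file would then be passed as `mpirun --hostfile "$HOSTFILE" -np "$NP"`, with the remaining options of the example command unchanged.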
Below is an example command to run the training benchmark.
mpirun --allow-run-as-root --mca btl_openib_want_cuda_gdr 1 -np 16 \
python3.6 tf_cnn_benchmarks.py \
--model=resnet50 --datasets_num_private_threads=4 --distortions=False \
--variable_update=horovod --batch_size=256 --num_batches=2000 \
--data_dir=/mnt/isilon/tensorflow_10x --data_name=imagenet \
--train_dir=/mnt/Isilon/train_dir --init_learning_rate=.04 \
--use_fp16=True --use_tf_layers=False --print_training_accuracy
Below is the output from the above command, with many lines omitted for brevity. Each of the 16 processes prints its own progress line at every reported step. The last line shows the aggregate training throughput across all 16 GPUs, in images per second.
TensorFlow: 1.8
Model: resnet50
Dataset: imagenet
Mode: training
SingleSess: False
Batch size: 4096 global
256.0 per device
Num batches: 2000
Num epochs: 0.64
Devices: ['horovod/gpu:0', 'horovod/gpu:1', 'horovod/gpu:2', 'horovod/gpu:3', 'horovod/gpu:4', 'horovod/gpu:5', 'horovod/gpu:6', 'horovod/gpu:7', 'horovod/gpu:8', 'horovod/gpu:9', 'horovod/gpu:10', 'horovod/gpu:11', 'horovod/gpu:12', 'horovod/gpu:13', 'horovod/gpu:14', 'horovod/gpu:15']
Data format: NCHW
Layout optimizer: False
Optimizer: sgd
Variables: horovod
==========
Generating model
2018-07-13 15:23:43.871841: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties:
name: Tesla V100-SXM2-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.53
pciBusID: 0000:1e:00.0
totalMemory: 15.78GiB freeMemory: 15.34GiB
2018-07-13 15:23:44.430647: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14849 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:1a:00.0, compute capability: 7.0)
Step Img/sec total_loss top_1_accuracy top_5_accuracy
1 2018-07-13 15:24:09 images/sec: 680.1 +/- 0.0 (jitter = 0.0) 9.788 0.000 0.004
1 2018-07-13 15:24:09 images/sec: 680.3 +/- 0.0 (jitter = 0.0) 9.003 0.000 0.004
1 2018-07-13 15:24:09 images/sec: 679.8 +/- 0.0 (jitter = 0.0) 8.518 0.000 0.000
1 2018-07-13 15:24:09 images/sec: 680.0 +/- 0.0 (jitter = 0.0) 8.386 0.000 0.000
1 2018-07-13 15:24:09 images/sec: 679.7 +/- 0.0 (jitter = 0.0) 8.585 0.000 0.008
1 2018-07-13 15:24:09 images/sec: 680.1 +/- 0.0 (jitter = 0.0) 9.207 0.000 0.000
1 2018-07-13 15:24:09 images/sec: 679.7 +/- 0.0 (jitter = 0.0) 8.530 0.000 0.000
1 2018-07-13 15:24:09 images/sec: 679.6 +/- 0.0 (jitter = 0.0) 8.213 0.000 0.008
1 2018-07-13 15:24:09 images/sec: 679.8 +/- 0.0 (jitter = 0.0) 8.494 0.000 0.004
1 2018-07-13 15:24:09 images/sec: 679.5 +/- 0.0 (jitter = 0.0) 12.650 0.008 0.008
1 2018-07-13 15:24:09 images/sec: 679.8 +/- 0.0 (jitter = 0.0) 8.375 0.000 0.008
1 2018-07-13 15:24:09 images/sec: 679.6 +/- 0.0 (jitter = 0.0) 9.301 0.008 0.016
1 2018-07-13 15:24:09 images/sec: 679.6 +/- 0.0 (jitter = 0.0) 8.332 0.000 0.000
1 2018-07-13 15:24:09 images/sec: 679.6 +/- 0.0 (jitter = 0.0) 8.275 0.000 0.000
1 2018-07-13 15:24:09 images/sec: 679.6 +/- 0.0 (jitter = 0.0) 8.658 0.000 0.000
1 2018-07-13 15:24:09 images/sec: 679.9 +/- 0.0 (jitter = 0.0) 8.506 0.000 0.008
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.9) 8.147 0.000 0.004
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.4) 8.227 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.8 (jitter = 3.4) 8.294 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 4.0) 8.398 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.6) 8.351 0.000 0.008
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.4) 8.270 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.8) 8.273 0.000 0.012
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.7) 8.142 0.000 0.004
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.3) 8.361 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.9) 8.293 0.000 0.004
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.3) 8.165 0.000 0.004
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.9) 9.824 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.4) 8.171 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.7) 8.413 0.004 0.008
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.9) 8.116 0.000 0.000
10 2018-07-13 15:24:12 images/sec: 693.3 +/- 1.9 (jitter = 3.4) 8.146 0.000 0.004
2000 2018-07-13 15:36:25 images/sec: 695.3 +/- 0.1 (jitter = 6.0) 4.880 0.223 0.410
total images/sec: 11123.80
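To pull the throughput figures out of a saved log, an awk-based sketch like the following can help. The function name summarize_log and capturing the output to a file (for example with `mpirun ... 2>&1 | tee run.log`) are assumptions. It finds the highest step in the log, sums the per-process images/sec values reported at that step, and prints them next to the final "total images/sec" value.

```shell
# Hypothetical helper: summarize a saved tf_cnn_benchmarks log file.
summarize_log() {
  awk '
    # Per-process progress lines: "<step> <date> <time> images/sec: <rate> ..."
    $4 == "images/sec:" {
      step = $1 + 0
      if (step > last) { last = step; sum = 0; n = 0 }  # newer step: restart
      if (step == last) { sum += $5; n++ }
    }
    # Aggregate line: "total images/sec: <rate>"
    $1 == "total" && $2 == "images/sec:" { total = $3 }
    END { printf "ranks=%d per_rank_sum=%.1f total=%.2f\n", n, sum, total }
  ' "$1"
}
```

Only one process's line for step 2000 is reproduced above. Assuming the omitted processes report similar rates, 16 × 695.3 = 11,124.8 images/sec, close to the reported total of 11,123.80.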