Isilon
The following Isilon hardware was used for the benchmarking in this document.
| Item | Description |
|---|---|
| Model | Isilon F800-4U-Single-256GB-1x1GE-2x40GE SFP+-48TB SSD |
| Number of Nodes | 4 |
| Number of Chassis | 1 |
| Processors per Chassis | 4 x Intel Xeon Processor E5-2697A v4 |
| Raw Capacity per Chassis | 192 TB (60 x 3.2 TB SSD) |
| Throughput per Chassis | Up to 15 GB/s |
| Front-End Networking | 2 x 40GbE (QSFP+) per node |
| Infrastructure Networking | 2 x InfiniBand QDR per node |
| Storage Type | SSD |
| Maximum Number of Chassis | 36 |
| OneFS Version | 8.1.0.2 |
Note: The Isilon cluster used to validate this solution used InfiniBand for the back-end (infrastructure) networking. However, it is recommended that new Isilon deployments use 40GbE for the back-end networking, even for deep learning workloads.
The default and minimal configuration of Isilon is sufficient for excellent performance of the deep learning workloads tested in this document.
In a minimal configuration, a single Isilon chassis contains four Isilon F800 nodes. Each node should have at least one 40 Gigabit Ethernet port connected to a front-end switch; in most situations, one port per node provides as much bandwidth as the F800 can deliver. However, for protection from switch failures, the second 40 Gigabit Ethernet port should also be connected.
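As a rough sanity check on why one port per node is usually enough, the arithmetic below compares aggregate front-end bandwidth against the chassis throughput figure from the table above (this is a back-of-envelope sketch, not a sizing tool):

```python
# Back-of-envelope bandwidth check using figures from the table above.
GBITS_PER_PORT = 40        # one 40GbE front-end port per node
NODES_PER_CHASSIS = 4

# Aggregate front-end network bandwidth with one port per node, in GB/s:
network_gbytes_per_sec = NODES_PER_CHASSIS * GBITS_PER_PORT / 8  # 20.0 GB/s

# "Throughput Per Chassis: Up to 15 GB/s" from the table:
chassis_max_gbytes_per_sec = 15

# One port per node already exceeds what the chassis can deliver.
print(network_gbytes_per_sec >= chassis_max_gbytes_per_sec)  # True
```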
When NFS clients mount an Isilon cluster, the recommended best practice is to mount a DNS name that the Isilon cluster resolves to one of its IP addresses. These IP addresses are automatically assigned and reassigned to individual Isilon nodes to handle load balancing and node failures. This method of mounting is simple to administer because the mount host is the same for all clients (for example, isilon.example.com). However, when there are only a few high-throughput clients, one Isilon node may receive significantly more client connections than the others, creating an imbalance that reduces performance. To prevent this, each client can mount a different fixed IP address to force an even balance: compute node 1 connects to Isilon node 1, compute node 2 connects to Isilon node 2, and so on. Note that although a specific client connects to only a single Isilon node, an I/O request will typically involve the disks and caches of all nodes in the Isilon cluster.
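The fixed-IP scheme above can be scripted. The sketch below maps a compute hostname to a fixed Isilon node IP by round-robin; the 10.1.1.x address range, the computeNN hostname scheme, and the four-node count are assumptions for illustration, not values from this document:

```shell
pick_isilon_ip() {
  # Map a compute hostname to a fixed Isilon node IP (round-robin),
  # so each client pins to one node and connections stay balanced.
  # Assumed: hostnames like "compute03", Isilon node IPs 10.1.1.1-10.1.1.4.
  hostname=$1
  num_isilon_nodes=4
  # Extract the trailing node number and strip leading zeros: compute03 -> 3
  id=$(printf '%s' "$hostname" | sed 's/^[^0-9]*0*//')
  idx=$(( (id - 1) % num_isilon_nodes ))
  echo "10.1.1.$(( idx + 1 ))"
}

# Usage sketch: compute node 3 would mount Isilon node 3, e.g.
#   mount -t nfs "$(pick_isilon_ip compute03)":/ifs/data /mnt/isilon
pick_isilon_ip compute03
```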
If archive-class Isilon nodes are part of the cluster, NFS clients should not connect to them, as their slower CPUs will reduce client performance. Even if a dataset is known to be stored on the archive-class nodes, it is better to access it through the F800 all-flash nodes, which have faster CPUs and more RAM. In general, archive-class nodes do not need front-end IP addresses at all, since all I/O to them will go through the back-end network.
This section explains one method of configuring storage tiering for a typical DL workload; adjust the parameters to match the expected workload. First, enable access-time (atime) tracking with a precision (grace period) of one day, so that files can be tiered based on when they were last read:
isilon-1# sysctl efs.bam.atime_enabled=1
isilon-1# sysctl efs.bam.atime_grace_period=86400000
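The grace period is specified in milliseconds. The small helper below (plain arithmetic, not an Isilon API) shows how the value 86400000 corresponds to one day and how to derive values for other precisions:

```python
def atime_grace_period_ms(days):
    """Convert an atime precision in days to the millisecond value
    expected by efs.bam.atime_grace_period."""
    return days * 24 * 60 * 60 * 1000

# One day -> 86400000 ms, the value set above.
print(atime_grace_period_ms(1))  # 86400000
```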
Configuration is now complete. To test it, follow these steps:
First, start the SmartPools job to apply the file pool policies, and use isi status to monitor it until it completes:
isilon-1# isi job jobs start SmartPools
isilon-1# isi status
Then verify that the test file resides on the hot tier (the F800 node pool):
isilon-1# isi get -D /ifs/data/imagenet-scratch/tfrecords/train-00000-of-01024
* IFS inode: [ 1,0,1606656:512, 2,2,1372672:512, 3,0,1338880:512 ]
* Disk pools: policy hot-tier(5) -> data target f800_48tb-ssd_256gb:2(2), metadata target f800_48tb-ssd_256gb:2(2)
Use ls -lu to view the file's access time (atime), then use touch -a to set it to a date in the past, simulating a file that has not been read recently:
isilon-1# ls -lu /ifs/data/imagenet-scratch/tfrecords/train-00000-of-01024
-rwx------ 1 1000 1000 762460160 Jan 16 19:32 train-00000-of-01024
isilon-1# touch -a -d '2018-01-01T00:00:00' /ifs/data/imagenet-scratch/tfrecords/*
isilon-1# ls -lu /ifs/data/imagenet-scratch/tfrecords/train-00000-of-01024
-rwx------ 1 1000 1000 762460160 Jan 1 2018 train-00000-of-01024
Run the SmartPools job again, wait for it to complete, and confirm that the file's data and metadata have moved to the cold tier (the H500 node pool):
isilon-1# isi job jobs start SmartPools
isilon-1# isi status
isilon-1# isi get -D /ifs/data/imagenet-scratch/tfrecords/train-00000-of-01024
* IFS inode: [ 4,0,1061376:512, 5,1,1145344:512, 6,3,914944:512 ]
* Disk pools: policy cold-tier(9) -> data target h500_60tb_3.2tb-ssd_128gb:15(15), metadata target h500_60tb_3.2tb-ssd_128gb:15(15)
Finally, read the file (for example, with cat or by running a training job) so that its access time is updated, then run the SmartPools job once more and confirm that the file has returned to the hot tier:
isilon-1# isi job jobs start SmartPools
isilon-1# isi status
isilon-1# isi get -D /ifs/data/imagenet-scratch/tfrecords/train-00000-of-01024
* IFS inode: [ 1,0,1606656:512, 2,2,1372672:512, 3,0,1338880:512 ]
* Disk pools: policy hot-tier(5) -> data target f800_48tb-ssd_256gb:2(2), metadata target f800_48tb-ssd_256gb:2(2)