IIoT Analytics Design: How important is MOM (message-oriented middleware)?
Wed, 29 Apr 2020 22:20:16 -0000|
Originally published on Aug 6, 2018 1:17:46 PM
Artificial intelligence (AI) is transforming the way businesses compete in today’s marketplace. Whether it’s improving business intelligence, streamlining supply chain or operational efficiencies, or creating new products, services, or capabilities for customers, AI should be a strategic component of any company’s digital transformation.
Deep neural networks have demonstrated astonishing abilities to identify objects, detect fraudulent behaviors, predict trends, recommend products, enable enhanced customer support through chatbots, convert voice to text and translate one language to another, and produce a whole host of other benefits for companies and researchers. They can categorize and summarize images, text, and audio recordings with human-level capability, but to do so they first need to be trained.
Deep learning, the process of training a neural network, can sometimes take days, weeks, or months, and effort and expertise are required to produce a neural network of sufficient quality to trust your business or research decisions to its recommendations. Most successful production systems go through many iterations of training, tuning, and testing during development. Distributed deep learning can speed up this process, reducing the total time to tune and test so that your data science team can develop the right model faster, but it requires a method for aggregating knowledge between systems.
There are several evolving methods for efficiently implementing distributed deep learning, and the way in which you distribute the training of neural networks depends on your technology environment. Whether your compute environment is container native, high performance computing (HPC), or Hadoop/Spark clusters for Big Data analytics, your time to insight can be accelerated by using distributed deep learning. In this article we are going to explain and compare systems that use a centralized or replicated parameter server approach, a peer-to-peer approach, and finally a hybrid of these two developed specifically for Hadoop distributed big data environments.
Distributed Deep Learning in Container Native Environments
Container native environments (e.g., Kubernetes, Docker Swarm, OpenShift) have become the standard for many DevOps environments, where rapid, in-production software updates are the norm and bursts of computation may be shifted to public clouds. Most deep learning frameworks support distributed deep learning for these types of environments using a parameter server-based model that allows multiple processes to look at training data simultaneously, while aggregating knowledge into a single, central model.
The process of performing parameter server-based training starts with specifying the number of workers (processes that will look at training data) and parameter servers (processes that will handle the aggregation of error reduction information, backpropagate those adjustments, and update the workers). Additional parameter servers can act as replicas for improved load balancing.
Worker processes are given a mini-batch of training data to test and evaluate, and upon completion of that mini-batch, report the differences (gradients) between produced and expected output back to the parameter server(s). The parameter server(s) will then handle the training of the network and transmitting copies of the updated model back to the workers to use in the next round.
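The worker/parameter-server exchange described above can be sketched in a few lines of plain Python. This is a single-process toy, not any framework's actual API; the function names and the simple least-squares objective are illustrative only:

```python
# Toy sketch of parameter server-based training (illustrative names,
# single process; real frameworks run workers and servers as
# separate processes and exchange gradients over the network).

def worker_gradient(params, minibatch):
    """Each worker evaluates its mini-batch and reports gradients.

    Toy objective: fit y = w*x by least squares, so for each sample
    the gradient of (w*x - y)^2 with respect to w is 2*x*(w*x - y).
    """
    w = params[0]
    grad = sum(2 * x * (w * x - y) for x, y in minibatch) / len(minibatch)
    return [grad]

def parameter_server_step(params, gradients, lr=0.01):
    """The parameter server averages the workers' gradients, applies
    the update, and returns the new model to send back to workers."""
    avg = [sum(g[i] for g in gradients) / len(gradients)
           for i in range(len(params))]
    return [p - lr * a for p, a in zip(params, avg)]

# Two workers, each holding a different mini-batch of (x, y) pairs
# drawn from the target function y = 3x.
batches = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
params = [0.0]
for _ in range(200):                       # training rounds
    grads = [worker_gradient(params, b) for b in batches]
    params = parameter_server_step(params, grads)
print(params)  # converges toward [3.0]
```

The key structural point the sketch captures is the division of labor: workers only compute gradients on their mini-batches, while the server alone aggregates and updates the shared model.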
This model is ideal for container native environments, where parameter server processes and worker processes can be naturally separated. Orchestration systems, such as Kubernetes, allow neural network models to be trained in container native environments using multiple hardware resources to improve training time. Additionally, many deep learning frameworks support parameter server-based distributed training, such as TensorFlow, PyTorch, Caffe2, and Cognitive Toolkit.
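In TensorFlow, for example, each process typically learns its role in the cluster from a TF_CONFIG environment variable. A sketch for a cluster with two workers and one parameter server might look like the following (hostnames and ports are placeholders):

```json
{
  "cluster": {
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
    "ps": ["ps0.example.com:2222"]
  },
  "task": {"type": "worker", "index": 0}
}
```

In a container native deployment, the orchestrator renders a variant of this for each pod, which is what makes the worker/parameter-server split map so naturally onto containers.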
Distributed Deep Learning in HPC Environments
High performance computing (HPC) environments are generally built to support the execution of multi-node applications that are developed and executed using the single program, multiple data (SPMD) methodology, where data exchange is performed over high-bandwidth, low-latency networks, such as Mellanox InfiniBand and Intel OPA. These multi-node codes take advantage of these networks through the Message Passing Interface (MPI), which abstracts communications into send/receive and collective constructs.
Deep learning can be distributed with MPI using a communication pattern called Ring-AllReduce. In Ring-AllReduce each process is identical, unlike in the parameter-server model where processes are either workers or servers. The Horovod package by Uber (available for TensorFlow, Keras, and PyTorch) and the mpi_collectives contributions from Baidu (available in TensorFlow) use MPI Ring-AllReduce to exchange loss and gradient information between replicas of the neural network being trained. This peer-based approach means that all nodes in the solution are working to train the network, rather than some nodes acting solely as aggregators/distributors (as in the parameter server model). This can potentially lead to faster model convergence.
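The Ring-AllReduce pattern itself is easy to simulate in a single process. The sketch below (plain Python with illustrative names, no MPI) runs the two standard phases, scatter-reduce and allgather, across a list of simulated ranks, so that every rank ends up with the element-wise sum of all gradient vectors using only neighbor-to-neighbor chunk transfers:

```python
# Single-process simulation of Ring-AllReduce across N simulated ranks.
# Illustrative only -- real implementations (e.g. Horovod) do this
# over MPI between actual processes on separate nodes.

def ring_allreduce(vectors):
    """Return buffers where every rank holds the element-wise sum
    of all ranks' input vectors."""
    n = len(vectors)
    size = len(vectors[0])
    data = [list(v) for v in vectors]            # each rank's buffer
    # Split indices into n contiguous chunks, one "owned" per rank.
    bounds = [(i * size // n, (i + 1) * size // n) for i in range(n)]

    # Phase 1: scatter-reduce. In step t, rank r sends chunk (r - t) % n
    # to its right neighbor, which adds it into its own buffer. Sends
    # are snapshotted first to mimic simultaneous exchange.
    for t in range(n - 1):
        sends = []
        for r in range(n):
            lo, hi = bounds[(r - t) % n]
            sends.append((r, (r - t) % n, data[r][lo:hi]))
        for r, c, payload in sends:
            lo, hi = bounds[c]
            dst = (r + 1) % n
            for i, v in enumerate(payload):
                data[dst][lo + i] += v

    # Phase 2: allgather. Rank r now owns the fully reduced chunk
    # (r + 1) % n; each step forwards reduced chunks around the ring.
    for t in range(n - 1):
        sends = []
        for r in range(n):
            lo, hi = bounds[(r + 1 - t) % n]
            sends.append((r, (r + 1 - t) % n, data[r][lo:hi]))
        for r, c, payload in sends:
            lo, hi = bounds[c]
            data[(r + 1) % n][lo:hi] = payload
    return data

out = ring_allreduce([[1.0, 2.0, 3.0],
                      [10.0, 20.0, 30.0],
                      [100.0, 200.0, 300.0]])
print(out[0])  # every rank ends with [111.0, 222.0, 333.0]
```

Note that each rank sends only 2(n-1) chunks regardless of cluster size, which is why the pattern keeps the high-bandwidth fabric busy without any single node becoming an aggregation bottleneck.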
The Dell EMC Ready Solutions for AI, Deep Learning with NVIDIA allows users to take advantage of high-bandwidth Mellanox InfiniBand EDR networking, fast Dell EMC Isilon storage, accelerated compute with NVIDIA V100 GPUs, and optimized TensorFlow, Keras, or PyTorch with Horovod frameworks to help produce insights faster.
Distributed Deep Learning in Hadoop/Spark Environments
Hadoop and other Big Data platforms achieve extremely high performance for distributed processing but are not designed to support long running, stateful applications. Several approaches exist for executing distributed training under Apache Spark. Yahoo developed TensorFlowOnSpark, accomplishing the goal with an architecture that leverages Spark for scheduling TensorFlow operations and RDMA for direct tensor communication between servers.
BigDL is a distributed deep learning library for Apache Spark. Unlike Yahoo's TensorFlowOnSpark, BigDL not only enables distributed training, it is designed from the ground up to work on Big Data systems. To enable efficient distributed training, BigDL takes a data-parallel approach with synchronous mini-batch SGD (stochastic gradient descent). Training data is partitioned into RDD samples and distributed to the workers. Model training is an iterative process: first, each worker computes gradients locally, taking advantage of its locally stored partition of the training data and its model replica to perform in-memory transformations. Then an AllReduce function schedules tasks on the workers to calculate and update the weights. Finally, a broadcast syncs the distributed copies of the model with the updated weights.
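The compute-locally, AllReduce, broadcast loop described above can be sketched without Spark or BigDL. This is a toy single-process version; the names, the toy objective, and the two-worker layout are illustrative, not BigDL's actual API:

```python
# Toy sketch of one worker-group running synchronous mini-batch SGD:
# local gradients per data partition, an all-reduce to average them,
# then every replica applies the same (broadcast) update.

def local_gradient(w, partition):
    # Toy objective: minimize mean (w - x)^2 over all samples, so the
    # local gradient is 2 * mean(w - x) over this partition.
    return 2.0 * sum(w - x for x in partition) / len(partition)

def allreduce_mean(values):
    # Stands in for the AllReduce step that aggregates gradients.
    return sum(values) / len(values)

data = [4.0, 6.0, 8.0, 10.0]
partitions = [data[0:2], data[2:4]]   # "RDD" partitions, one per worker
weights = [0.0, 0.0]                  # each worker's replica of the model

for _ in range(100):
    grads = [local_gradient(w, p) for w, p in zip(weights, partitions)]
    g = allreduce_mean(grads)                   # AllReduce
    weights = [w - 0.1 * g for w in weights]    # broadcast same update
print(weights[0])  # approaches 7.0, the mean of the data
```

Because every replica starts identical and applies the identical averaged gradient each iteration, the copies never diverge, which is the invariant synchronous mini-batch SGD relies on.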
The Dell EMC Ready Solutions for AI, Machine Learning with Hadoop is configured to allow users to take advantage of the power of distributed deep learning with Intel BigDL and Apache Spark. It supports loading models and weights from other frameworks such as TensorFlow, Caffe, and Torch, which can then be leveraged for training or inferencing. BigDL is a great way for users to quickly begin training neural networks using Apache Spark, widely recognized for how simple it makes data processing.
One more note on Hadoop and Spark environments: The Intel team working on BigDL has built and compiled high-level pipeline APIs, built-in deep learning models, and reference use cases into the Intel Analytics Zoo library. Analytics Zoo is based on BigDL but makes it even easier to use through high-level pipeline APIs designed to work with Spark DataFrames, plus built-in models for tasks like object detection and image classification.
Regardless of whether your preferred server infrastructure is container native, HPC clusters, or Hadoop/Spark-enabled data lakes, distributed deep learning can help your data science team develop neural network models faster. Our Dell EMC Ready Solutions for Artificial Intelligence can work in any of these environments to help jumpstart your business’s AI journey. For more information on the Dell EMC Ready Solutions for Artificial Intelligence, go to dellemc.com/readyforai.
Lucas A. Wilson, Ph.D. is the Chief Data Scientist in Dell EMC's HPC & AI Innovation Lab. (Twitter: @lucasawilson)
Michael Bennett is a Senior Principal Engineer at Dell EMC working on Ready Solutions.
Related Blog Posts
Live Optics is Your Friend
Tue, 05 Oct 2021 19:01:57 -0000|
It’s a rare day that a free tool exists that can help profile customer workloads to the mutual benefit of all. Live Optics (previously DPack) is a rare gem that is truly a win-win proposition for customers and vendors such as Dell. I’ve been using it for years, and I rarely fail to learn something of use.
The tool is similar to SAR on steroids. Data is collected for each host; hosts can be VMs, and servers can be from any manufacturer. The data collected covers IOPS (count and size), memory usage, CPU usage, and network activity. It can be run in local mode, where the data doesn’t go anywhere else, or the data can be stored in a Dell private cloud. The latter is more beneficial, as it can then be accessed by folks in many roles for various assessments. The data may also be mined to help Dell make better decisions about current and future products based on actual observed user profiles.
I use Live Optics to profile database workloads such as Greenplum and Vertica, Hadoop, and NoSQL databases such as MongoDB, Cassandra, MarkLogic, and more.
The collected data helps facilitate more meaningful discussions with various SMEs and right-size future designs. In one case I found a customer that was using less than half their memory during peak periods, so we suggested new server BOMs with much less memory, as they didn’t need what they had.
Can we help you with assessing your workloads of interest on our servers or those of our competitors?
How about SingleStore for your database on 15G Dell PE Servers?
Thu, 30 Sep 2021 16:22:51 -0000|
SingleStore is a distributed relational database that was previously called MemSQL. It is well suited to analytics workloads. There are two data structure constructs available. The first is the columnstore, which resides on disk; the disk is typically SSDs. The second is the rowstore, which resides in memory and is essentially a key-value store. Yes, you can have both types in the same database and join across the two different table types. Data is arranged across leaves, where the low-level detail is stored, and aggregators, which hold the summarized data structures. Clients run SQL queries through the aggregators.
SingleStore uses the MySQL wire protocol, which makes it compatible with anything that can connect to MySQL.
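Because of that compatibility, you work with it through ordinary SQL over any MySQL client. A rough sketch of mixing the two table types might look like the following (exact DDL varies by SingleStore version, and the table and column names are purely illustrative):

```sql
-- Columnstore (on disk): suited to large analytical scans.
CREATE TABLE events (
    ts      DATETIME,
    user_id BIGINT,
    amount  DECIMAL(12,2),
    SORT KEY (ts)
);

-- Rowstore (in memory): suited to point lookups, key-value style.
CREATE ROWSTORE TABLE users (
    user_id BIGINT PRIMARY KEY,
    name    VARCHAR(100)
);

-- A single query can join across the two table types.
SELECT u.name, SUM(e.amount)
FROM events e JOIN users u ON e.user_id = u.user_id
GROUP BY u.name;
```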
Customers choose this database when they have demanding high performance analytics needs. We have many large financial customers that are very happy with it.
So what does it look like on the latest 15G Ice Lake servers from Dell?
- Aggregators use a single 25GbE NIC
- Leaf nodes use a single 10GbE NIC
- Aggregator nodes use about ¼ the RAM and ¼ the cores of leaf nodes
- RAID is optional but most customers elect it. The figures below assume RAID10.
- Use an M.2 BOSS card with a RAID 1 pair of 480GB RI SSDs for the OS and software. They are now hot swappable.
- For durability and cost reasons, read-intensive (RI) value SAS SSDs will be the right fit 99.99% of the time.
- 2 aggregators w/ 4 x 480GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
- 4 leaf nodes w/ 4 x 960GB RI SSD, 256GB RAM, 2 x 12c, 10GbE NIC
- 3 aggregators w/ 2 x 960GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
- 7 leaf nodes w/ 8 x 3.84TB RI SSD, 512GB RAM, 2 x 24c, 10GbE NIC
- 4 aggregators w/ 2 x 960GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
- 14 leaf nodes w/ 8 x 7.68TB RI SSD, 1024GB RAM, 2 x 28c, 10GbE NIC
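As a quick illustration of what the RAID10 assumption above means for usable capacity, here is the arithmetic for the largest leaf tier (simple math only; it ignores filesystem overhead, database compression, and any in-cluster replication):

```python
# Usable leaf-tier capacity for the 14-node configuration above,
# assuming RAID10 halves raw capacity. Ignores filesystem overhead,
# compression, and replication.

leaf_nodes = 14
drives_per_node = 8
drive_tb = 7.68            # TB per read-intensive SSD

raw_tb = leaf_nodes * drives_per_node * drive_tb
usable_tb = raw_tb / 2     # RAID10 mirrors every drive
print(round(raw_tb, 2), round(usable_tb, 2))  # 860.16 430.08
```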
If you need your Singlestore database on Dell PE Servers do let us know.