Blogs

Short articles about data analytics solutions and related technology trends


  • AI
  • data analytics
  • big data
  • PowerScale
  • data lakehouse

Navigating the modern data landscape: the need for an all-in-one solution

Vrashank Jain

Mon, 18 Mar 2024 19:56:59 -0000

There are two revolutions brewing inside every enterprise. We are all familiar with the first: the frenzied rush to expand an organization's AI capabilities, driven by exponential growth in data creation, the rising availability of high-performance computing systems built on massively parallel GPUs, and the rapid advancement of AI models. Together these create a perfect storm that is reshaping the way enterprises operate. The second revolution is what makes the first a reality: the ability to harness this awesome power to drive innovation and gain a competitive advantage. Enterprises are racing toward a modern data architecture that brings order to their chaotic data environments.

The Need For An All-In-One Solution

Data platforms are constantly evolving. Despite a plethora of options, such as data lakes, data warehouses, cloud data warehouses, and even cloud data lakehouses, enterprises are still struggling, because the choices available today are suboptimal.

Cloud-native solutions offer simplicity and scalability, but migrating all data to the cloud can be a daunting task and can end up significantly more expensive over the long term. The potential loss of control over proprietary data, particularly in the realm of AI, is another major concern. On the other hand, traditional on-premises solutions require significantly more expertise and resources to build and maintain; many organizations simply lack the skills and capabilities needed to construct a robust data platform in-house.

A customer once told me – “We’ve heard from so many vendors but ultimately there is no easy button for us.”

When Dell Technologies set out to build that easy button, we started with what enterprises need most: infrastructure, software, and services, seamlessly integrated. We created a tailor-made solution with right-sized compute and a highly performant query engine, pre-integrated and pre-optimized to streamline IT operations. We incorporated built-in, enterprise-grade security that can also integrate with third-party security tools. To enable rapid support, we staffed a bench of experts offering end-to-end deployment and maintenance services. We also knew the solution needed to be future-proof, not only anticipating future innovations but also accommodating the diverse needs of users today. To support this, we chose open data formats, which means an organization's data is no longer locked into a proprietary format or vendor. To ease the transition, the solution includes built-in, enterprise-ready connectors that ensure business continuity. Ultimately, our goal was to deliver an experience that is easy to install, easy to use, easy to manage, easy to scale, and easy to future-proof.

Dell Data Lakehouse’s Core Capabilities

Let’s dig into each component of the solution.

  • Data Analytics Engine, powered by Starburst: A high-performance distributed SQL query engine, built on Starburst, which is based on Trino. It runs fast analytic queries against data lakes, lakehouses, and distributed data sources at internet scale; integrates global security with fine-grained access controls; supports ad hoc and long-running ELT workloads; and is a gateway to building high-quality data products and powering AI and analytics workloads. Dell's Data Analytics Engine also includes exclusive features that dramatically improve performance when querying data lakes. Stay tuned for more info!
  • Data Lakehouse System Software: This new system software is the central nervous system of the Dell Data Lakehouse. It simplifies lifecycle management of the entire stack, drives down IT OpEx with pre-built automation and integrated user management, provides visibility into cluster health, ensures high availability, enables easy upgrades and patches, and lets admins control all aspects of the cluster from one convenient control center. Based on Kubernetes, it's what turns a data lakehouse into an easy button for enterprises of all sizes.
  • Scale-out Lakehouse Compute: Purpose-built Dell compute and networking hardware, matched to compute-intensive data lakehouse workloads, comes pre-integrated into the solution. Scale compute independently from storage by seamlessly adding nodes as needs grow.
  • Scale-out Object Storage: Dell ECS, ObjectScale and PowerScale deliver cyber-secure, multi-protocol, resilient and scale-out storage for storing and processing massive amounts of data. Native support for Delta Lake and Iceberg ensures read / write consistency within and across sites for handling concurrent, atomic transactions.
  • Dell Services: Accelerate AI outcomes with help at every stage from trusted experts. Align a winning strategy, validate data sets, quickly implement your data platform and maintain secure, optimized operations.
    • ProSupport: Comprehensive, enterprise-class support on the entire Dell Data Lakehouse stack from hardware to software delivered by highly trained experts around the clock and around the globe.
    • ProDeploy: Expert delivery and configuration assure that you are getting the most from the Dell Data Lakehouse on day one. With 35 years of experience building best-in-class deployment practices and tools, backed by elite professionals, we can deploy 3x faster[1] than in-house administrators.
    • Advisory Services Subscription for Data Analytics Engine: Receive a proactive, dedicated expert to maximize the value of your Dell Data Analytics Engine environment, guiding your team through design and rollout of new use cases to optimize and scale your environment.
    • Accelerator Services for Dell Data Lakehouse: Fast track ROI with guided implementation of the Dell Data Lakehouse platform to accelerate AI and data analytics.

Learn More

With the combination of these capabilities, Dell continues to innovate alongside our customers to help them exceed their goals in the face of data challenges. We aim to help our customers take advantage of the brewing AI revolution and this rapid market change, harnessing the power of their data to gain a competitive advantage and drive innovation. Enterprises are racing toward a modern data architecture, and it's critical they don't get stuck at the starting line.

For detailed information on this exciting product, refer to our technical guide. For other information, visit Dell.com/datamanagement.

 

Source
[1] Based on a May 2023 Principled Technologies study, “Using Dell ProDeploy Plus Infrastructure can improve deployment times for Dell Technology.”

  • AI
  • big data
  • PowerEdge
  • NVMe
  • AMD
  • TPCx-AI

Dell Reinforces its TPCx-AI Benchmark Leadership using the 16G PowerEdge R6625 Hardware Platform at SF1000

Nicholas Wakou

Wed, 12 Jul 2023 18:52:17 -0000

Overview

On June 13, 2023, Dell Technologies published a TPCx-AI SF1000 result based on an 11-node Dell PowerEdge R6625 hardware platform powered by AMD Genoa processors. As of the publication date, Dell results held the number one slots on the Top Performance and Price/Performance tables for TPCx-AI at SF3, SF100, SF300, and SF1000. These results reinforce Dell Technologies' TPCx-AI benchmark leadership position, a testament to the strong performance of its AI, ML, and DL solutions.

This blog presents the hardware platform that was tested, what was measured, and what the results mean.

What TPCx-AI tests measure

TPCx-AI measures the performance of an end-to-end machine learning or data science platform, using a diverse, representative dataset scaling from 1 GB to 10 TB. The TPCx-AI benchmark assesses various aspects of AI training and inference performance, including data generation, model training, serving, scoring, and system scalability. The benchmark can be used across a wide range of systems, from edge to data center, and aims to provide a standardized, objective measure of AI performance across different platforms and configurations.

By using TPCx-AI, organizations and vendors can make informed decisions about the AI infrastructure that best suits their needs. The benchmark helps in understanding the system's capability to handle large-scale AI training workloads and can help optimize performance and resource allocation for AI tasks.

The TPCx-AI standard defines 10 use cases based on data science pipelines modeled on a retail business data center to evaluate the performance of artificial intelligence systems. The workload trains deep neural networks on large datasets using prominent machine learning frameworks such as TensorFlow. The benchmark measures:

  1. The total time taken to train a model for each use case to a specific level of accuracy
  2. The time taken for that model to be used for inference or serving

The blog, Interpreting the results of the TPCx-AI Benchmark, outlines the ten use cases, their data science models, and the benchmark phases.

System under test (SUT)

Figure 1. System Under Test (SUT)

Software versions

Table 1. Software versions

  • Cloudera Data Platform (CDP): 7.1.7 SP2
  • Hadoop: 3.1.1
  • HDFS: 3.1.1
  • YARN: 3.1.1
  • MR2: 3.1.1
  • Spark: 2.4.7
  • ZooKeeper: 3.5.5
  • Java: 1.8.0
  • Python: 3.7.16
  • Red Hat Enterprise Linux: 8.7 (Master node), 8.6 (Worker nodes)
  • TPCx-AI Kit: 1.0.2

The result

Primary metrics

Table 2. Primary metric scores

  • Performance (AIUCpm@1000): 3,258.01
  • Price/Performance (USD/AIUCpm@1000): 267.96
  • Availability: June 13, 2023

The three primary metrics in Table 2 are required for all TPC results. The top ten results, based on performance or price/performance in a particular SF category, are displayed in the tables of the respective benchmark standard, categorized by metric and SF. To compare any results, all three metrics must be disclosed. The TPC does not allow comparing TPCx-AI results from different SF categories. The blog, Interpreting the results of the TPCx-AI Benchmark, goes into the details of how the performance and price/performance metrics are calculated. The availability date is the date on which all the priced line items (SKUs) are available to customers; it must be within 185 days of the submission date. For the performance metric, the higher the score the better; for price/performance, the lower the better.

Other metrics

Table 3. Other metrics

  • Total system cost: $872,988
  • Framework: Cloudera SEL Data Platform Private Cloud Base Edition
  • Operating system: Red Hat Enterprise Linux 8.6/8.7
  • Scale factor: 1,000
  • Physical storage divided by scale factor: 214.56
  • Scale factor divided by physical memory: 0.12
  • Main data redundancy mode: Replication 3, RAID 1
  • Number of servers: 11
  • Total processors/cores/threads: 22/704/1,344
  • Number of streams: 4

 The metrics in Table 3 are required to be reported and disclosed in the Full Disclosure Report (FDR) and Executive Summary (ES). Except for the total system cost, these other metrics are not used in the calculation of the primary metrics but provide additional information about the system that was tested. For instance, the total system cost is the total cost of ownership (TCO) for one year. The redundancy modes provide the data protection mechanisms that were used in the configuration as required by the benchmark standard. The number of streams refers to the number of concurrent serving tests during the Throughput phase.

Numerical quantities

Benchmark run times

Table 4. Benchmark run times

  • Benchmark start: 06-07-2023 9:35:25 PM
  • Benchmark end: 06-08-2023 3:20:10 AM
  • Benchmark duration: 5:44:45.193

Benchmark phase times

Table 5. Benchmark phase metrics (times in seconds)

  • Data Generation (DATAGEN): 2419.613
  • Data Loading (TLOAD): 927.45
  • Load Test (TLD): 927.45
  • Power Training (TPTT): 492.143
  • Power Serving 1 (TPST1): 56.998
  • Power Serving 2 (TPST2): 57.357
  • Power Serving (TPST): 57.357
  • Throughput (TTT): 43.934
  • AIUCpm@1000.0: 3258.066

The seven benchmark phases and their metrics are explained in Interpreting the results of the TPCx-AI Benchmark; they are performed sequentially, from data generation to the throughput test. In power training, models are generated and trained for each use case sequentially, from UC1 to UC10. In power serving, the models obtained during the training phase are used to run the serving phase sequentially, one use case at a time. There are two power serving tests; the one that registers the longer time provides the TPST metric. The throughput phase runs multiple streams of serving tests concurrently. The more streams, the more the system resources are taxed. Typically, the number of streams is increased until TTTn+1 > TTTn (where n+1 refers to the next throughput test). The duration of the longest-running stream (TTPUT) is used to calculate the throughput test metric TTT.
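
The arithmetic behind the headline score can be reproduced from the Table 5 phase times. Per the TPCx-AI specification, the performance metric divides SF × N × 60 (with N = 10 use cases) by the fourth root of the product of the four timed phases; a minimal sketch:

```python
# TPCx-AI primary performance metric, per the TPCx-AI specification:
#   AIUCpm@SF = (SF * N * 60) / (TLD * TPTT * TPST * TTT) ** 0.25
# Phase times (seconds) come from Table 5.
SF = 1000        # scale factor
N = 10           # number of use cases
TLD = 927.45     # load test
TPTT = 492.143   # power training
TPST = 57.357    # power serving (the longer of TPST1 and TPST2)
TTT = 43.934     # throughput

aiucpm = (SF * N * 60) / (TLD * TPTT * TPST * TTT) ** 0.25
print(f"AIUCpm@{SF} = {aiucpm:,.3f}")  # within rounding of the published 3,258.066
```

This also makes the metric's behavior easy to see: halving any one phase time improves the score by only the fourth root, so balanced improvements across phases pay off most.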

Use case times and accuracy

Table 6. Use case times and accuracy (times in seconds)

Use case | TRAINING  | SERVING_1 | SERVING_2 | Throughput (avg.) | Accuracy  | Threshold
1        | 523.703   | 51.215    | 49.736    | 56.083            | -1.00000  | -1.0 >= -1
2        | 1813.764  | 85.354    | 88.783    | 129.274           | 0.43830   | word_error_rate <= 0.5
3        | 95.795    | 12.443    | 12.811    | 13.84             | 4.57451   | mean_squared_log_error <= 5.4
4        | 59.08     | 25.475    | 25.489    | 31.016            | 0.71189   | f1_score >= 0.65
5        | 943.023   | 76.289    | 78.351    | 91.615            | 0.03347   | mean_squared_log_error <= 0.5
6        | 435.865   | 33.135    | 33.071    | 37.12             | 0.21355   | matthews_corrcoef >= 0.19
7        | 43.585    | 15.317    | 15.3      | 17.143            | 1.65306   | median_absolute_error <= 1.8
8        | 1940.283  | 338.579   | 341.811   | 372.418           | 0.74996   | accuracy_score >= 0.65
9        | 5448.735  | 703.291   | 699.631   | 745.458           | 1.00000   | accuracy_score >= 0.9
10       | 818.635   | 28.326    | 28.19     | 31.162            | 0.81691   | accuracy_score >= 0.7

Table 6 shows the use case run times (in seconds) for each benchmark phase and the accuracy of the model that was used. For instance, the RNN model generated and trained for UC2 had a word_error_rate of 0.4383, which was less (better) than the threshold error rate of 0.5. The XGBoost model trained for UC8 was 74.99% accurate, above the 65% minimum accuracy threshold.
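
The pass/fail logic in Table 6 can be checked mechanically; a small sketch using the published values, where each entry pairs the achieved metric with its comparator and threshold (the UC5 threshold is read from the table as 0.5):

```python
# Check each use case's model quality against its TPCx-AI threshold.
# Note that the comparator direction varies: error metrics must be <=
# their threshold, accuracy-style metrics must be >= theirs.
import operator

results = {
    1:  (-1.00000, operator.ge, -1.0),  # -1.0 >= -1
    2:  (0.43830, operator.le, 0.5),    # word_error_rate
    3:  (4.57451, operator.le, 5.4),    # mean_squared_log_error
    4:  (0.71189, operator.ge, 0.65),   # f1_score
    5:  (0.03347, operator.le, 0.5),    # mean_squared_log_error (threshold as read)
    6:  (0.21355, operator.ge, 0.19),   # matthews_corrcoef
    7:  (1.65306, operator.le, 1.8),    # median_absolute_error
    8:  (0.74996, operator.ge, 0.65),   # accuracy_score
    9:  (1.00000, operator.ge, 0.9),    # accuracy_score
    10: (0.81691, operator.ge, 0.7),    # accuracy_score
}

passed = {uc: cmp(value, threshold) for uc, (value, cmp, threshold) in results.items()}
assert all(passed.values())  # every use case met its quality requirement
```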

Figure 2. Use case time by benchmark phase

TPCx-AI SF1000 results tables

Tables 7 and 8 display the top TPCx-AI SF1000 results as of the publication of this blog.

Table 7. SF1000 top performance table

Table 8. Top price/performance table

Table 7 and Table 8 are similar. Of the four published results at SF1000, Dell Technologies’ hardware platforms hold the number 1, number 2, and number 3 positions on both the performance and price/performance tables. The main difference between the three top results is the processor generations:

  • The number 1 result used 4th generation AMD Genoa processors
  • The number 2 result used 3rd generation Intel Ice Lake processors
  • The number 3 result used 2nd generation Intel Cascade Lake processors

Key takeaways

  1. Dell dominates TPCx-AI top performance and price/performance tables at SF3, SF100, SF300, and SF1000.
  2. TPCx-AI performance improved greatly on newer generation Dell hardware platforms that have newer generation processors:
    1. There was a 60.71% performance improvement between hardware platforms powered by (14G) 2nd generation and (15G) 3rd generation processors.
    2. There was a 37.13% improvement between 3rd generation and (16G) 4th generation processors.
  3. TPCx-AI price/performance improved greatly between processor generations of the Dell 14G, 15G, and 16G hardware platforms:
    1. There was a 14.80% price/performance drop from hardware platforms powered by 2nd generation to 3rd generation processors.
    2. There was a 27.08% price/performance drop from 3rd generation to 4th generation processors.
  4. The form factor of the hardware platforms has shrunk:
    1. The Dell 14G TPCx-AI SF1000 result used 2U servers
    2. The 15G and 16G results used 1U servers and scored better performance and price/performance
  5. Results using NVMe data storage scored better price/performance metrics:
    1. The 14G result used hard drives
    2. The 15G and 16G results used more expensive NVMe data drives, yet scored better price/performance metrics

Conclusion

This blog examined in detail the TPCx-AI performance result of the Dell 16G PE R6625 hardware platform. The result cemented Dell Technologies’ leadership positions on TPCx-AI performance and price/performance tables at SF1000, in addition to the leadership positions at SF3, SF100, and SF300. These results prove Dell Technologies’ leadership as a provider of high-performance AI, ML, and DL solutions based on verifiable performance data backed by a reputable, industry-standards performance consortium.

References

Nicholas Wakou, Nirmala Sundararajan; Interpreting the results of the TPCx-AI Benchmark; infohub.delltechnologies.com (February 2023).

  • AI
  • data analytics
  • ECS
  • Databricks

Dell and Databricks Announce a Multicloud Analytics and AI Solution

Greg Findlen

Mon, 22 May 2023 16:58:09 -0000

Dell and Databricks' partnership will bring customers cloud-based analytics and AI using Databricks with data stored in Dell Object Storage.

The biggest business opportunity for enterprises today lies in harnessing data for business insight and gaining a competitive edge. At the same time, the data landscape is more distributed and fragmented than ever. Data is spread out across multiple environments including on-premises and multiple public clouds, thus complicating the ability to access and process data efficiently.

Enterprises require solutions that enable a multicloud data strategy by design. That means leveraging data wherever it is stored, across clouds, with a consistent management, security, and governance experience to build analytical and AI/ML-based workloads. 

At Dell, the business of data is not new to us. We store and process a large majority of the world’s data on our systems. And we work with customers across the globe every day to accelerate time to value from their data. Dell is building an open ecosystem of partners who together can help address next-gen challenges in data management.

Dell and Databricks

Today, during the opening keynote at Dell Technologies World 2023, Dell Technologies announced a strategic, multi-phase partnership with Databricks. I want to share some additional details around that announcement.

Customers today can leverage native Databricks capabilities to process, analyze, and share data stored in Dell object storage, located on-premises or in a cloud-adjacent data center like Faction, without moving data into the cloud. This unlocks phenomenal benefits for customers, including on-demand compute to process on-premises data assets, the ability to securely share data within and outside the enterprise, fewer data movements and copies, compliance with data localization regulations, multicloud resiliency, adoption of open architecture standards, and an overall reduction in the cost and complexity of their data landscape. Key to this integration is support for Delta Sharing, an open standard for securely sharing live data assets with any computing platform.

And that's not all: our teams at Dell and Databricks are excited to engineer a deeper integration that will deliver a truly seamless experience of using Dell object storage within the Databricks Lakehouse Platform. This will transform the way customers manage on-premises data with cloud platforms.

Dell and Databricks realize it is a multi-faceted world, and customers benefit from being able to access data wherever it resides to unlock competitive differentiation. Dell and Databricks will closely partner in the market to bring these solutions to our joint customers.

“Databricks is focused on helping businesses extract the most valuable insights from their data, wherever it resides,” said Adam Conway, senior vice president, product at Databricks. “This partnership provides the ability to leverage cloud and on-premises data together with best-of-breed technologies, and to securely share that data through Delta Sharing. Combining the best of Dell and Databricks changes the data landscape for customers as they operate in today’s multicloud world.”

Data Management with Dell Technologies 

This partnership is a great addition to Dell's open ecosystem of technologies in the data space. Together with Dell's market-leading portfolio of storage, compute, and services, this ecosystem aims to provide best-in-class data management solutions to our customers and help them build a multicloud data strategy by design. The data space is buzzing with new innovations and technologies aimed at improving user experience, productivity, and business value by orders of magnitude. Partner with Dell Technologies to create the right multicloud data strategy for your enterprise and unleash the next wave of transformation and competitive edge for your business. Visit us at dell.com/datamanagement to stay tuned to the latest in this space.

To learn more about how Dell and Databricks can help your organization streamline its data strategy, read the "Power Multicloud Data Analytics and AI using Dell Object Storage and Databricks" white paper, or contact the Dell Technologies data management team at Dell.data.management@dell.com.


  • NVIDIA
  • big data
  • PowerEdge
  • Spark
  • Kafka
  • anomaly detection

Inverse Design Meets Big Data: A Spark-Based Solution for Real-Time Anomaly Detection

Raja Neogi and Thomas Chan

Wed, 17 Jan 2024 18:27:32 -0000

Inverse design is a process in which you start with a desired outcome and performance goals, then work backward to find the system configuration and design parameters that achieve those goals, instead of the more traditional forward design, in which known parameters shape the design.

For accurate and timely identification of anomalies in big data streams from servers, it is important to configure an optimal combination of technologies. We first pick the autoencoder technique that shapes the multivariate analytics, then configure Kafka-Spark-Delta integration for dataflow, and finally select the data grouping at the source for the analytics to fire.  

The iDRAC module in Dell PowerEdge servers gathers critical sideband data in its sensor bank. This data can be programmed for real-time streaming, but not every signal (data-chain) is relevant to the online models that consume it. For example, if the goal is to find servers in the data center that are overheating, the internal residual battery charge in servers is not useful. Composable iDRAC data from PowerEdge servers is pooled in networked PowerScale storage. The most recent chunks of data are loaded into memory for anomaly detection over random samples. Computed temporal fluctuations in anomaly strength complete the information synthesis from raw data. This disaggregated journey from logically grouped raw data to information, using the Dell Data Lakehouse (DLH) network infrastructure specification (not shown here), triggers action in real time. The following figure captures the architecture:


 Figure 1. End-to-end architecture for streaming analytics

The pipeline has two in-order stages: ingress and egress. In the ingress stage, target-model features (for example, overheating) influence data enablement, capture frequency, and streamer parameterization. Server iDRACs [1] write to the Kafka Pump (KP), which interprets native semantics for consumption by the multithreaded Kafka Consumer, as shown in the following figure:


 Figure 2. Kafka to Delta

The reader thread collects data from the configured input buffer, while the writer thread periodically flushes this data by appending it to HDFS storage in Delta format, using Spark services for in-memory computing, scalability, and fault tolerance. The reliability, scalability, and efficiency of HDFS and Delta Lake, coupled with Spark and Kafka performance considerations, influenced our choices.
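
The reader/writer hand-off can be sketched in plain Python. This is a toy stand-in: a list plays the role of the Delta sink, where the real pipeline writes to HDFS through Spark, and the batch size and sample data are illustrative:

```python
# Toy sketch of the reader/writer hand-off: a reader thread fills an input
# buffer while a writer thread periodically flushes batches to a sink.
import queue
import threading

buffer: "queue.Queue[dict]" = queue.Queue()
sink: list = []          # stand-in for the Delta table on HDFS
stop = threading.Event()

def reader(samples):
    # Stand-in for the Kafka consumer feeding the input buffer.
    for sample in samples:
        buffer.put(sample)
    stop.set()

def writer(batch_size=4):
    batch = []
    while not (stop.is_set() and buffer.empty()):
        try:
            batch.append(buffer.get(timeout=0.1))
        except queue.Empty:
            continue
        if len(batch) >= batch_size:
            sink.append(list(batch))   # "flush": append a batch to storage
            batch.clear()
    if batch:
        sink.append(list(batch))       # flush the remainder on shutdown

telemetry = [{"server": i % 3, "temp_c": 40 + i} for i in range(10)]
t_read = threading.Thread(target=reader, args=(telemetry,))
t_write = threading.Thread(target=writer)
t_read.start(); t_write.start()
t_read.join(); t_write.join()
assert sum(len(b) for b in sink) == len(telemetry)   # nothing lost in the hand-off
```

Batching the flush is the key idea: appending many small files to HDFS or Delta is expensive, so the writer amortizes that cost over a batch.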

In the egress stage of the pipeline, we apply anomaly-strength analytics using the pretrained autoencoder [2] model. NVIDIA A100 GPUs accelerated autoencoder training. Elasticsearch helped sift through random samples of the most recent server data bundle for anomaly identification. Aggregated Z-score error deviations over these samples characterized the precise multivariate anomaly strength (as shown in the following figure); extrapolating it over a temporal window captured undesirable fluctuations.

Figure 3. Anomaly analytics

We used Matplotlib to render the results, but you could alternatively generate on-demand events to drive corrections in the substrate. Generalized, this approach can continuously identify machine anomalies.
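
A minimal sketch of the Z-score aggregation described above, assuming `errors` stands in for per-record autoencoder reconstruction errors (the model itself and the Elasticsearch sampling are elided, and all numbers are illustrative):

```python
# Z-score anomaly strength: standardize a sample's reconstruction errors
# against a baseline built from known-healthy telemetry, then aggregate.
import random
import statistics

random.seed(0)

# Reconstruction errors observed on known-healthy telemetry (the baseline).
baseline = [random.gauss(1.0, 0.2) for _ in range(500)]
mu = statistics.fmean(baseline)
sigma = statistics.pstdev(baseline)

def anomaly_strength(errors):
    """Aggregate (mean) Z-score of a sample's errors vs. the healthy baseline."""
    return statistics.fmean((e - mu) / sigma for e in errors)

healthy = [random.gauss(1.0, 0.2) for _ in range(50)]
overheating = [random.gauss(2.5, 0.3) for _ in range(50)]  # inflated errors

print(anomaly_strength(healthy))      # near 0
print(anomaly_strength(overheating))  # strongly positive
```

Because the strength is standardized against the baseline rather than against each new batch, slow drift in the fleet shows up as a rising score instead of being silently re-normalized away, which is what lets the approach skip periodic drift compensation.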

Conclusion

In this PoC, we combined several emerging technologies: Kafka for real-time data ingestion, Spark for reliable high-performance processing, HDFS with Delta Lake for storage, and advanced analytics for anomaly detection. By using autoencoders, supplemented with a strategy to quantify anomaly strength without requiring periodic drift compensation, we showed that modern data analytics integrates well on Dell DLH infrastructure, which includes Red Hat OpenShift, Dell PowerScale storage, PowerEdge compute, and PowerSwitch network elements.

References:

[1] Telemetry Streaming with iDRAC9— Custom Reports Get Started

[2] D. Bank, N. Koenigstein, R. Giryes, “Autoencoders”, arXiv:2003.05991v2, April 2021.

  • Cassandra
  • NoSQL

Cassandra on Dell PowerEdge Servers: a Match Made in Heaven

Mike King

Thu, 09 Feb 2023 20:47:00 -0000


Cassandra is a popular NoSQL database in a crowded field of perhaps 225+ NoSQL databases.  Backing up a bit, NoSQL has a taxonomy of four types:

  • Key-value, as with Redis, RocksDB, and Aerospike
  • Wide column, as exemplified by HBase and Cassandra
  • Document, containing MongoDB, Couchbase, and MarkLogic (recently acquired by Progress)
  • Graph, with TigerGraph, Neo4j, ArangoDB, AllegroGraph, and dozens of others

Cassandra is an excellent replacement for HBase when migrating away from Hadoop to something like our Data Lakehouse solution here and here.  More in a future post on this solution.  What does wide column actually mean?  It's simply a key-value pair with an amorphous, typically large payload (the value).  One of the cool things I learned when designing my first HBase database about nine years back was that the payload can vary from record to record, which blew my mind at the time.  All I could think of was garbage data, low-quality data, no schema… what a mess.  But for some strange reason, folks don't seem to care much about those items and are more concerned with handling growth, scale-out, and performance.
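
That "payload can vary from record to record" idea can be illustrated with a toy model, using plain dicts as a stand-in for a real wide-column store (the row keys and columns here are made up for illustration):

```python
# Toy wide-column model: each row key maps to its own set of columns,
# so two records in the same table can carry entirely different payloads.
table = {}

def put(row_key, **columns):
    """Upsert columns for a row; no fixed schema is enforced."""
    table.setdefault(row_key, {}).update(columns)

put("sensor:42", temp_c=71.5, fan_rpm=5200, firmware="2.81")
put("sensor:43", temp_c=64.0, battery_charge=0.97)   # different columns entirely

assert table["sensor:42"].keys() != table["sensor:43"].keys()  # no shared schema
```

That schema freedom is exactly the trade-off described above: nothing stops "garbage data" from landing in the table, but nothing blocks a new column from appearing either, which is what makes growth and scale-out easy.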

Cassandra comes in two versions: community, and the DataStax edition, DSE.  DataStax offers support for both and has excellent services capability after its purchase of The Last Pickle.  In my customer base I see about a 50/50 split.  I think DSE is well worth the cost for most customers, but that's a choice, and the voices against paying for it seem to be stronger.

Cassandra clusters should have a number of nodes evenly divisible by three; I like to start with six myself.  As for storage, one can probably get by with vSAS read-intensive (RI) SSDs; more, smaller-capacity SSDs will give you more IOPS.  10GbE NICs should suffice, but I favor 25GbE these days for economics, value, and future-proofing: one can get 150% more throughput for about a 25% uplift.  Sorry Cisco, but 40GbE is dead and will go the way of the dodo bird.  The core counts you need can vary but tend to be in the 12-16 cores per socket range.  Most of the time I'm looking for value here; I avoid top-end processors due to cost, and generally they're not needed.  If I needed lots of cores, I would look at some of our AMD servers.  For this exercise we will consider Intel, as it's far more prevalent; for Dell this means an R650 Ice Lake server, where we can squeeze a lot into 1U.

The specs for a six-node cluster could look like this, per node:

  • 256GB of RAM with 16 x 16GB DIMMs in a fully balanced config
  • Dual 16-core processors with a bit faster clock speed; the 6346 would fit the bill at 3.1GHz
  • Dual 25GbE NICs
  • HBA355E – this assumes no RAID for your db
    1. If you plan on using RAID for your Cassandra db, then select the H755 PERC, which has 8GB of cache.
  • 6 x 960GB vSAS RI SSDs
    1. 99% of the time, read-intensive drives will suffice
    2. If your retention is one day or less, then mixed-use drives would be in order, but I've not seen that
  • M.2 BOSS 480GB RI SSD pair – fully hot-swappable RAID1 pair
    1. Here's where your OS and possibly the DSE or Apache Cassandra software would go

For your Cassandra needs, contact me at Mike.King2@dell.com to discuss your challenge further.

  • Kafka

Kafka on Dell PowerEdge Servers – a Winning Combination

Mike King

Mon, 06 Feb 2023 19:07:45 -0000


By far the most popular pub/sub messaging software is Kafka.  Producers send data and messages to a broker for later use by consumers.  Data is published to one or more topics, which behave like queues.  Consumers read messages from a topic and track their position with offsets; most topics have multiple consumers.  Topics may be partitioned to enable parallel processing by brokers.  Messages are retained until the topic's configured retention limit is reached, then deleted, whether or not they have been consumed.  Replicas create additional copies of your data to help prevent data loss.
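
The mechanics above can be sketched with a toy in-memory model: an append-only log per partition, with each consumer group tracking its own read offset.  (Illustration only; a real deployment would use a Kafka client library, and the topic and group names here are made up.)

```python
# Toy pub/sub model: per-partition append-only logs plus per-group offsets.
from collections import defaultdict

class Topic:
    def __init__(self, partitions=2):
        self.logs = [[] for _ in range(partitions)]
        # Each consumer group tracks its own read offset per partition.
        self.offsets = defaultdict(lambda: [0] * partitions)

    def produce(self, key, value):
        part = hash(key) % len(self.logs)   # the key determines the partition
        self.logs[part].append(value)

    def consume(self, group, part):
        start = self.offsets[group][part]
        batch = self.logs[part][start:]
        self.offsets[group][part] = len(self.logs[part])  # mark as read
        return batch

topic = Topic()
topic.produce("server-1", {"temp_c": 71})
topic.produce("server-1", {"temp_c": 74})

part = hash("server-1") % 2
assert len(topic.consume("alerts", part)) == 2  # first read drains the log
assert topic.consume("alerts", part) == []      # offset already advanced
```

Note that consuming only advances the group's offset; the log itself is untouched, which is why another group could still read the same messages, and why retention, not consumption, governs deletion.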

Regarding your platform choice there are many options including:

  • Bare metal servers with DAS
  • Virtualized
  • HCI
  • K8S

Some tips:

  • Keep your cluster clean.  Don't use Kafka to retain or replay data past a few days or a week.  Once data is consumed, let it be deleted.
  • Use an odd number of nodes, with a minimum of three or five depending on your tolerance for failures.  Most environments will have many more nodes.
  • Storage should be local, and SSDs are highly recommended.
  • No RAID should be needed if replicas are in effect.
  • Use random partitioning.
  • One replica is likely a minimum viable config, with two replicas (three total copies) being most common in production.

What might this look like on some PowerEdge servers?  For 15G Ice Lake servers, the most attractive is the R650: a 1U server with 10 drive bays, decent memory capacity, and a wide selection of processors.  A middle-of-the-road configuration might look something like the following:

  • Seven R650 servers
    1. 256GB of RAM with 16 x 16GB DIMMs in a fully balanced config.
    2. Dual 16c processors w/ a bit faster clock speed, so the Xeon Gold 6346 would fit the bill @ 3.1GHz
    3. Dual 25GbE NICs
    4. HBA355E – This assumes no RAID for your data drives
      • If you plan on using RAID for your kafka data then select the H755 PERC which has 8GB of cache.
    5. 6 x 1.92TB vSAS RI SSDs
      • 99% of the time read-intensive drives will suffice
      • If your retention is one day or less, then mixed-use drives would be in order, but I’ve not seen that
    6. M.2 BOSS 480GB RI SSD pair – fully hot swappable RAID1 pair
      • Here’s where your OS and possibly the Apache Kafka or Confluent software would go

For your Kafka needs, feel free to contact me @ Mike.King2@dell.com to discuss your challenge further.

Read Full Blog
  • big data
  • Kubernetes
  • private cloud
  • cloud native

An Ace in the Hole for Your Kubernetes Clusters

Mike King Mike King

Mon, 24 Apr 2023 14:12:49 -0000

|

Read Time: 0 minutes

Robin Systems Symworld Cloud, previously known as Cloud Native Platform (CNP), is a killer upstream K8S distribution that you should consider for many of your workloads.  I’ve been working with this platform for several years and continue to be impressed with what it can do.  Some of the things that I see value in for Symworld Cloud include, but are not limited to, the following:

  • QoS for resources such as CPU, memory and network.
  • Templates for workloads that are extensible
  • Pre and Post scripting capability that is customizable for each workload
  • Automation of all tasks
  • Elastic scaling both up and down
  • Multi-tenant capabilities
  • Easy provisioning
  • Simple deployment
  • An application store that can house multiple versions of a workload with varied functionality
  • Higher resource utilization
  • Three types of nodes possible: compute, storage and converged
  • Resources may be shared or dedicated as needed

When it comes to workloads, there’s an extensive existing catalog, and if something you need is missing, it can be added.  Workloads that could be deployed include:

  • Any NoSQL db including MongoDB, Cassandra, Redis, TigerGraph, Neo4j, Aerospike, Couchbase, RocksDB, and others
  • Select RDBMSs such as PostgreSQL, MySQL, Oracle, SQL Server, Greenplum, Vertica
  • Streaming & messaging with Kafka, Flink, Pulsar, and more
  • Elasticsearch
  • Hadoop
  • Query engines like Presto, Starburst, Trino
  • Spark

With respect to use cases some options might be:

  • To glue together your Data Lakehouse.  We have a solution that combines Spark, Delta Lake, and K8s w/ Robin, PE servers, and scalable NAS or object storage.  You may find out more about it here & here.  The inclusion of Robin allows one to add other workloads, such as Kafka and Cassandra, w/o having to create separate environments!
  • How about Big Data as a Service (BDaaS)?  Say you have six different workloads, including MySQL, Kafka, Spark, TigerGraph, Starburst & Airflow.  Putting all of these in the same platform, containerized and scalable, on the same set of perhaps eight PE nodes would allow one to save some serious coin.
  • Your own slice of heaven including the two to a dozenish apps that matter to you.
  • Machine learning and deep learning with some GPU enabled servers.
  • Robin Systems has some additional use cases & key info here.

Interested in learning more?  We have an upcoming event on May 19th @12N EST that is just what the doctor ordered. I’ll be a panelist for this webinar.

Topic: Solving the Challenges of deployment and management of Complex Data Analytics Pipeline 

Register in advance for this webinar:

https://symphony.rakuten.com/dell-webinar-data-analytics-pipeline

After registering, you will receive a confirmation email containing information about joining the webinar

If you just can’t wait till then feel free to reach out to me @ Mike.King2@dell.com to discuss your challenge further.

Read Full Blog

Journey into the analytics space with Dell & Starburst

Vrashank Jain Vrashank Jain

Thu, 02 Feb 2023 14:50:00 -0000

|

Read Time: 0 minutes

Data silos are a growing concern for enterprises today. They pose new challenges in discovering, accessing, and activating data. At Dell Technologies, we have helped our customers work through these challenges for many years, from building fraud detection to enabling life-saving healthcare. We understand that getting the data strategy right can help teams solve their real-world problems. Dell Technologies has engaged in joint engineering and validation efforts to integrate our leading server product, Dell PowerEdge, and our leading object storage platform, Dell ECS, with industry leaders in the data analytics space.

Today, we are happy to announce a collaboration with analytics leader Starburst Data, which will allow our analytics customers to deliver flexible and efficient architectures by combining the fastest and most secure query engine and leading hardware platforms for compute and storage.

Data virtualization and federated query analytics

Starburst is built on top of Trino, the open-source high-performance distributed SQL engine that’s known for running fast analytic queries against data sources ranging in size from GBs to PBs. Trino was formerly called PrestoSQL. In fact, in 2020, we released a white paper describing how Presto’s capabilities translate remarkably well onto Dell ECS object storage, and that Trino’s rich feature set positions it well to win the price/performance battle against Hadoop and other technologies in most cases!

The Starburst Enterprise Platform distribution of Trino was created to help enterprises extract more value from their Trino deployments through global security with fine-grained access controls, stable and reliable releases, additional connectors, data caching, and enterprise support including guidance from the most qualified group of Trino experts anywhere.

For these reasons, we chose to partner with Starburst and deploy their software in our labs to evaluate its performance on Dell hardware. We used the industry-standard TPC-DS test suite to benchmark Starburst performance by measuring the total execution time as well as the per-query execution time. We also varied the hardware resources to model how Starburst’s performance scales. We detailed our setup and experiments for reproducibility in this paper. Our goal was to provide our customers with a validated design reference for deploying Starburst and scaling it appropriately as the query volume, concurrency, or data volume grows.

Deploy and scale on Dell infrastructure 

Starburst is based on a distributed Coordinator-Worker architecture. In our setup, we run coordinator and worker nodes of Starburst Enterprise on Dell PowerEdge servers and use unstructured storage such as Dell Elastic Cloud Storage (ECS) for materialized views, data products, caching, and more.

We tested the reference architecture on PowerEdge R740XD (14G), but we think the latest PowerEdge server portfolio (15G) can take performance to a new level with generational improvements such as:

  • High-performance computing - delivering up to 43% greater performance by leveraging Intel's 3rd Gen Xeon Scalable processors.
  • PCIe Gen 4 - doubling the throughput over prior server generations, with eight lanes of data.
  • Comprehensive security - with data encryption, the root of trust protection, and supply chain verification.
  • Improved energy efficiency - with the latest cooling technology, offering up to a 60% reduction in power consumption.
  • Flexible, autonomous management - delivering up to 85% time savings by freeing up the skilled hands of IT professionals for other vital projects.

We used the ECS EX500 as a data lake source. ECS is the world’s most cyber-secure object storage that delivers scalable public cloud services with the reliability and control of a private-cloud infrastructure. With comprehensive protocol support for unstructured data (object and file) and a variety of deployment options (turnkey appliance or software-defined), ECS can support a wide range of workloads, especially big data analytics. And best of all, Starburst works seamlessly with ECS!

Harness data to solve real world problems

Data teams can start taking advantage of our collaboration now. Today’s announcement allows customers to:

  • Quickly deploy a thoroughly tested architecture comprising Dell hardware, Starburst Enterprise Platform, and other software on-premises
  • Effectively partner with IT to move data intelligently into a data lake / data warehouse based on usage patterns
  • Prevent vendor lock-in with support for the most popular open table and file formats
  • Separate compute and storage and scale flexibly and efficiently
  • Harness the innovations in our latest generation of ECS appliances as a data lake storage

We’re very excited about the collaboration and can’t wait for you to check out the reference architecture to learn about the announcement and the solution.

Read Full Blog
  • NoSQL

You Really Do Need a Database Strategy

Mike King Mike King

Mon, 06 Feb 2023 18:42:50 -0000

|

Read Time: 0 minutes

I’m amazed at how many companies I talk to that don’t have a discernible database strategy.  Aggregate spending on database technology for software, services, servers, storage, networking & people runs six figures for most medium-sized companies and into the tens of millions per annum for large companies.  So any way you slice it, it’s a large investment that warrants a strategy.

First, let’s consider the different kinds of database technologies out there.  There’s relational, time series, geo-spatial, GPU, OLAP, OLTP, HTAP, NewSQL, and NoSQL, including key-value, document, wide-column and graph.  Altogether, there are probably 400ish different choices.  Many large companies have 10 – 20 different ones floating around.

How does one get started?

  • Firstly, take inventory.  This is not as easy as it sounds.  People often buy things via their own budgets, use open-source software that may not incur license spend, acquire software packages that have a database in them, clone software and so on.
  • Then add up how much is spent on them, going back three years.  This is not to imply that sunk costs matter; what we seek to understand is the trend.
  • Categorize all the products starting with what is detailed above.
    1. Mainstream/Standard, Niche, Emerging, Contained are some categories that may resonate.
  • Detail usage
  • Determine overlap & redundancy.  
    1. If you have data marts on SQL Server, Greenplum, Netezza, and Vertica, then you probably have three more databases serving this function than you need.
    2. You run Kafka all over the enterprise, but it’s running on HCI, bare metal, and virtual machines across Dell, HPE, and Lenovo.  Simply run it on bare metal with Dell and jettison the rest.
    3. Inspect acquisitions done over the last five or so years.  This is a ripe area.
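The inventory and trend steps above can be sketched as a simple spend roll-up by category. The products, categories, and dollar figures below are hypothetical placeholders, not real data:

```python
from collections import defaultdict

# Hypothetical inventory: (product, category, annual spend in $K for the
# last three years, oldest year first).
inventory = [
    ("SQL Server", "Mainstream/Standard", (400, 450, 500)),
    ("Greenplum",  "Mainstream/Standard", (300, 280, 260)),
    ("Netezza",    "Contained",           (200, 150, 100)),
    ("Vertica",    "Niche",               (120, 130, 140)),
    ("TigerGraph", "Emerging",            (0,   50,  90)),
]

# Sum spend per category, year by year, to expose the trend.
spend_by_category = defaultdict(lambda: [0, 0, 0])
for product, category, years in inventory:
    for i, amount in enumerate(years):
        spend_by_category[category][i] += amount

for category, trend in sorted(spend_by_category.items()):
    direction = "up" if trend[-1] > trend[0] else "down"
    print(f"{category}: {trend} ({direction})")
```

Once the categories and trends are visible, the overlap-and-redundancy conversation (which databases to retire, which to standardize on) becomes much easier to have.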

If you’d like a free consultation on your particular dilemma, please do contact me at Mike.King2@Dell.com

Read Full Blog
  • data analytics
  • big data
  • NoSQL

Graph DB Use Cases – Put a Tiger in Your Tank

Mike King Mike King

Mon, 06 Feb 2023 18:44:06 -0000

|

Read Time: 0 minutes


In the NoSQL Database Taxonomy there are four basic categories:

  • Key Value
  • Wide Column
  • Document
  • Graph

Although Graph is arguably the smallest category by several measures, it is the richest when it comes to use cases.  Here is a sampling of what I’ve seen to date:

  • Fraud detection
  • Feature store for ML/DL
  • C360 – yeah you can do that one in most any db.
  • As an overlay to an ERP application, allowing the addition of new attributes without changing the underlying data model or code.  For select objects, the keys (primary & alternate) with select attributes populate the graph.  The regular APIs are wrapped to check for new attributes in the graph.  If there are none, the call is passed through.  For new attributes, a post-processing module makes sense of them and takes additional actions based on the content.
    • One could use this same technique for many homegrown applications.
  • As an integration database across multiple disparate, heterogeneous data stores.  I solutioned this conceptually for a large bank that had data in the likes of Snowflake, Oracle, MySQL, Hadoop, and Teradata.  The key to success here is not dragging all the data into the graph but merely the keys and select attributes.
  • Recommendation engines
  • Configuration management
  • Network management
  • Transportation problems
  • MDM
  • Threat detection
  • Bad guy databases
  • Social networking
  • Supply chain
  • Telecom
  • Call management
  • Entity resolution
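Many of these use cases boil down to traversals over an entity graph. As a toy illustration of fraud detection (all data hypothetical), accounts that share a phone number or device form a connected component, i.e., a potential fraud ring, found here with a plain BFS rather than a real graph database:

```python
from collections import defaultdict, deque

# Hypothetical edges: account <-> shared identifier (phone, device, etc.).
edges = [
    ("acct1", "phone:555-0101"),
    ("acct2", "phone:555-0101"),
    ("acct2", "device:abc"),
    ("acct3", "device:abc"),
    ("acct4", "phone:555-0199"),
]

graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

def component(start):
    """Breadth-first search collecting every account reachable from start."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return {n for n in seen if n.startswith("acct")}

# acct1..acct3 are linked through shared identifiers; acct4 is not.
print(sorted(component("acct1")))  # ['acct1', 'acct2', 'acct3']
```

A graph database does the same traversal declaratively, at scale, and over live data, which is exactly where products like TigerGraph earn their keep.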

We’re closely partnered with TigerGraph and can cover the above use cases and many more.

 If you’d like to hear more and work on solutions to your problem please do drop me an email at Mike.King2@Dell.com

Read Full Blog
  • data analytics
  • big data
  • life cycle management

What's a Data Hoarder to Do?

Mike King Mike King

Tue, 10 May 2022 19:18:45 -0000

|

Read Time: 0 minutes

So you're buried in data, you can't afford to expand, your performance is bad and getting worse, and your users can't find what they need.  Yes, it's a tsunami of data that's the root cause of your problems.  You ask your Mom for advice and she says "Why don't you watch that TV show called Hoarders?"  You watch a few episodes and can relate to the problem, but they offer no workable solutions for your excess data.  Then you talk to Mike King over at Dell and he says "That problem has been around since the ENIAC."

The bottom line is that almost all systems are designed to store certain kinds of data for a pre-determined amount of time (retention).  If you don't have retention rules, then you failed as an architect.  The solution for data hoarding is much more recent, evolving over the last 40 years or so.  It was first called data archiving, a term still used today by some.  The concept is really simple: take the data that is no longer needed and remove it from the system of record.  If the data is still needed, but far less frequently, then move it to a cheaper form of storage.  The discipline that evolved around this practice was first called data lifecycle management (DLM) and later information lifecycle management (ILM).  ILM considers many more aspects of the archiving process in a more holistic sense, including policies, governance, classification, access, compliance, retention, redaction, privacy, recall, query, and more.  We won't get into all the ILM stuff in this post.

Let's take a concrete example to get started.  We have a regional bank called Happy Piggy Bank.  They do business in 30 states and have supporting ERP applications like Oracle EBS, databases such as Greenplum & SingleStore for analytics, and Hadoop for an integrated data warehouse and AI platform.  The EBS db has six years of data and a stout 600TB.  The Greenplum db is around 1PB and stores just 90 days of data.  SingleStore is new, but they have big plans: it's at 200TB today and will grow to 3PB in a year.  Hadoop is the largest of all, with detail transactions and account statements going back 10 years and 10PB of raw data.  Only the Greenplum db has a formal purge program that was actually written and put in production; the Hadoop and EBS environments have none.

The first order of business is to determine how much data they should or need to retain.  This is mostly a business activity.  The next step is to determine the access patterns.  To do data archiving, one needs to determine the active portion of the data.  In most systems, perhaps 99% of the access is constrained to a smaller portion of the retention continuum.  Let's consider that EBS db and its six years of data.  We might run some reports and do some analysis, and it's highly likely that 90% of the data is less than 6 months old and, let's say, 99% is less than 1 year old.  In this case, we should target the 5 oldest years of retention (83% of the data, or 498TB of the db) to migrate to a more cost-effective platform.  In a similar fashion, we determine that 60% of the Hadoop data is accessed less than 1% of the time, so that's a 6PB chunk we can lop off the Hadoop system.  So for Happy Piggy Bank, we have determined we can remove 6.5PB of data from two of the systems, which will yield the following benefits:

  1. Room for future growth will be created in the source systems
  2. Performance should improve in these systems
  3. Overall data storage costs will go down
  4. The source systems will be easier to manage
  5. We will likely avoid increased software licensing charges for Oracle and Hadoop as compared to doing nothing
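The sizing arithmetic above can be folded into a quick calculator, using the Happy Piggy Bank figures:

```python
def archivable(total_tb: float, inactive_fraction: float) -> float:
    """TB that can move to cheaper storage, given the inactive share."""
    return total_tb * inactive_fraction

ebs_tb    = archivable(600,    0.83)   # 5 oldest of 6 years of EBS data
hadoop_tb = archivable(10_000, 0.60)   # 60% of Hadoop data is rarely accessed

print(round(ebs_tb), round(hadoop_tb), round((ebs_tb + hadoop_tb) / 1000, 1))
# 498 6000 6.5  -> ~498TB + 6PB of archivable data, about 6.5PB in total
```

The same two inputs, total size and inactive fraction, are all you need to run this exercise for any system in the estate.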

So ye ask, what might the solution be?  Enter Versity, a partner of Dell Technologies enabled through our OEM channel.  Versity is a full-featured archiving solution that:

  • Delivers high-performance parallel archiving
  • Covers a wide variety of applications, databases, and the like
  • Stores data in three successive tiers (local, NAS & object)
  • Supports selective recall

The infrastructure includes:

  • Versity software
  • PE 15G servers such as R750s
  • PowerVault locally attached arrays
  • PowerScale NAS appliances
  • ECS object appliances

A future post will cover more details on what this solution could look like for Happy Piggy Bank.

Versity targets customers that have 5PB of data or more that can be archived.

Read Full Blog
  • data analytics
  • big data
  • PowerEdge
  • data management

Live Optics is Your Friend

Mike King Mike King

Tue, 05 Oct 2021 19:01:57 -0000

|

Read Time: 0 minutes


It’s a rare day that a free tool exists that can help profile customer workloads to the mutual benefit of all.  Live Optics (previously DPACK) is a diamond in the rough that is truly a win-win proposition for customers and vendors such as Dell.  I’ve been using it for years, and it’s a rare day that I don’t learn something of use.

The tool is similar to SAR on steroids.  Data is collected for each host; hosts can be VMs, and servers can be from any manufacturer.  The data collected covers IOPS (size and amount), memory usage, CPU usage, and network activity.   It can be run in local mode, where the data doesn’t go anywhere else, or the data can be stored in a Dell private cloud.   The latter is more beneficial, as it may be accessed by folks in many roles for various assessments.  The data may also be mined to help Dell make better decisions about current and future products based on actual observed user profiles.

I use Live Optics to profile database workloads like Greenplum and Vertica, Hadoop, and NoSQL databases like MongoDB, Cassandra, MarkLogic, and more.  

Upon inspection of the workload, the data collected helps facilitate more meaningful discussions with various SMEs and right-size future designs.  In one case, I found a customer that was using less than half their memory during peak periods…so we suggested new server BOMs with much less memory, as they didn’t need what they had.  

Can we help you with assessing your workloads of interest on our servers or those of our competitors?


Read Full Blog
  • data analytics
  • big data
  • PowerEdge

How about SingleStore for your database on 15G Dell PE Servers?

Mike King Mike King

Fri, 02 Dec 2022 04:58:29 -0000

|

Read Time: 0 minutes


SingleStore is a distributed relational database that was previously called MemSQL.  It is well suited to analytics workloads.  There are two data structure constructs available.   First is the columnstore, which lives on disk (typically SSDs).  Second is the rowstore, which is in memory and essentially a key-value structure.    Yes, you can have both types in the same db and join across the two different table types.   Data is distributed across leaf nodes, which store the low-level detail, while aggregator nodes coordinate queries and combine results.  Clients submit SQL queries to the aggregators.  

SingleStore uses the MySQL wire protocol, which makes it compatible with anything that can connect to MySQL.

Customers choose this database when they have demanding high performance analytics needs.  We have many large financial customers that are very happy with it.

So what does it look like on the latest 15G Ice Lake servers from Dell?  

Although it could run on most any server the leading candidate would be a Dell PowerEdge R650 for db sizes up to 400TBu.  Environments that have larger db needs would use a Dell PowerEdge R750.

Rules of thumb (ROTs)

  • Aggregators use single 25GbE NIC
  • Leaf nodes use a single 10GbE NIC
  • Aggregator nodes use about ¼ RAM & ¼ cores as leaf nodes

Other items

  • RAID is optional, but most customers elect it.  The figures below assume RAID10.
  • Use an M.2 BOSS card w/ a RAID1 pair of 480GB RI SSDs for the OS and software.  They are now hot swappable.
  • For durability & cost reasons, 99.99% of the time read-intensive value SAS SSDs will be the right fit.

5TB Env

  • 2 aggregators w/ 4 x 480GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
  • 4 leaf nodes     w/ 4 x 960GB RI SSD, 256GB RAM, 2 x 12c, 10GbE NIC

100TB Env

  • 3 aggregators w/ 2 x 960GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
  • 7 leaf nodes    w/ 8 x 3.84TB RI SSD, 512GB RAM, 2 x 24c, 10GbE NIC

400TB Env

  • 4 aggregators w/ 2 x 960GB RI SSD, 128GB RAM, 2 x 8c, 25GbE NIC
  • 14 leaf nodes    w/ 8 x 7.68TB RI SSD, 1024GB RAM, 2 x 28c, 10GbE NIC
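The aggregator ROT above can be rolled into a rough sizing helper. This is a sketch only; the configurations above round to standard processor and DIMM SKUs, and real designs should be validated against SingleStore’s own guidance:

```python
def aggregator_spec(leaf_ram_gb: int, leaf_cores: int) -> tuple[int, int]:
    """ROT: aggregators get roughly 1/4 the RAM and 1/4 the cores of a leaf.
    Actual BOMs round up to the nearest standard SKU."""
    return max(leaf_ram_gb // 4, 1), max(leaf_cores // 4, 1)

# 100TB environment: leaves at 512GB RAM / 2 x 24c (48 cores).
ram, cores = aggregator_spec(512, 48)
print(ram, cores)  # 128 12
```

Applied to the 100TB environment, the rule lands on 128GB of RAM for the aggregators, matching the configuration above, with the core count then rounded up to an available processor SKU.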

If you need your SingleStore database on Dell PE servers, do let us know.

Read Full Blog
  • Splunk

Intelligent Data Pipelining for Splunk with Cribl LogStream

Steve Meilinger Keith Quebodeaux Steve Meilinger Keith Quebodeaux

Wed, 18 Aug 2021 09:21:04 -0000

|

Read Time: 0 minutes

Dell Technologies has been working with customers for more than five years to help reduce Splunk infrastructure total cost of ownership (TCO) and complexity. Dell Technologies, at the time EMC, was pioneering the strategy of separating compute from long-term storage, with NFS and Isilon for cold data to reduce cost and complexity in managing historic data. This concept of separating compute from storage has now been adopted within the Splunk application with the introduction of SmartStore.  

Dell Technologies was an early supporter of SmartStore with ECS being one of the first S3 platforms announced in the 2018 Splunk SmartStore launch blog. More recently Dell Technologies has worked with Intel to illustrate the value of NVMe in indexers to increase indexer density and performance. Dell Technologies will continue to drive these infrastructure innovations as containerization of Splunk Enterprise Security and IT Service Intelligence becomes generally available.

So, Dell Technologies can help customers reduce the cost and complexity of Splunk infrastructure. But how do we improve efficiencies with Splunk and the data it consumes? One way is intelligent data pipelining so customers can ensure they get data from their various sources to Splunk in the most efficient way possible and still be free to use that data elsewhere while maintaining control and flexibility.  There are several tools that can provide data pipelining, but one very interesting solution for machine-generated data is Cribl LogStream. Cribl was started by former Splunkers and LogStream provides data pipelining for logs, metrics, and traces for any log analytics platform, not just for Splunk.

How are data pipelining and Cribl LogStream potentially beneficial to my Splunk environment? We feel there are a number of benefits of data pipelining that should be considered by our customers:

  1. Reducing ingested log volumes - We’re not suggesting that you should ingest fewer sources (in most cases you can actually do more!).  Rather, you can aggregate the data, remove duplicate fields and null values, or simply format the data more efficiently. This reduction will directly impact cost and downstream system performance.  
  2. Keeping a full-fidelity copy of the source data – Cribl can send the original source data to a low-cost destination such as the Dell PowerScale or ECS platforms. This leaves you in control of your source data should you need to reinvestigate it at a later date, or if you are preparing to send your data to an external service provider like Splunk Cloud or an alternative analytics platform like Elasticsearch. This source data facilitates any re-platforming or repatriation that could potentially happen in the future.
  3. Multiple platform support – Should you be using several different analytics platforms in your environment, such as Elasticsearch for observability, Splunk for security, and Cloudera for fraud detection, Cribl can route the necessary data to the appropriate tool, or even all the tools!
  4. Data masking and hashing – In environments where Splunk is ingesting highly sensitive data, such as customer, patient, or account data, Cribl gives you the ability to hash, obfuscate, or even eliminate the sensitive data in flight before it is ingested into Splunk or another long-term platform.      
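Benefits 1 and 4 amount to transforming events in flight. As a toy Python illustration (not Cribl’s actual pipeline language; the field names are hypothetical), an event can be slimmed of null fields and have a sensitive field hashed before it is forwarded:

```python
import hashlib

def slim_and_mask(event: dict, sensitive: set) -> dict:
    """Drop null/empty fields and hash sensitive ones before forwarding."""
    out = {}
    for key, value in event.items():
        if value in (None, "", []):
            continue  # benefit 1: reduce ingested volume
        if key in sensitive:
            # benefit 4: mask PII in flight, keeping a stable token
            value = hashlib.sha256(str(value).encode()).hexdigest()[:12]
        out[key] = value
    return out

raw = {"user": "alice", "ssn": "123-45-6789", "debug": None, "msg": "login ok"}
clean = slim_and_mask(raw, sensitive={"ssn"})
print(sorted(clean))  # ['msg', 'ssn', 'user']
```

Hashing rather than deleting the sensitive field preserves the ability to correlate events by that field without ever exposing the raw value downstream.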

What are the deployment options for Cribl LogStream on Dell Technologies? Cribl can be deployed on any architecture that supports Linux or containers. This could be Dell PowerEdge, PowerFlex, VxRail, or a combination of Dell PowerEdge and shared storage, like Dell PowerStore. The compute requirements for LogStream are relatively minimal when compared to Splunk Indexers. Compute ratios are estimated by Cribl at one physical core and 2 GB of RAM being capable of pipelining 400 GB of data per day. The storage requirements are likely to be disproportionately higher if the intent is to keep full-fidelity copies of data intact. Like compute, the options for storage are open, however, at scale Dell Technologies ECS or PowerScale are likely the most effective choices. Depending on your retention requirement, PowerStore may be a great choice given its amazing deduplication capabilities.
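Cribl’s stated ratio (one physical core and 2 GB of RAM per 400 GB/day) can be turned into a quick sizing estimate; treat this as back-of-the-envelope only:

```python
import math

GB_PER_DAY_PER_CORE = 400  # Cribl's published estimate
RAM_GB_PER_CORE = 2

def logstream_sizing(daily_gb: float) -> tuple[int, int]:
    """Return (cores, ram_gb) needed to pipeline daily_gb per day."""
    cores = math.ceil(daily_gb / GB_PER_DAY_PER_CORE)
    return cores, cores * RAM_GB_PER_CORE

print(logstream_sizing(2000))  # (5, 10): 2TB/day needs ~5 cores, 10GB RAM
```

Storage sizing is the bigger variable, since full-fidelity copies grow with retention rather than with daily throughput.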

If you are a Splunk customer interested in exploring how Cribl could benefit you, please feel free to reach out to your Dell Technologies account team. Dell Technologies is both a partner and a customer at scale on Splunk and multiple other data platforms. We’ve had the honor and privilege to speak and be recognized at Splunk Conf and, as I hope this blog suggests, we have a complete portfolio of compute and storage, in addition to our partner ecosystem, to help customers on their Splunk journey.

Read Full Blog
  • data analytics
  • big data
  • Spark

Distributed Data Analytics Made Easy with Omnia

Treasure Eiland Treasure Eiland

Thu, 20 Jul 2023 17:03:25 -0000

|

Read Time: 0 minutes

The Challenge

Cleaning up a few fields in a data file or replacing a few free-form fields with a standardized format. Running a few basic statistics on the responses of your user survey or creating a regression model based on the data from a few sensors. These types of operations are commonplace in the data analytics space. Grab your laptop, request a VM from your IT team or a cloud instance from your favorite service provider, and you’re ready to go!

Maybe at first. But data is growing exponentially, and eventually, one computer simply doesn’t provide enough raw performance to get the job done. That’s when it’s time for distributed data analytics, where you process different chunks of data on different computers and bring all the results together at the end. And one of the most common tools for doing distributed data analytics is Apache Spark.
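The "split the data, process chunks in parallel, combine the results" pattern can be sketched with the Python standard library alone; Spark generalizes exactly this across many machines:

```python
from concurrent.futures import ThreadPoolExecutor

def process_chunk(chunk):
    """Partial aggregation on one chunk (one 'worker' in Spark terms)."""
    return sum(chunk)

data = list(range(1_000_000))
chunks = [data[i::4] for i in range(4)]  # split across 4 workers

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(process_chunk, chunks))

total = sum(partials)  # combine the partial results
print(total == sum(data))  # True
```

Once a workload is expressed this way, moving from four threads on a laptop to hundreds of executors on a cluster is a change of engine, not of algorithm.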

Apache Spark

Apache Spark (https://spark.apache.org/) was first created in 2009 and is primarily used for analyzing batch and streaming data. It is a good option when there are large data processing tasks that need to be completed in the most efficient way possible. Spark is compatible with common data science/engineering languages such as Python, R, Scala, SQL, and Java. The Spark analytical engine can run standalone or on cluster managers such as Kubernetes, YARN, or Mesos. Spark can also be used for graph processing through tools such as GraphX and machine learning through MLlib. 

Omnia makes Spark on Kubernetes easy

Omnia is an open-source, community-driven project with the goal of deploying clusters optimized for workloads that users need to run. While Omnia was created within the Dell Technologies HPC Community, it can be used by anyone, anywhere. Omnia allows its customers to deploy one platform for all their needs, or to easily deploy many platforms if needed. Omnia’s collection of automatically deployed capabilities is always expanding to fit the needs of the community. Instead of having to use multiple platforms and numerous servers, Omnia solves that problem and allows deployment to be done at the push of a button. And that ability now extends to deploying Spark on Omnia-deployed Kubernetes clusters.

Now, any IT team deploying clusters with Omnia will get Spark enabled automatically. No extra configuration, no additional work. Omnia deploys the Spark operator as a standalone deployment, which can be leveraged from within an Omnia-deployed JupyterHub instance or an Omnia-deployed Kubeflow instance. It’s that easy!

Learn More

Learn more about Omnia

Learn more about Spark

Read Full Blog
  • PowerEdge
  • VMware

Dell EMC PowerEdge R750 Virtualized Workload Performance measurement using VMmark 3.1.1

Mahmoud Ahmadian Mahmoud Ahmadian

Tue, 03 Aug 2021 14:21:24 -0000

|

Read Time: 0 minutes

This blog is about virtualization benchmarks that we ran at the Dell EMC labs in June 2021. The specific example showcased here is the performance of Dell PowerEdge R750 with Intel Xeon Platinum 8380 processors running virtualized workloads as prescribed by the VMmark 3.1.1 benchmark. At a high level, the configuration tested was made up of the following components:

 

Server Model: Dell EMC PowerEdge R750
Storage Model: Dell EMC PowerMax 8000
Hypervisor: VMware ESXi 7.0 U2 Build 17630552
Datacenter Management Software: VMware vCenter Server 7.0 U1c Build 17327586

 

  • Systems Under Test  (SUTs) hardware configuration 

The hardware configuration for the Dell EMC PowerEdge R750 systems under test is shown in Table 1.

Table 1 SUTs hardware configuration

Number of Servers (information per server): 2
Server Manufacturer and Model: Dell EMC PowerEdge R750
Processor Vendor and Model: Intel Xeon Platinum 8380
Processor Speed / Turbo Boost Speed: 2.3 GHz / 3.4 GHz
Total Sockets / Total Cores / Total Threads: 2 Sockets / 80 Cores / 160 Threads
BIOS Version: 1.1.3
Memory Size: 2048 GB, 32 DIMMs
Memory Type and Speed: 64 GB 2Rx4 DDR4 3200 MHz RDIMM
Disk Subsystem Type: FC SAN
Number of Disk Controllers: 1
Disk Controller Vendors and Models: Dell PERC H745
Number of Physical Disks for Hypervisor: 2
Disk Vendors, Models, Capacities, and Speeds: WDC, WUSTR1596ASS200, 960 GB, 12 Gbps SSD
Number of Host Bus Adapters: 1
Host Bus Adapter Vendors and Models: QLE2692 Dual Port 16 Gb FC Adapter
Number of Network Controllers: 2
Network Controller Vendors and Models: Broadcom Gigabit Ethernet BCM5720; Mellanox ConnectX-5 EN 25 GbE Dual-port SFP28 Adapter

 

  • Storage hardware configuration

The workload VMs were hosted by the SUTs and resided on attached shared SAN storage.

In this configuration a Dell PowerMax 8000 was used as shown in Table 2.

 

Table 2 Storage hardware configuration

Storage Type: FC SAN
Storage Platform: Dell EMC PowerMax 8000

 

  • Networking hardware configuration

The networking switches used for the test bed are shown in Table 3.

Table 3 Networking hardware configuration

Network Switches: 1 x Dell PowerConnect 6248, 1 x Dell Force10 S4810
Network Speed: 1 Gbps for SUT management, 10 Gbps for vMotion, clients, and VMs

 

 

  • About VMmark

VMmark is VMware’s virtualization workload performance benchmark. At a high level, the latest version, VMmark 3.1.1, is a multi-server, virtualized datacenter benchmark developed to measure the performance and scalability of virtualization platforms. It performs  infrastructure operations such as dynamic virtual machine relocation, dynamic virtual machine storage relocation, and virtual machine provisioning while running applications on workload VMs. 

The performance of those individual workloads is aggregated to represent the overall performance of the virtualized data center, while meeting minimum quality of service (QoS)  requirements.

 

  • Dell EMC PowerEdge R750 VMmark 3.1.1 result

The published result was run on two PowerEdge R750 servers configured as a Distributed Resource Scheduler (DRS) cluster. The workload deployed was made up of 14 tiles, each a grouping of 19 workload VMs. Some of the workload VMs simulated scalable web applications, such as social networking and online auction websites; other workload tile VMs simulated e-commerce. The 14 tiles added up to a total of 266 workload VMs, with each PowerEdge R750 hosting 133 workload VMs.
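The VM counts quoted above follow directly from the tile definition, as this small arithmetic sketch shows:

```python
# Tile arithmetic from the result above: 14 tiles of 19 workload VMs each,
# spread evenly across the 2-node DRS cluster.
TILES, VMS_PER_TILE, HOSTS = 14, 19, 2

total_vms = TILES * VMS_PER_TILE   # 266 workload VMs in total
vms_per_host = total_vms // HOSTS  # 133 workload VMs per PowerEdge R750

print(total_vms, vms_per_host)
```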

 

  • Summary of results 

The Dell EMC PowerEdge R750 cluster achieved a score of 13.95 with 14 tiles. The results met all QoS thresholds and were in full compliance with the VMmark 3.1.1 run rules.

The submitted test results included the infrastructure operations and associated statistics shown in Table 4.

Table 4 Test results

Infrastructure_Operations_Scores | vMotion | SVMotion | XVMotion | Deploy
Completed_Ops_PerHour | 28.00 | 27.00 | 22.00 | 11.50
Avg_Seconds_To_Complete | 7.90 | 75.82 | 96.02 | 279.14
Failures | 0.00 | 0.00 | 0.00 | 0.00

 

For more information  about VMmark, see: 

https://www.vmware.com/products/vmmark.html

 

For the full official publication of the Dell VMmark benchmark result, see:

https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/vmmark/2021-06-29-DellEMC-PowerEdge-R750.pdf

 

Read Full Blog
  • data analytics
  • Elastic Stack

Elastic 7.12 Frozen Data and Dell Technologies ECS Enterprise Object Storage

Keith Quebodeaux Greg Galvan Steve Meilinger Mark Thomas

Tue, 22 Jun 2021 12:28:53 -0000

|

Read Time: 0 minutes

Many of us who work with Elastic are excited by the announcement of Elasticsearch 7.12 and its introduction of S3-backed searchable frozen data in Elasticsearch Index Lifecycle Management (ILM). Dell Technologies' customers were already able to take advantage of ECS Enterprise Object Storage for Elasticsearch snapshots. Now, with the introduction of frozen data on S3 for Elasticsearch, customers can reduce the total cost of ownership of historic data in Elasticsearch while maintaining data value and accessibility on Dell ECS.

Elasticsearch is part of the Elastic Stack, also known as the "ELK Stack", a widely used, open source-based collection of software products from Elastic.co for search, analysis, and visualization of data. The Elastic Stack is useful for a wide range of applications, including observability, security, and general-purpose enterprise search. Dell Technologies is an Elastic Technology Partner, OEM Partner, and Elastic customer. Dell Technologies uses the Elastic Stack internally for several use cases, including observability of Kubernetes and text document search.

Dell Technologies ECS Enterprise Object Storage is the leading object storage platform from Dell EMC and boasts unmatched scalability, performance, resilience, and economics.  Dell ECS delivers rich S3-compatibility on a globally distributed architecture, empowering organizations to support enterprise workloads such as cloud-native, archive, IoT, AI, and big data analytics applications at scale.  Dell ECS is being used by many customers as a globally distributed, object storage platform for machine data and analytics.

In July 2019, Dell Technologies published "Dell EMC ECS: Backing Up Elasticsearch Snapshot Data". That document illustrates how to configure Elasticsearch to use the backup and restore API to store data in ECS. Snapshots are the only reliable and supported method to back up an Elasticsearch cluster; there are no Elastic-supported methods to restore data from a file system backup. You can take snapshots of an entire cluster, or only specific indices in the cluster. In addition to object storage on Dell ECS, Elasticsearch can be backed up to other shared file systems such as Isilon or PowerScale. Backing up the data to Dell EMC storage gives customers peace of mind that their Elasticsearch data is protected. With Elasticsearch 7.12 and cold and frozen data, those snapshots take on even greater significance.
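The snapshot configuration described in that document boils down to registering an S3 repository with the cluster. Below is a minimal sketch of the request body, assuming the Elasticsearch `repository-s3` plugin is installed; the bucket name, endpoint, and repository name (`ecs_repo`) are illustrative placeholders, not values from the Dell paper.

```python
import json

# Illustrative settings only: bucket, endpoint, and repository name are
# placeholders for whatever your ECS deployment actually uses.
repo_settings = {
    "type": "s3",
    "settings": {
        "bucket": "es-snapshots",                 # S3 bucket created on ECS
        "endpoint": "ecs.example.internal:9020",  # ECS S3 access endpoint
        "protocol": "http",
        "path_style_access": True,                # ECS commonly uses path-style URLs
    },
}

# This JSON would be the body of: PUT /_snapshot/ecs_repo
body = json.dumps(repo_settings, indent=2)
print(body)
```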

Elasticsearch 7.12 and Frozen Data

The frozen tier in Elasticsearch was introduced recently in Elastic 7.12. Index Lifecycle Management with hot, warm, and cold tiers, in addition to the capability to search snapshots, was already available in previous versions of Elasticsearch. The addition of the frozen tier allows object stores like Dell ECS to be fully searchable, and it decouples compute from long-term storage. This feature will help customers reduce costs and resources for historic data while maintaining or expanding its accessibility and value. Dell Technologies has numerous customers, especially in regulated industries such as healthcare or financial services, who keep or want to keep machine data anywhere from one to seven years to facilitate security investigations, trend analysis, predictive analytics, or audit and regulatory compliance. For many, this can be cost-prohibitive, leading customers to delete valuable data or store it in a format that is not easily accessible. Elastic has released the repository test kit to validate that any S3-compatible object store works with searchable snapshots and the frozen tier.

Dell Technologies Elasticsearch 7.12 Architecture with ILM and Frozen Data

So how might deployments of Elasticsearch with full data lifecycle management look with the Dell Technologies portfolio? Elastic data lifecycle management should leverage higher-performance block storage for hot and warm data: hot on high-speed media, warm on lower-cost, lower-performance media. This could be NVMe or lower-density SSD on Dell PowerEdge, VxRail, PowerFlex, or PowerStore for hot, and higher-density SSD or HDD for warm. In 2020, Dell Technologies validated the Elastic Stack running on our VxFlex family of HCI with both VMware and ECK. Because Elasticsearch tiers data across independent data nodes rather than across multiple mount points on a single data node or indexer, the multiple types and classes of software-defined storage presented to independent HCI clusters can easily be leveraged between Elasticsearch clusters to address data temperatures.

Once data is moved to the cold tier, Elastic will single-instance your data if you have enabled replica shards. This allows storing up to twice the amount of data on the same hardware as the warm tier by eliminating the need to keep redundant copies locally. However, it also increases the value of snapshots, as the indices in the cold tier are backed up to your object store for redundancy. As mentioned previously, those snapshots would be stored on Dell Technologies ECS.

With the introduction of the frozen tier, Elasticsearch removes the need to store data on locally accessible block storage and uses searchable snapshots to directly search data stored in the object store without the need to rehydrate.   As data migrates from warm or cold to frozen based on your ILM policy, indices on local nodes are migrated to your object store.  A local cache, typically sized to 10 percent of your frozen data, stores recently queried data for optimal performance on repeat searches.  This greatly reduces storage costs for large volumes of data.
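An ILM policy implementing the hot-to-frozen flow described above might look like the following sketch. The phase ages, rollover sizes, and repository name are illustrative placeholders; `searchable_snapshot` is the action that mounts cold and frozen indices from the snapshot repository instead of keeping full copies on local disk.

```python
import json

# Illustrative ILM policy: ages, sizes, and the repository name ("ecs_repo")
# are placeholders. In Elasticsearch 7.12+, the frozen phase's
# searchable_snapshot action serves the index directly from the object store.
ilm_policy = {
    "policy": {
        "phases": {
            "hot": {
                "actions": {"rollover": {"max_age": "30d", "max_size": "50gb"}}
            },
            "cold": {
                "min_age": "30d",
                "actions": {"searchable_snapshot": {"snapshot_repository": "ecs_repo"}},
            },
            "frozen": {
                "min_age": "90d",
                "actions": {"searchable_snapshot": {"snapshot_repository": "ecs_repo"}},
            },
        }
    }
}

# This JSON would be the body of: PUT /_ilm/policy/logs_policy
print(json.dumps(ilm_policy, indent=2))
```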

Elasticsearch data nodes tend to have average allocations of 8 to 16 cores and 32 to 64 GB of RAM. With the current ability to support up to 112 cores and 6 TB of RAM in a single 2RU Dell server, Elasticsearch is an attractive application for virtualization or containerization. Per guidance from Elastic, if your typical warm tier node with 64 GB of RAM manages 10 TB of data, a cold tier node can handle about twice as much data, and a frozen tier node jumps up to approximately ten times as much. We recommend sizing one physical CPU to one virtual CPU (vCPU) for the Elasticsearch hot tier, along with the management and control plane resources. While this is admittedly similar to the VMware guidance for comparable analytics platforms, these virtual machines tend to consume a significantly smaller CPU footprint per data node.
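The density guidance above can be turned into a back-of-the-envelope sizing sketch. The ratios used here (warm node manages roughly 10 TB per 64 GB of RAM, cold roughly 2x, frozen roughly 10x, with a local cache near 10 percent of frozen data) are the rules of thumb quoted in the text, not hard limits.

```python
import math

# Rules of thumb from the guidance above, per 64 GB-RAM data node.
TB_PER_NODE = {"warm": 10, "cold": 20, "frozen": 100}

def nodes_needed(data_tb: float, tier: str) -> int:
    """Estimate whole data nodes needed to hold data_tb in the given tier."""
    return math.ceil(data_tb / TB_PER_NODE[tier])

def frozen_cache_tb(frozen_data_tb: float) -> float:
    """~10% local cache for repeat-search performance on the frozen tier."""
    return 0.1 * frozen_data_tb

# e.g. 500 TB of searchable frozen data -> 5 nodes, ~50 TB total local cache
print(nodes_needed(500, "frozen"), frozen_cache_tb(500))
```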

 
 Figure 1: Logical Elastic Stack Architecture on HCI example

 

Conclusion

Dell Technologies ECS Enterprise Object Storage is the leading object storage platform from Dell EMC and boasts unmatched scalability, performance, resilience, and economics.  Dell Technologies’ customers can take advantage of ECS Enterprise Object Storage for Elasticsearch snapshots, and now with the introduction of frozen data for S3 for Elasticsearch, customers can reduce the total cost of ownership of historic data in Elasticsearch while maintaining data value and accessibility on Dell ECS.  Snapshots are the only reliable and supported method to back up an Elasticsearch cluster, and with the introduction of the cold and frozen tier, Elasticsearch snapshots become a critical component of Elasticsearch ILM.   ILM with frozen data greatly reduces storage costs for large volumes of data, and Dell Technologies provides a portfolio capable of addressing the entire Elastic data lifecycle and compute requirements with multiple deployment options.

About the Authors

Keith Quebodeaux, Greg Galvan, Steve Meilinger, and Mark Thomas are Systems Engineers and Sales Specialists with Dell Technologies Data Centric Workloads and Solutions (DCWS), working with customers and prospective customers on their data analytics, artificial intelligence, and machine learning initiatives.


Read Full Blog
  • data analytics
  • big data
  • PowerEdge
  • GPU

Kinetica Can Give You Accelerated Analytics

Mike King

Wed, 02 Jun 2021 22:01:28 -0000

|

Read Time: 0 minutes

Accelerate Those Analytics With a GPGPU Database

First, you might ask what a GPU database actually is. In a nutshell, it's typically a relational database that can offload certain operations to a GPU so that queries run faster. There are three main players in the space: Kinetica, Sqream, and OmniSci. By most measures, Kinetica is the leader, which is one of the key reasons we've chosen to partner with them through our OEM channel.

Next, you might ask what a GPGPU database can do for you. Some ideas for your consideration:

  1. Run legacy RDBMS workloads from Oracle, DB2, Teradata, Sybase, or SQL Server with lower latency, better performance, and greater throughput.
  2. Conduct location analytics on networks or geolocation data.
  3. Detect fraud.

One of the coolest things I've found to date with Kinetica is that it runs queries on the GPU only where they can be accelerated: essentially joins, computations, and math operations. Queries involving a string search are run on the CPUs. In this manner, the entire workload can be accelerated collectively.

These databases run on servers with direct-attach storage capable of hosting NVIDIA GPUs. In the Dell 14G product family, the most common servers are the R740, R740XD, and R940XA. For 15G, the most appealing are the R750, R750XA, and XE8545. Other models are certainly possible but less common. For the purposes of this article, we will focus on the R750XA. This brand-new server is based on Ice Lake processors and sports two sockets with up to 40 cores per socket, for a maximum of 80 cores per server. A pair of top-end A100 GPUs can be configured with an NVLink bridge to enable interlinks of 600 GB/s. Systems can be configured with up to 6 TB of memory, including the latest Optane 200 series modules. Local storage is most common, and this server can house up to eight 2.5" drives, which can be either NVMe or SSD. I know you're thinking: what if my database can't fit on a single server? Luckily, the answer is simply to use more servers. Kinetica can shard the database across n nodes.

If you want to learn more about Kinetica on Dell PE servers drop me a line at Mike.King2@Dell.com


Read Full Blog
  • data analytics
  • big data
  • PowerEdge
  • containers
  • cloud-native applications
  • data platform

Big Data as-a-Service (BDaaS) Use Cases on Robin Systems

Mike King

Wed, 24 Apr 2024 15:27:10 -0000

|

Read Time: 0 minutes

Do you have a Big Data mess? Do you have separate infrastructure for NoSQL databases like Cassandra, MongoDB, Neo4j, and Riak? I'll bet that Kafka, Spark, and Elasticsearch are on separate gear too. Let's throw in PostgreSQL, MariaDB, MySQL, Greenplum, and another database or two. We don't want to forget machine learning with scikit-learn and Dask, nor deep learning with TensorFlow and PyTorch.

What if I told you that you could run all of them (including test/dev, QA, and prod, perhaps with multiple instances and different versions) on the same multi-tenant, containerized platform?

Enter Robin Systems and their cloud native platform.  Some of the features I find useful include:

  • Similar to BlueData (HPE) but way better
  • Multi-tenant
  • Low cost
  • Easy to manage
  • Containerized via Kubernetes
  • Compact and dense
  • Disaggregated compute and storage or hybrid
  • One platform and set of BOMs for all tenants, multi-tenant
  • Can also do Oracle, Hadoop, elastic and more
  • Can be delivered direct or via partner
  • Infrastructure flexibility (compute-only, storage only, and/or hybrid nodes)
  • Infrastructure + application / service / storage level monitoring and visibility via integrated ELK/Grafana/Prometheus (out of the box templates and customizable)
  • QoS at the CPU, memory, disk, and network level + storage IOPs guarantees
  • App-store enables deployment of new app instances (or entire app pipelines) in minutes
  • Support for multiple run-time engines (LXC, Docker, KVM)
  • Templates to customize with deep workload knowledge
  • Application / storage / service thin cloning
  • Native, application-aware backups and snapshots
  • Scale up / scale down application / storage / service
  • Can use optional VMs
  • SAN storage via CSI is possible

As for the use cases, some ideas:

  • Just Oracle, dense: 500 databases on 18 servers, SAN for storage, RAC or not
  • MariaDB + Cassandra + MongoDB
  • Just Hadoop: all containerized, multiple clusters including test/prod/QA
  • Hadoop + Oracle
  • Kafka, Hadoop, Elastic, Cassandra, Oracle
  • ML data pipelines
  • DL such as TensorFlow with GPUs
  • Spark
  • Any NoSQL database
  • RDBMSs such as MySQL, MariaDB, PostgreSQL, Greenplum, Oracle, etc.
  • Streaming analytics with Kafka or Flink

Contact info for Mike King, Advisory System Engineer for DA / AI / Big Data, Dell Technologies | NA Data Center Workload Solutions



Read Full Blog
  • big data
  • PowerEdge
  • NVMe
  • AMD

TPCx-Big Bench Rocks on Dell EMC PowerEdge R7515 with AMD Milan Processors

Nicholas Wakou Waseem Raja Mohan Rokkam

Thu, 06 May 2021 11:05:30 -0000

|

Read Time: 0 minutes

Introduction

As part of the Dell Technologies AMD Milan launch in March 2021, Dell Technologies published eight results with the Transaction Processing Performance Council (TPC) (www.tpc.org). Six of those results are categorized as big data: five as TPCx-HS, and one as TPCx-Big Bench. Milan is a code name for AMD EPYC third-generation processors, which represent a broad line-up of processors for cloud, enterprise, and high-performance computing workloads. This blog is part of a series that presents these results. It also dives into why they should matter to Big Data enthusiasts and professionals in Engineering, Marketing, and Sales.

System under test (SUT)

 

Figure 1: SUT for TPCx-BB result

 

The SUT for the TPCx-BB result used 11 x Dell EMC PowerEdge R7515 servers: one NameNode server and 10 DataNode servers, as shown in Figure 1 above.

Hardware configuration

Data Size: SF 3000
Number of nodes: 11
Processor: 1 x AMD EPYC 7763 64-core, 2.45 GHz, 256 MB L3
Memory: 512 GB (8 x 64 GB RDIMM 3200 MT/s Dual Rank)
Network (cluster connectivity): 1 x Broadcom Dual Port 25 GbE NIC Mezzanine
Network (remote connectivity): 1 x Broadcom Gigabit Ethernet BCM5720 NIC

Software stack

Operating system: SUSE Linux Enterprise Server 12 SP5
Hadoop: Cloudera Private Cloud Base 7.1.4
Query execution engine: Hive on Tez
Java: Open JDK 64-Bit Server build 1.8.0_232-cloudera

Table 1: SUT configuration

TPCx-BB Overview

TPCx-Big Bench (TPCx-BB) is an application benchmark for Big Data Analytic Systems (BDAS). Three cornerstone aspects characterize Big Data systems: volume, velocity, and variety.

Volume refers to the size of the Big Bench dataset that is based on a single scale factor and is predictable and deterministic. Scale Factors are used to scale data from 1 TB to up to Petabytes of data. Velocity refers to the ability of the Big Data system to stay current through periodic refreshes, commonly known as Extraction, Transformation, and Load (ETL). Variety refers to the ability to deal with differently organized data, from unstructured to semi-structured and structured data.

TPCx-BB features 30 complex queries, query 1 through query 30. These queries reflect real-world use cases and are designed along one business dimension and three technical dimensions that cover different business cases and technical perspectives: Data Source, Processing type, and Analytic technique.

Based on the McKinsey report on big data (Big Data: The next frontier for innovation, competition, and productivity; https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_full_report.pdf), 10 queries (query 1 through query 10) were identified that fall into five main categories of a retail business: Marketing, Merchandising, Operations, Supply Chain, and New business models (price comparisons).

Data source dimension measures the type of input data the query is targeting. There are three types of input data in Big Bench: structured, semi-structured and unstructured. For example, Query 1 uses semi-structured web click streams as data source, while Query 3 performs sentiment words extraction on unstructured product reviews data.

Processing type dimension measures the type of processing appropriate for the query. This dimension covers the two common paradigms of declarative and procedural languages. In other words, some of the queries can be answered by declarative languages; others by procedural languages; and others by a mix of both.

Analytic technique dimension measures different techniques for answering business analytics questions. In general, three major categories of analytic techniques were identified: statistical analysis, data mining, and simple reporting.

The TPC requires that Express benchmarks like TPCx-BB must run the TPC-provided kit in order to publish a compliant TPC Express result. The latest TPCx-BB kit can be downloaded from the TPC Documentation webpage.

Scale Factor

TPCx-BB defines a set of discrete scaling points (scale factors) based on the approximate size, in GB, of the raw data that the data generator produces. Each defined scale factor has an associated value for SF, a unit-less quantity roughly equivalent to the number of GB of data present on the storage. Test sponsors may choose any scale factor from the defined series except SF 1, which is used for result validation only. No other scale factors may be used for a TPCx-BB result.

1, 1000, 3000, 10000, 30000, 100000, 300000, 1000000

Table 2: Allowable scale factors

Dell Technologies published at SF 3000, which is equivalent to 2,794 GB of raw data.
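The 2,794 figure appears to be a decimal-to-binary unit conversion: SF 3000 generates roughly 3000 x 10^9 bytes of raw data, which comes out to about 2,794 binary gigabytes (GiB).

```python
# SF 3000 corresponds to roughly 3000 * 10^9 bytes of generated raw data;
# expressed in binary gibibytes (2^30 bytes) that is the ~2,794 "GB" quoted.
raw_bytes = 3000 * 10**9
gib = raw_bytes / 2**30
print(round(gib))
```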

Benchmark Phases

TPCx-BB defines three phases: Load test, Power test, and Throughput test. The three tests run sequentially and are not permitted to overlap.

During the Load test, the test database that is used to run the three phases is built. The Power test measures the time the SUT takes to process all 30 queries, which must run sequentially in ascending order. The Throughput test runs the 30 queries using concurrent streams; each stream runs all 30 queries in a specified placement order. The default number of streams is 2, but the number of concurrent streams is configurable with no maximum limit.

The runs must comply with the TPCx-BB specification in order to pass an audit; the TPC then publishes the results. A compliant benchmark test consists of a validation test followed by two benchmark runs, Run 1 and Run 2. A benchmark run consists of the three phases stated above: Load test, Power test, and Throughput test. The validation test performs the three benchmark phases at Scale Factor 1 and validates the results against the reference result set in the kit, ensuring that the engine the test sponsor uses can match the reference result set.

 

Figure 2: Benchmark execution phases

What is measured?

The benchmark measures the time that it takes for all 30 queries to be performed. All TPC published results must disclose a Primary Metric that consists of three elements: Performance metric, Price/Performance metric, and an availability date. 

The performance metric is computed from metric components representing Load, Power, and Throughput tests as defined above. 

SF: Scale Factor
TLoad: Elapsed time of the Load test
TLD: Load Factor, TLD = 0.1 * TLoad
Q(i): Elapsed time in seconds of Query i
M: Number of Queries
TPT: Geometric mean of the elapsed times of each of the 30 Queries measured in the Power run
TTput: Elapsed time of all streams in the Throughput test
TTT: Throughput test metric
n: Number of streams in the Throughput test
Performance Metric: BBQpm@SF, computed from the components above (the formula is defined in the TPCx-BB specification)

Table 3: Computation of the performance metric
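The individual components above are straightforward to compute. Below is a sketch of the geometric-mean and load-factor pieces with illustrative query times (not the published ones); see the TPCx-BB specification for how the components combine into BBQpm@SF.

```python
import math

def geometric_mean(times):
    """Geometric mean of per-query elapsed times (the TPT component above)."""
    return math.prod(times) ** (1 / len(times))

def load_factor(t_load):
    """TLD, defined above as 10% of the Load-test elapsed time."""
    return 0.1 * t_load

# Illustrative query times in seconds: the geometric mean damps the
# influence of the single long-running 480 s query.
q_times = [120.0, 30.0, 480.0]
print(geometric_mean(q_times), load_factor(473.10))
```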

The price/performance metric is the total cost of ownership P of the SUT divided by the performance metric ($/BBQpm@SF).

The system availability date is the date all components of the SUT are generally available to customers. Any reference to a TPC result must disclose all three elements of the Primary Metric.

TPCx-BB consists of application-level workloads that essentially measure the efficiency of the underlying infrastructure. A good result depends on a well-optimized and tuned infrastructure, from the BIOS/OS settings through the Hadoop framework to the application level (Hive, Spark). For this result, Cloudera Private Cloud Base (CDP 7.1.4) configuration settings were optimized (Dell EMC PowerEdge 14G Performance Characterization for Data Analytics; https://infohub.delltechnologies.com/section-assets/h17247-poweredge-14g-performance-characterization-for-data-analytics-technical-white-paper) based on the resources (CPU cores, memory, and storage) available to the cluster. Hive (SQL engine) settings for each query were also tweaked for improved query performance. Additionally, Spark Submit operator settings were adjusted for better performance of the five Machine Learning queries (q5, q20, q25, q26, and q28).

Results

The artifacts for this result can be found on the TPC Results Page. Table 4 below summarizes two results from the TPCx-BB SF 3000 performance table: the most recent result, based on the Dell EMC PowerEdge R7515, and the previous best result (PrevBest) from a competing company. The PrevBest result was submitted four years ago and is now historical.

A historical result is one that:

  • Is still in an accepted state
  • Has been posted:
    1. At least 185 days past the submission date
    2. At least 60 days past the availability date.

On the TPCx-BB Performance results page, historical results are not displayed by default unless the Include Historical Results option is checked. PrevBest is included here to show how far technology has moved and improved Big Data performance in the last four years.

 

Metric | Run 1 | Run 2 | PrevBest*
Load test (s) | 473.10 | 472.69 | 1,302.23
Power test (s) | 5,612.53 | 5,630.56 | 16,294.45
Throughput test (s) | 24,907.11 | 24,792.50 | 52,751.28
Overall run time (s) | 30,992.74 (8.61 hrs.) | 30,895.76 | 70,347.97 (19.54 hrs.)
Performance (BBQpm) | 1,544.13 | 1,547.29 | 611.31
Price/Performance ($/BBQpm) | 487.85 | - | 646.31
Availability date | 03-15-2021 | - | 12-29-2016
Number of nodes | 11 | - | 16
Total rack units (TRU) | 23 | - | 34
Processor/Cores/Threads (P/C/T) | 11/704/1,408 | - | 32/512/1,024
Storage | 54,240 GB, 30 x NVMe + 24 SSD | - | 220,480 GB, 254 HDD

*PrevBest is a historical result.

Table 4: TPCx-BB SF3000 performance results on March 15, 2021

Figure 2 above shows the benchmark phases. Run 1 and Run 2 are run sequentially on the same SUT. The run with the lower BBQpm score, in this case Run 1, is designated the Performance run. The run with the higher score (Run 2) is the Repeatability run.

Figure 3 below is a chart showing that the Dell EMC PowerEdge R7515 result scored 2.53x better performance than the previous best result. PrevBest* is a historical result.

Fig 3: TPCx-BB SF 3000 performance result on March 15, 2021

Figure 4 is a chart that shows that price/performance improved by 24.52%.

Fig 4: TPCx-BB SF 3000 price/performance result on March 15, 2021
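Both headline comparisons, and the overall run time, can be checked directly against the Table 4 numbers:

```python
# Figures from Table 4 (Run 1 vs. the previous best historical result).
load_s, power_s, tput_s = 473.10, 5612.53, 24907.11
total_s = load_s + power_s + tput_s      # overall run time: 30,992.74 s

perf_r7515, perf_prev = 1544.13, 611.31  # BBQpm (higher is better)
pp_r7515, pp_prev = 487.85, 646.31       # $/BBQpm (lower is better)

speedup = perf_r7515 / perf_prev         # ~2.53x better performance
pp_gain = (1 - pp_r7515 / pp_prev) * 100 # ~24.52% better price/performance

print(round(total_s / 3600, 2), round(speedup, 2), round(pp_gain, 2))
```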

Key Takeaways

  1. Dell Technologies renews its interest in the smaller SF TPCx-BB space after over four years.
  2. More processing power is contained in smaller packages. The Dell EMC PowerEdge R7515 result used more processing cores in a smaller footprint (TRU) compared to the previous best result. This tendency is consistent with current industry trends.
  3. It pays to invest in faster storage. The Dell EMC PowerEdge R7515 result used about 54 TB of NVMe + SSD storage compared to about 220 TB of HDD storage. The faster storage better matched the faster processors enabling more efficient processing. That power contributed to the SUT posting 2.53x better performance and 24.52% better price/performance than the previous best TPCx-BB SF 3000 result. The SUT used in this result took advantage of the lower prices for NVMe/SSD storage devices.
  4. Real-world queries should run over 2x faster on the PowerEdge R7515, based on previous results. Table 4 shows that the Dell EMC PowerEdge R7515 result was performed in 8.61 hours compared to 19.54 hours used by the previous best result.

Conclusion

This result demonstrates that Dell Technologies has a renewed interest in smaller SFs after several years. Dell EMC's most recent previous results were at the larger SF 10000.

The TPCx-BB SF 3000 result showed that Dell EMC PowerEdge R7515 hardware platforms with AMD Milan processors pack blazing performance in smaller (than predecessors) rack units. These servers enable smaller data center footprints without sacrificing price or performance. This advantage coincides with observations from the other results which are part of this blog series.

Dell Technologies uses these results to provide verifiable performance data about its products and solutions to its customers. For that reason, Dell Technologies has been an active member of the TPCx-Big Bench Technical Committee. Dell Technologies continues to collaborate with other stakeholders within the industry to maintain the TPCx-BB specification. 


Read Full Blog
  • big data
  • PowerEdge
  • NVMe
  • AMD

TPCx-HS Results on Dell EMC Hardware with AMD Milan Processors

Nicholas Wakou

Mon, 19 Apr 2021 17:56:26 -0000

|

Read Time: 0 minutes

Introduction

As part of the Dell Technologies AMD Milan launch in March 2021, Dell Technologies published eight results with the Transaction Processing Performance Council (TPC) (www.tpc.org). Six of those results are categorized as Big Data; five as TPCx-HS and one as TPCx-Big Bench. Milan is a code name for AMD EPYC third-generation processors which represent a broad line-up for cloud, enterprise, and high-performance computing workloads. TPC Big Data results are categorized by data sizes that are known as Scale Factor (SF). The results showcase Big Data performance at the low-end SF 1 TB, while at the high end TPCx-HS crosses the “size frontier” with a 100 TB result.

Big Data technologies like Hadoop and Spark have become an important part of the enterprise IT ecosystem. The TPC Express Benchmark™ HS (TPCx-HS) was developed to provide an objective measure of hardware, operating system, and commercial Apache Hadoop File System API compatible software distributions. Also, it provides the industry with verifiable performance, price-performance, and availability metrics. The benchmark models a continuous system availability of 24 hours a day, 7 days a week.

This blog dives deeply into the results, what they mean, and why they are important for Big Data enthusiasts and professionals in Engineering, Marketing, and Sales.     

 System under test (SUT)

Figure 1: SUT for the SF100 TB result

All five TPCx-HS results used one PowerEdge R7515 as the Master node and either 9 or 16 PowerEdge R6515 servers as Worker nodes. The only major hardware configuration difference, apart from the number of Worker nodes, was the storage size. The SF 100 TB result was run on a SUT with each Worker node having 8 x 3.2 TB NVMe drives; all other results used 5 x 3.2 TB NVMe drives.

Hardware Configuration

Data Size | SF 100 TB | SF 3 TB | SF 1 TB
Number of nodes | 17 | 17 | 17 | 10 | 10
Processor/Cores/Threads (P/C/T) | 17/544/1,088 | 17/544/1,088 | 17/544/1,088 | 10/320/640 | 10/320/640
Framework | MapReduce | Spark
Processor (all nodes): 1 x AMD EPYC 75F3 32-core, 2.95 GHz, 256 MB L3
Memory (all nodes): 512 GB (8 x 64 GB RDIMM 3200 MT/s Dual Rank)
Network (cluster connectivity): 1 x Mellanox Dual Port ConnectX-5 100 GbE QSFP28 NIC
Network (remote connectivity): 1 x Broadcom Gigabit Ethernet BCM5720 NIC
Data Storage (number of 3.2 TB NVMe per Worker node): 8 (SF 100 TB) | 5 (all other results)

Software Stack

Operating System: SUSE Linux Enterprise Server 12 SP5
Hadoop: Cloudera Private Cloud Base 7.1.4
Java: Open JDK 64-Bit Server build 1.8.0_232-cloudera

Table 1: SUT configuration

Benchmark workload

TPCx-HS was the first Big Data industry-standard benchmark that was designed to stress both hardware and software that is based on Apache HDFS API compatible distributions. TPCx-HS extends the workload that is defined in TeraSuite (TeraGen, TeraSort, TeraValidate) with formal rules for implementation, execution, metric, result verification, publication, and pricing. It can be used to assess a broad range of Big Data Hadoop system topologies, implementation methodologies and systems in a technically rigorous, directly comparable, and vendor-neutral manner. The current TPCx-HS specification can be found on the TPC Documentation Webpage.

The TPC requires that Express benchmarks like TPCx-HS must run the TPC-provided kit in order to publish a compliant TPC Express result. The latest TPCx-HS kit can be downloaded from www.tpc.org/tpcx-hs. The benchmark workload consists of the following modules:

  • HSGen generates data at a particular Scale Factor. It is based on TeraGen.
  • HSDataCheck checks the validity of the dataset and replication.
  • HSSort sorts and orders the data. It is based on TeraSort.
  • HSValidate validates the sorted output. It is based on TeraValidate.

The benchmark test is performed in five phases that are run from a TPCx-HS-master script. The phases must run sequentially without any overlaps.

Figure 2: TPCx-HS execution phases

The benchmark test consists of two runs, Run 1 and Run 2, which must follow the run phases shown above. Except for file system cleanup, no activities are allowed between Run 1 and Run 2. The total elapsed runtime T, in seconds is used for the TPCx-HS Performance Metric calculation. The performance run is defined as the run (Run 1 or Run 2) with the lower TPCx-HS Performance Metric. The repeatability run is defined as the run (Run 1 or Run 2) with the higher TPCx-HS Performance Metric. The reported Performance Metric is the TPCx-HS Performance Metric for the performance run.

Scale factors

Scale Factor (SF) is the dataset size in relation to the minimum required size of a test dataset. For TPCx-HS, the test dataset size must be selected from a set of fixed SFs: 1 TB, 3 TB, 10 TB, 30 TB, 100 TB, 300 TB, 1,000 TB, 3,000 TB, and 10,000 TB.

The SF 100 TB result was the first TPCx-HS result published at that scale factor, a major milestone for the industry.

What is measured?

All TPC published results disclose a Primary Metric that consists of a Performance metric, Price/Performance metric, and an availability date. For TPCx-HS:

  1. Performance Metric (HSph@SF) reflects the throughput of the performance run at Scale Factor SF. It is computed from the elapsed time T (in seconds) for the run to complete all five phases shown in Figure 2: HSph@SF = SF / (T / 3600).
  2. Price/Performance Metric ($/HSph@SF) is the Total Cost of Ownership P needed to own and sustain the SUT, divided by the Performance Metric it scored.
  3. System availability date is the day all components used in the performance test will be available to customers, as defined in the TPC Pricing specification.

What the metric numbers mean  

Generally, the faster the performance run completes, the higher the performance score, because the score normalizes the Scale Factor by the elapsed run time. For Price/Performance, the lower the metric score the better: a higher Performance score achieved on a SUT with a lower Total Cost of Ownership P yields a better price/performance metric.
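For reference, the metric arithmetic can be sketched in a few lines of Python, using the HSph@SF = SF / (T / 3600) formula from the TPCx-HS specification; the SF 100 TB inputs below come from the published results discussed later:

```python
def hsph(scale_factor_tb, elapsed_seconds):
    # TPCx-HS Performance Metric: HSph@SF = SF / (T / 3600), with T in seconds.
    return scale_factor_tb / (elapsed_seconds / 3600.0)

def price_performance(tco_usd, scale_factor_tb, elapsed_seconds):
    # Price/Performance Metric: $/HSph@SF = P / HSph@SF.
    return tco_usd / hsph(scale_factor_tb, elapsed_seconds)

# The published SF 100 TB run (T = 8,225 s) works out to about 43.8 HSph:
print(round(hsph(100, 8225), 1))
```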

Results

As part of the Dell Technologies AMD Milan launch, Dell Technologies published five TPCx-HS results on March 04, 2021. These results, summarized in Table 2 below, show several performance scenarios:

  • Data sizes that are scaled from SF 1 TB to SF 100 TB
  • Two different frameworks MapReduce and Spark
  • Two different cluster sizes 10 nodes and 17 nodes

The common denominator for these results is that all SUTs used the AMD Milan EPYC 75F3 processors.

| Data Size | SF 100 TB | SF 3 TB | SF 1 TB | SF 1 TB | SF 1 TB |
|---|---|---|---|---|---|
| Number of nodes | 17 | 17 | 17 | 10 | 10 |
| Framework | MapReduce | MapReduce | MapReduce | MapReduce | Spark |
| HSGen (s) | 1,604.68 | 62.32 | 33.03 | 47.14 | 41.22 |
| HSSort (s) | 5,819.10 | 205.21 | 87.76 | 140.68 | 119.28 |
| HSValidate (s) | 791.54 | 36.25 | 22.09 | 26.01 | 15.99 |
| Elapsed run time (s) | 8,225 | 313 | 146 | 218 | 181 |
| Performance Metric (HSph) | 43.76 | 34.52 | 24.69 | 16.52 | 19.92 |
| Total Cost of Ownership (TCO) | $1,344,855 | $1,229,447 | $1,229,447 | $728,080 | $728,080 |
| Price/Performance ($/HSph) | $30,733 | $35,616 | $49,795 | $44,073 | $36,550 |
| Total Rack Units (TRU) | 20U | 20U | 20U | 13U | 13U |
| Performance/TRU | 2.19 | 1.73 | 1.23 | 1.27 | 1.53 |

Table 2: TPCx-HS results published on March 04, 2021

The table also shows that data sizes were scaled from 1 TB to 100 TB using SUTs occupying the same space as measured in Total Rack Units (TRU). Detailed results can be found at http://tpc.org/tpcx-hs/results/tpcxhs_perf_results5.asp?version=2.

Figure 3 below is a scatter chart that shows the relative performance, relative price/performance, and performance/TRU of all the TPCx-HS SF 1 TB results on March 04, 2021. These results are based on three AMD processor generations. The results in red markers are based on the first-generation AMD Naples processor. Orange markers show results that are based on the second-generation processor AMD Rome. Blue markers show results that are based on the most recent third-generation AMD Milan processors. Green markers show results that were from a competitor, CompA.

 Figure 3: TPCx-HS SF 1 TB results by processor

The relative performance and price/performance scores use the results of the AMD Naples-based SUT as a reference. All performance results (circle marker) above the dashed line performed better than the reference SUT and those results below performed worse. Conversely, price/performance results (diamond marker) above the dashed line scored worse than those results below the line.  

Figure 4 is a similar chart but shows results at SF 3 TB.

 Figure 4: TPCx-HS SF 3 TB results by processor

Figure 5 is a bar chart that shows relative performance using the results of the R6415-Naples-17node-MR SUT as the reference. The bars show results that are based on AMD processors: red for AMD Naples; orange for AMD Rome; and blue for AMD Milan. The green bars are for results from competitor CompA.

 

Figure 5: TPCx-HS SF 1 TB relative performance results

Figure 6 is a chart that shows performance per TRU and color-coded similarly to Figure 5.

  Figure 6: Performance/TRU

Key takeaways from the results

  1. AMD Milan-based SUTs give the best bang for the money based on TPCx-HS results. Figure 5 shows that they performed up to 2.72x better than SUTs based on earlier generation AMD processors and the competition. Figures 3 and 4 show that the price/performance is comparable to that of the reference SUT.  
  2. Data sizes can be scaled without fear of reduction in price/performance. Table 2 shows that price/performance based on Total Cost of Ownership improves remarkably as the data sizes are scaled from SF 1 TB to SF 100 TB.
  3. It is worth investing in NVMe-based storage. As shown in Table 2, all the results used NVMe-based storage. This configuration enabled them to use the 1U R6515 servers which occupied less rack space without a reduction in computing resources.
  4. AMD Milan-based SUTs enable reduced data center footprint. Table 2 and Figure 6 show that within the same space, the workload size can be increased by a factor of 100 without loss of efficiency.
  5. AMD Milan-based SUTs show improved performance efficiency at scale. At SF 100 TB, more data is processed in proportionately less time.
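The Performance/TRU row in Table 2 can be reproduced directly: each value is simply the Performance Metric divided by the rack units the SUT occupies. A quick Python check:

```python
# Performance/TRU is the Performance Metric divided by the rack units
# the SUT occupies. (HSph, TRU) pairs taken from Table 2:
results = [(43.76, 20), (34.52, 20), (24.69, 20), (16.52, 13), (19.92, 13)]
print([round(h / t, 2) for h, t in results])
# [2.19, 1.73, 1.23, 1.27, 1.53]
```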

Conclusion

With the publication of TPCx-HS results based on AMD Milan processors, Dell Technologies has become the most dominant publisher of TPC Big Data benchmarks. The results show that Dell EMC hardware platforms with third-generation AMD EPYC processors can do more work efficiently in less space, and with more value for the dollar.  

  • data analytics
  • ECS

Delivering Innovation with Object-Based Analytics

John Shirley

Thu, 25 Mar 2021 18:17:06 -0000


As analytic workloads continue to grow, more pressure is placed on data teams to build an effective data strategy.

One large financial customer of ours is challenging their data teams to help fight false-positive card declines, which according to Javelin Research costs issuers and merchants over $118B. However, ensuring fraud prevention while cutting down on the false positives can be a fine line to walk. The key to solving this problem with an analytics model is to start with the data.

At Dell Technologies, we have helped our customers work through these challenges for many years, from building fraud detection to enabling life-saving healthcare. We understand that getting the data strategy right can help teams build models that solve their real-world problems. Dell Technologies has engaged in joint engineering and validation efforts to integrate our leading distributed object storage product, Dell EMC ECS, with industry leaders in the analytics space.

Today we are happy to announce a collaboration with unified analytics warehousing leader Vertica, which will allow our customers to deliver cloud innovation on-premises, gain greater operational flexibility and efficiency, and scale infrastructure resources independently. Working with Vertica will allow our joint analytics customers to deliver a flexible and efficient architecture by separating compute and storage.

Collaborating on Object-Based Analytics with Vertica

Vertica is a unified analytics warehouse built to deliver blazing-fast query performance regardless of scale or concurrency requirements. It is a highly scalable analytical database that works well in many deployment situations – on-premises, on top of a Hadoop or S3 data lake, and in any public cloud. Vertica features powerful SQL-based analytics, time-series, geospatial, and in-database machine learning capabilities. Vertica removes typical barriers to analytics for some of the world’s most prominent data-centric organizations.

Vertica in Eon Mode decouples compute from storage to give customers the benefits of cloud architecture for analytic workloads. Previously available only in the public clouds, Eon Mode is now offered with superior on-premises object storage solutions.

“Our customers trust us to provide the greatest freedom in how they consume the highest performance analytics – flexibility for the broadest deployment options, whether it’s deploying Vertica on any major public cloud or on-premises with more leading object storage options,” said Colin Mahony, senior vice president and general manager of Vertica.

Help Data Teams Solve Real-World Problems

Data teams can begin taking advantage of our joint collaboration today. Today’s announcement allows customers to:

  • Deliver cloud innovation on-premises with Vertica in Eon Mode for Dell EMC ECS.
  • Separate compute and storage with ECS and Vertica in Eon Mode for operational flexibility and efficiency.
  • Scale infrastructure resources independently. Storage can grow without adding expensive compute, and compute can be scaled up or down for variable or intermittent workloads.

Vertica in Eon Mode for Dell EMC ECS gives companies a consistent platform for analytics across all of their environments, whether their data resides in the cloud, on-premises, or in a hybrid architecture. Check out this white paper to learn about the technologies and environment used to confirm compatibility between Vertica in Eon Mode and the Dell EMC ECS platform.

 

  • AI
  • data analytics

AI-based Edge Analytics in the Service Provider Space

Khayam Anjam, Raja Neogi

Fri, 15 Jan 2021 11:48:37 -0000


Introduction 

Advances in Service Provider performance management lag behind the growth of digital transformation. Consider, for example, Dynamic Spectrum Sharing (DSS) in 5G networks: operators need to rapidly map small-cell flows to available frequency bands in the presence of constraints like differing radio technologies and interference. Another example is the need to detect and/or predict infrastructure failures from KPIs, traces, profiles, and knowledge bases, to trigger a fix before an issue manifests itself. Yet another example is energy optimization in data centers, where servers are powered off to save energy and workloads are moved around the cluster without affecting end-to-end service. It is clear that in all of these scenarios, and in numerous other use cases affecting industries such as factory automation, automotive, IIoT, smart cities, energy, healthcare, entertainment, and surveillance, AI on Big Data needs to replace legacy IT processes and tasks to trigger timely changes in the network substrate. The following figure illustrates how Big Data from the substrate can be consumed by fast-responding, interconnected AI models to act on service degradations. The traditional approach of DevOps reacting to irregularities visualized through Network Operations Center (NOC) terminals does not scale. Gartner and IDC both predict that by 2024 more than 60 percent of Mobile Network Operators (MNO) will adopt AI-based analytics in their IT operations.

 

Figure 1. Decision and Controls with Models

 

Architecture

Data streams originating in the substrate, and gathered in the collection gateway, may be compressed. There may be gaps in data collection that need interpolation. Not all collected data types have an equal impact on decision-making, which makes feature-filtering important. These issues justify the need for multi-stage pre-processing. Similarly, rapid decision-making can be achieved through multi-stage interconnected models using deep-learning technology. Instead of one huge and complex model, experts agree that simpler interconnected models lead to a more reusable design. The following figure illustrates the decision-making process. It shows a sample interconnected model graph that detects anomalies, identifies root causes, and decides on a control sequence to recommend remediation measures.
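As an illustrative plain-Python sketch (not SDP or STF code; the stage boundaries and the variance threshold are assumptions), the two pre-processing steps mentioned above, gap interpolation and feature-filtering, might look like:

```python
def interpolate_gaps(series):
    """Fill None gaps in a metric series by linear interpolation."""
    out = list(series)
    i = 0
    while i < len(out):
        if out[i] is None:
            j = i
            while j < len(out) and out[j] is None:
                j += 1  # find the end of the gap
            left = out[i - 1] if i > 0 else out[j]
            right = out[j] if j < len(out) else left
            for k in range(i, j):
                t = (k - i + 1) / (j - i + 1)
                out[k] = left + (right - left) * t
            i = j
        else:
            i += 1
    return out

def filter_low_variance(features, threshold=1e-3):
    """Drop features whose variance is too low to help decision-making."""
    kept = {}
    for name, values in features.items():
        mean = sum(values) / len(values)
        if sum((v - mean) ** 2 for v in values) / len(values) >= threshold:
            kept[name] = values
    return kept

print([round(v, 6) for v in interpolate_gaps([1.0, None, None, 4.0])])
# [1.0, 2.0, 3.0, 4.0]
```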

 

 

Figure 2. Runtime acceleration key for real-time loops

 

Deep learning is a good tool for inductive reasoning, but deductive reasoning is also important for decision-making (for example, to limit cascading errors), and this requires one or more post-processing stages. Collectively, these arguments point to a need for auto-pipelining through Function-as-a-Service (FaaS) for WebScale automation in the cloud-native space. Add to this the need for streaming, visualization, and time-series databases for selective data processing in the stages, and what we end up with is a Machine Learning Operating System (ML-OS) that provides these services. An ML-OS, such as Nuclio, automatically maps pipelined functions (for example, Python definitions) to cloud-native frameworks, utilizing specified configurations, and supports open-source tools for visualization, streaming, in-memory time-series databases, and GPU-based model acceleration. Applications developed on the ML-OS ingest data and output control sequences for continuous optimization in decision-making. These real-time decision-making loops collectively enable WebScale Network Automation, Slice Management, RAN operations, Traffic Optimization, QoE Optimization, and Security. The following figure illustrates the AIOps platform.
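For instance, a minimal Nuclio-style Python function could flag a threshold anomaly before handing off to downstream models. Nuclio invokes `handler(context, event)` per record; the metric field names and the 0.9 threshold here are hypothetical:

```python
import json

def handler(context, event):
    # event.body carries the ingested record; field names are illustrative.
    record = json.loads(event.body)
    anomalous = record.get("cpu_util", 0.0) > 0.9
    return json.dumps({"host": record.get("host"), "anomaly": anomalous})
```

In an ML-OS pipeline, the output of one such function would feed the next stage (for example, root-cause identification).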

 

 

Figure 3. AIOps Platform

 

Deployment

In this section, we show our prototype deployment using Generation (substrate) and Analytics infrastructure, as shown in the following figure. Generation includes a workload placed in Red Hat OpenStack (R16) VMs, where synthetically generated tomography images are compressively sensed in one VM and then reconstructed in another. System performance metrics from this workload environment are exported to the Service Telemetry Framework (STF) Gateway placed in Red Hat OpenShift (v4.3) containers, which gathers data for streaming to the Analytics cluster placed in VMware (v6.7) VMs. The Analytics cluster includes the Iguazio ML-OS with native GPU acceleration, and Anodot models for correlation and anomaly detection.

 

 

Figure 4. Workload layout in virtual infrastructure

 

The OpenStack cluster has 3 physical control nodes (R640) and 2 physical compute nodes (R740). Vm-1 generates random tomography images, which are compressed and dispatched to Vm-2 for reconstruction using L1 Lasso Regression. OpenShift (OCP) is deployed on a pool of VMware virtual hosts (v6.7) with vSAN (see Reference Design) on 3 physical nodes (R740xd). The OCP deployment spans 3 control and 5 compute virtual hosts. There is a separate administration virtual host (CSAH) for infrastructure services (DHCP, DNS, HAPROXY, TFTP, PXE, and CLI) on the OCP platform. vSphere CSI drivers are enabled on the OCP platform so that persistent volume requirements for OCP pods are satisfied by vSAN storage. Red Hat STF deployed on OpenShift facilitates the automated collection of measurements from the workload environment over the RabbitMQ message bus. STF stores metrics in the local Prometheus database and can forward them to data sinks like Nuclio or Isilon (remote storage). The Nuclio ML-OS is installed as 3 data VMs and 3 application VMs using data, client, and management networks. Anodot models in the application VMs process metrics from the OpenStack environment to detect anomalies and correlate them, as shown in the following figure.

 

 

Figure 5. Sleeve tightening of metrics in Anodot

 

Building Blocks

The Python snippet and image snapshot shown below capture a workload running in the OpenStack space. Self-timed logic (not shown here) in Vm-1 is used to randomly generate CPU-utilization surges in compression by resizing the imaging resolution. The Anodot dashboard shows the resulting surge in CPU utilization in Vm-2 during reconstruction, hinting at a root-cause issue. Similar behavior can be seen in network utilization, which the Anodot dashboard shows by aligning the anomalies to indicate correlation.
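The snippet itself did not survive the page conversion; as a purely hypothetical reconstruction of the kind of generator described (function and parameter names are invented), randomly surging the image resolution spikes CPU utilization:

```python
import random

def generate_frames(n, base_res=64, surge_res=512, surge_prob=0.1):
    # Occasionally switch to a much larger resolution; building the larger
    # synthetic image is the CPU-heavy step that shows up as a surge.
    frames = []
    for _ in range(n):
        res = surge_res if random.random() < surge_prob else base_res
        frames.append([[0] * res for _ in range(res)])
    return frames
```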


Figure 6. Anomaly detection in Anodot

 

Summary 

     The analytics solution proposed here uses open interfaces to aggregate data from all segments of the network, such as RAN, Packet Core, IMS, Messaging, Transport, Platform, and Devices. This provides the ability to correlate metrics and events across all nodes to create an end-to-end view of the network, the flow or a slice. AI turns this end-to-end insight into tangible inferences that drive autonomics in the substrate through control sequences.


  • data analytics
  • big data

Real-Time Streaming Solutions Beyond Data Ingestion

Amir Bahmanyari

Thu, 17 Dec 2020 03:05:00 -0000


So, it has been all about data: data at rest, data in flight, IoT data, and so forth. Let’s review the traditional data processing approaches and their synergy with modern database technologies. A user’s model-based inquiry becomes a data entity that is created when the request payload is initiated. Traditionally, the database and the business applications have been the lone actors that collaborate to implement such data models: they interact to process users’ inquiries and persist the results in static data stores for further updates. Business continuity is measured by the degree of such activity among the business applications consuming data from these shared data stores. With a lower degree of activity, there is a high potential for the business to sit at idle states of operation, waiting for more data acquisitions.

The above paradigm inherently risks missing a great opportunity to maintain a higher degree of business continuity. To fill these gaps, a shift away from the static data store paradigm is necessary. The new massive ingested data processing requirements mandate processing models that continuously generate insight from any “data in flight,” mostly in real time. To avoid storage access performance bottlenecks, persisting interim computed results to a permanent data store should be kept to a minimum.

This blog addresses these modern data processing models from a real-time streaming ingestion and processing perspective. In addition, it discusses Dell Technologies’ offerings of such models in detail.

Customers have the option of building their own solutions on open-source projects when adopting real-time streaming analytics technologies. However, mixing and matching such components to implement real-time data ingestion and processing infrastructures is cumbersome, and it requires a variety of costly skills to stabilize such infrastructures in production environments. Dell Technologies offers validated reference architectures that meet target KPIs for storage and compute capacity, simplifying these implementations. The following sections provide high-level information about real-time data streaming and popular platforms for implementing these solutions. This blog focuses particularly on two Ready Architecture solutions from Dell—Streaming Data Platform (formerly known as Nautilus) and a Real-Time Streaming reference architecture based on Confluent’s Kafka ingestion platform—and provides a comparative analysis of the platforms.

Real-time data streaming

The topic of real-time data streaming goes far beyond ingesting data in real time. Many publications clearly describe the compelling objectives behind a system that ingests millions of data events in real time. An article from Jay Kreps, one of the co-creators of open source Apache Kafka, provides a comprehensive and in-depth overview of ingesting real-time streaming data. This blog focuses on both ingestion and the processing side of the real-time streaming analytics platforms.

Real-time streaming analytics platforms

A comprehensive end-to-end big data analytics platform demands must-have features that:

  • Simplify the data ingestion layer 
  • Integrate seamlessly with other components in the big data ecosystem
  • Provide programming model APIs for developing insight-analytics applications 
  • Provide plug-and-play hooks to expose the processed data to visualization and business intelligence layers

Over the years, demand for real-time ingestion features has motivated the implementation of several streaming analytics engines, each with a unique target architecture. These engines provide capabilities ranging from micro-batching the streamed data (near-real-time performance) to true real-time processing. The ingested datatype may range from a byte-stream event to a complex event format. Examples of such ingestion engines are the Dell Technologies-supported Pravega and the open-source Apache Kafka, which can be seamlessly integrated with open-source big data analytics engines such as Samza, Spark, Flink, and Storm, to name a few. Proprietary implementations of similar technologies are offered by a variety of vendors; a short list includes Striim, WSO2 Complex Event Processor, IBM Streams, SAP Event Stream Processor, and TIBCO Event Processing.

Real-time streaming analytics solutions: A Dell Technologies strategy

Dell Technologies offers customers two solutions for implementing their real-time streaming infrastructure. One solution is built on Apache Kafka as the ingestion layer, with Kafka Stream Processing as the default streaming data processing engine. The second solution is built on open-source Pravega as the ingestion layer, with Flink as the default real-time streaming data processing engine. But how are these solutions used in response to customers’ requirements? Let’s review possible integration patterns where the Dell Technologies real-time streaming offerings facilitate the data ingestion and partial preprocessing layers.

Real-time streaming and big data processing patterns

Customers implement real-time streaming in different ways to meet their specific requirements. This implies that there may exist many ways of integrating a real-time streaming solution, with the remaining components in the customer’s infrastructure ecosystem. Figure 1 depicts a minimal big data integration pattern that customers may implement by mixing and matching a variety of existing streaming, storage, compute, and business analytics technologies.  

                   

Figure 1: A modern big data integration pattern for processing real-time ingested data

There are several options to implement the Stream Processors layer, including the following two offerings from Dell Technologies.

Dell EMC–Confluent Ready Architecture for Real-Time Data Streaming

The core component of this solution is Apache Kafka, which also delivers Kafka Stream Processing in the same package. Confluent provides and supports the Apache Kafka distribution along with Confluent Enterprise-Ready Platform with advanced capabilities to improve Kafka. Additional community and commercial platform features enable:

  • Accelerated application development and connectivity 
  • Event transformations through stream processing 
  • Simplified enterprise operations at scale and adherence to stringent architectural requirements

Dell Technologies provides infrastructure for implementing stream processing deployment architectures using one of two Kafka distributions from Confluent—Standard Cluster Architecture or Large Cluster Architecture. Both cluster architectures may be implemented as either the streaming branch of a Lambda Architecture or as the single process flow engine in a Kappa Architecture. For a description of the difference between the two architectures, see this blog. For more details about the product, see Dell Real-Time Big Data Streaming Ready Architecture documentation. 

  • Standard Cluster Architecture: This architecture consists of two Dell EMC PowerEdge R640 servers to provide resources for Confluent’s Control Center, three R640 servers to host Kafka Brokers, and two R640 servers to provide compute and storage resources for Confluent’s higher-level KSQL APIs leveraging the Apache Kafka Stream Processing engine. The Kafka Broker nodes also host the Kafka Zookeeper and the Kafka Rebalancer applications. Figure 2 depicts the Standard Cluster Architecture.

      Figure 2: Standard Dell Real-Time Streaming Big Data Cluster Architecture

  • Large Cluster Architecture: This architecture consists of two PowerEdge R640 servers to provide resources for Confluent’s Control Center, a configurable number of R640 servers to host Kafka Brokers for scalability, and a configurable number of R640 servers to provide compute and storage resources for Confluent’s KSQL APIs on top of the Apache Kafka Stream Processing engine. The Kafka Broker nodes also host the Kafka Zookeeper and the Kafka Rebalancer applications. Figure 3 depicts the Large Cluster Architecture.

Figure 3: Large Scalable Dell Real-Time Streaming Big Data Cluster Architecture

Dell EMC Streaming Data Platform (SDP)

SDP is an elastically scalable platform for ingesting, storing, and analyzing continuously streaming data in real time. The platform can concurrently process both real-time and collected historical data in the same application. The core components of SDP are open source Pravega for ingestion, Long Term Storage, Apache Flink for compute, open source Kubernetes, and a Dell Technologies proprietary software known as Management Platform. Figure 4 shows the SDP architecture and its software stack components.

Figure 4: Streaming Data Platform Architecture Overview

  • Open source Pravega provides the ingestion and storage artifacts by implementing streams built from heterogeneous datatypes and storing them as appended “segments.” The classes of unstructured, structured, and semi-structured data may range from a small number of bytes emitted by IoT devices, to clickstreams generated by users while they surf websites, to business applications’ intermediate transaction results, to complex events of virtually any size. SDP offers two options for Pravega’s persistent Long Term Storage: Dell EMC Isilon and Dell EMC ECS S3. These storage options are mutually exclusive; both cannot be used in the same SDP instance, and migrating from one to the other is not yet supported. For details on Pravega and its role in providing storage for SDP streams using Isilon or ECS S3, refer to this Pravega webinar.
  • Apache Flink is SDP’s default event processing engine. It consumes ingested streamed data from Pravega’s storage layer and processes it in an instance of a previously implemented data pipeline application. The pipeline application invokes Flink DataStream APIs and processes continuous unbounded streams of data in real time. Alternatives to Flink analytics engines, such as Apache Spark, are also available. To unify multiple analytics engines’ APIs and to prevent writing multiple versions of the same data pipeline application, an attempt is underway to add Apache Beam APIs to SDP to allow the implementation of one Flink data pipeline application that can run on multiple underlying engines on demand. 
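To illustrate the unbounded-stream model that the Flink DataStream API implements, here is a plain-Python sketch (not Flink code) of a pipeline that consumes events as they arrive and emits running results:

```python
def moving_average(events, window=3):
    # Consume an (unbounded) event iterator and yield a running windowed mean,
    # emitting one result per incoming event instead of waiting for a batch.
    buf = []
    for value in events:
        buf.append(value)
        if len(buf) > window:
            buf.pop(0)
        yield sum(buf) / len(buf)

print(list(moving_average([10, 20, 30, 40], window=2)))
# [10.0, 15.0, 25.0, 35.0]
```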

Comparative analysis: Dell EMC real-time streaming solutions

Both Dell EMC real-time streaming solutions address the same problem and ultimately provide the same solution for it. However, in addition to using different technology implementations, each tends to be a better fit for certain streaming workloads. The best starting point for selecting one over the other is an understanding of the exact requirements of the target use case and workload.

In most situations, users know what they want in a real-time ingestion solution: typically an open-source solution that is popular in the industry. Kafka is what customers demand in most of these situations. Additional characteristics, such as the mechanisms for receiving, storing, and processing events, are secondary. Most of our customer conversations are about a reliable ingestion layer that can guarantee delivery of the customer’s business events to the consuming applications. Further detailed expectations focus on no loss of events, simple yet long-term storage capacity, and, in most cases, a well-defined process integration method for implementing initial preprocessing tasks such as filtering, cleansing, and transformations like Extract Transform Load (ETL). The purpose of preprocessing is to offload non-business-logic-related work from the target analytics engine (Spark, Flink, Kafka Stream Processing), resulting in better overall end-to-end real-time performance.

Kafka and Pravega in a nutshell

Kafka is essentially a messaging vehicle that decouples the sender of an event from the application that processes it to gain business insight. By default, Kafka uses the local disk to temporarily persist incoming data, while longer-term storage of the ingested data is handled by the Kafka Broker servers. When an event is received, it is broadcast to the interested applications, known as subscribers. An application may subscribe to more than one event-type group, also known as a topic. By default, Kafka stores and replicates the events of a topic in partitions configured on the Kafka Brokers. The replicas of an event may be distributed among several Brokers to prevent data loss and guarantee recovery in case of a failover. A Broker cluster may be constructed and configured on several Dell EMC PowerEdge R640 servers. To avoid the storage and compute capacity limits of individual Brokers, the cluster may be extended by adding more Brokers to the topology; this is the horizontally scalable characteristic of the Kafka architecture.

By design, the de facto analytics engine provided in the open-source Kafka stack is Kafka Stream Processing. It is customary to use Kafka Stream Processing as a preprocessing engine and then route the results, as real-time streaming artifacts, to an analytics engine that implements the actual business logic, such as Flink or Spark Streaming. Confluent wraps the Kafka Stream Processing implementation in an abstraction layer known as the KSQL APIs, which makes it extremely simple to process events in the core Kafka Stream Processing engine with SQL-like statements instead of third-generation languages such as Java or C++, or scripting languages such as Python.

Unlike Kafka’s messaging protocol and event-persisting partitions, Pravega implements a storage protocol and temporarily persists events as appended streams. As the events age, they become long-term data entities; therefore, unlike Kafka, the Pravega architecture does not require separate long-term storage, and the historical data remains available in the same storage. In Dell’s current SDP architecture, Pravega routes previously appended streams to Flink, which provides a data pipeline that implements the actual business logic. For scalability, Pravega uses Isilon or ECS S3 as extended and/or archiving storage.

Although both SDP and Kafka act as a vehicle between the event sender and the event processor, they implement this transport differently. By design, Kafka implements the pub/sub pattern. It basically broadcasts the event to all interested applications at the same time. Pravega makes specific events available directly to a specific application by implementing a point-to-point pattern. Both Kafka and Pravega claim guaranteed delivery. However, the point-to-point approach supports a more rigid underlying transport. 
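A toy in-memory broker (illustrative only, not Kafka or Pravega code) makes the pub/sub fan-out concrete: every subscriber to a topic receives each published event, whereas a point-to-point system would deliver each event to exactly one consumer:

```python
class MiniBroker:
    """Toy in-memory broker illustrating Kafka-style pub/sub fan-out."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        # Broadcast: every subscriber of the topic receives each event.
        for callback in self.subscribers.get(topic, []):
            callback(event)

broker = MiniBroker()
seen_a, seen_b = [], []
broker.subscribe("card-swipes", seen_a.append)
broker.subscribe("card-swipes", seen_b.append)
broker.publish("card-swipes", {"amount": 42})
# Both subscribers now hold the same event.
```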

Conclusion

Dell Technologies offers two real-time streaming solutions, and it is not a simple task to promote one over the other. Ideally, every customer problem requires an initial analysis on the data source, data format, data size, expected data ingestion frequency, guaranteed delivery requirements, integration requirements, transactional rollback requirements (if applicable), storage requirements, transformation requirements, and data structural complexity. Aggregated results from such analysis may help us suggest a specific solution. 

Dell works with customers to collect as much detailed information as possible about the customer's streaming use cases. Kafka Stream Processing has an impressive feature that offloads the transformation portion of the analytics from a pipeline engine such as Flink or Spark to the Kafka Stream Processing engine itself, which can be a great advantage. SDP, meanwhile, requires extra scripting effort outside of the Flink configuration space to provide a logically equivalent capability. On the other hand, SDP simplifies storage through Pravega's native streams-per-segment architecture, while Kafka's core storage logic belongs to a messaging layer that requires a dedicated file system. Customers with IoT device data use cases are concerned with high ingestion rates (number of events per second). Soon, we plan to use this parameter to provide benchmarking results from a comparative analysis of ingestion rate performed on our SDP and Confluent real-time streaming solutions.

Acknowledgments

I owe an enormous debt of gratitude to my colleagues Mike Pittaro and Mike King of Dell Technologies. They shared their valuable time to discuss the nuances of the text, guided me to clarify concepts, and made specific recommendations to deliver cohesive content.

Author: Amir Bahmanyari, Advisory Engineer, Dell Technologies Data-Centric Workload & Solutions. Amir joined the Dell Technologies Big Data Analytics team in late 2017. He works with Dell Technologies customers to build their Big Data solutions. Amir has a special interest in the field of Artificial Intelligence. He has been active in Artificial and Evolutionary Intelligence work since the late 1980s, when he was a Ph.D. candidate at Wayne State University, Detroit, MI. Amir implemented multiple AI/Computer Vision solutions for Motion Detection & Analysis. His special interest in biological and evolutionary intelligence algorithms led him to innovate a neuron model that mimics the data processing behavior in protein structures of cytoskeletal fibers. Prior to Dell, Amir worked for several startups in Silicon Valley and as a Big Data Analytics Platform Architect at Walmart Stores, Inc.

Read Full Blog
  • PowerEdge
  • machine learning
  • MLPerf

MLPerf Inference v0.7 Benchmarks on Power Edge R7515 Servers

Nicholas Wakou Nicholas Wakou

Tue, 08 Dec 2020 17:45:45 -0000

|

Read Time: 0 minutes

Introduction

MLPerf (https://mlperf.org) Inference is a benchmark suite for measuring how fast Machine Learning (ML) and Deep Learning (DL) systems can process input inference data and produce results using a trained model. The benchmarks belong to a diversified set of ML use cases that are popular in the industry and provide a standard for hardware platforms to perform ML-specific tasks. Hence, good performance under these benchmarks signifies a hardware setup that is well optimized for real-world ML inferencing use cases.

System under Test (SUT)

  • Server – Dell EMC PowerEdge R7515
  • GPU – NVIDIA Tesla T4
  • Framework – TensorRT™ 7.2.0.14

Dell EMC PowerEdge R7515

Table 1   Dell EMC PowerEdge R7515 technical specifications

Component | Description
System name | PowerEdge R7515
Status | Commercially available
System type | Data center
Number of nodes | 1
Host processor model | AMD® EPYC® 7702P
Host processors per node | 1
Host processor core count | 64
Host processor frequency | 2.00 GHz
Host memory capacity | 256 GB DDR4, 2933 MHz
Host storage | 3.2 TB SSD
Host accelerator | NVIDIA Tesla T4
Accelerators per node | 4

NVIDIA Tesla T4

The NVIDIA Tesla T4, based on NVIDIA’s Turing architecture, is one of the most widely used AI inference accelerators. The Tesla T4 features NVIDIA Turing Tensor cores, which enable it to accelerate all types of neural networks for images, speech, translation, and recommender systems, to name a few. Tesla T4 supports a wide variety of precisions and accelerates all major DL & ML frameworks, including TensorFlow, PyTorch, MXNet, Chainer, and Caffe2.

Table 2   NVIDIA Tesla T4 technical specifications

Component | Description
GPU architecture | NVIDIA Turing
NVIDIA Turing Tensor cores | 320
NVIDIA CUDA® cores | 2,560
Single-precision | 8.1 TFLOPS
Mixed-precision (FP16/FP32) | 65 TFLOPS
INT8 | 130 TOPS
INT4 | 260 TOPS
GPU memory | 16 GB GDDR6, 320+ GB/s
ECC | Yes
Interconnect bandwidth | 32 GB/s
System interface | x16 PCIe Gen3
Form factor | Low-profile PCIe
Thermal solution | Passive
Compute APIs | CUDA, NVIDIA TensorRT™, ONNX
Power | 70 W

MLPerf Inference v0.7

The MLPerf inference benchmark measures how fast a system can perform ML inference using a trained model with new data that is provided in various deployment scenarios. Table 3 shows seven mature models that are in the official v0.7 release.

Table 3   MLPerf Inference Suite v0.7

Model | Reference application | Dataset
resnet50-v1.5 | vision/classification and detection | ImageNet (224 x 224)
ssd-mobilenet 300 x 300 | vision/classification and detection | COCO (300 x 300)
ssd-resnet34 1200 x 1200 | vision/classification and detection | COCO (1200 x 1200)
bert | language | squad-1.1
dlrm | recommendation | Criteo Terabyte
3d-unet | vision/medical imaging | BraTS 2019
rnnt | speech recognition | OpenSLR LibriSpeech Corpus

 The above models serve in various critical inference applications or use cases that are known as “scenarios.” Each scenario requires different metrics and demonstrates performance in a production environment. MLPerf Inference consists of four evaluation scenarios that are shown in Table 4:

  • Single-stream
  • Multi-stream
  • Server
  • Offline

Table 4   Deployment scenarios

Scenario | Sample use case | Metrics
Single-stream | Cell phone augmented reality | Latency in ms
Multi-stream | Multiple camera driving assistance | Number of streams
Server | Translation sites | QPS
Offline | Photo sorting | Inputs/s

Results

The units on which Inference is measured are based on samples and queries. A sample is a unit on which inference is run, such as an image or sentence. A query is a set of samples that are issued to an inference system together. For detailed explanation of definitions, rules and constraints of MLPerf Inference see: https://github.com/mlperf/inference_policies/blob/master/inference_rules.adoc#constraints-for-the-closed-division

Default Accuracy refers to a configuration where the model infers samples with at least 99% accuracy. High Accuracy refers to a configuration where the model infers samples with 99.9% accuracy.

For MLPerf Inference v0.7 result submissions, Dell EMC used Offline and Server scenarios as they are more representative of datacenter systems. Offline scenario represents use cases where inference is done as a batch job (for instance using AI for photo sorting), while server scenario represents an interactive inference operation (translation app).
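As a rough illustration of the difference (this is not MLPerf LoadGen code, and the fixed per-sample cost is an invented number), the offline scenario can be modeled as one large batch measured by throughput, while the server scenario issues queries individually and is measured by per-query latency:

```python
# Toy contrast of the two datacenter scenarios. The 2 ms per-sample
# processing cost is an assumption made purely for illustration.

PER_SAMPLE_MS = 2  # assumed fixed inference cost per sample

def run_offline(samples):
    """Offline: all samples issued as one batch; the metric is samples/s."""
    total_ms = len(samples) * PER_SAMPLE_MS
    return len(samples) / (total_ms / 1000.0)

def run_server(queries):
    """Server: queries arrive interactively; the metric is latency per query (ms)."""
    return [len(query) * PER_SAMPLE_MS for query in queries]
```

With a 100-sample batch, `run_offline` reports 500 samples/s; `run_server` reports a latency proportional to each query's sample count.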

MLPerf Inference results on the PowerEdge R7515

Table 5   PowerEdge R7515 inference results


System: Dell EMC R7515 (4 x T4)

Benchmark | Accuracy | Offline (samples/s) | Server (queries/s)
3D-UNET | Default | 28 | N/A
3D-UNET | High | 28 | N/A
BERT | Default | 1,708 | 1,249
BERT | High | 715 | 629
DLRM | Default | 126,287 | 126,514
DLRM | High | 126,287 | 126,514
ResNet50 | Default | 23,290 | 21,506
RNNT | Default | 5,712 | 4,096
SSD-ResNet34 | Default | 535 | 450

Table 5 above shows the raw performance of the R7515_T4x4 SUT in samples/s for the offline scenario and queries/s for the server scenario. Detailed results for this and other configurations can be found at https://mlperf.org/inference-results-0-7/

Figures 1 to 4 below show the inference capabilities of two Dell PowerEdge servers: the R7515 and the R7525. Both are 2U and powered by AMD processors; the R7515 is single socket, and the R7525 is dual socket. The R7515 used four NVIDIA Tesla T4 GPUs, while the R7525 used four different configurations of three NVIDIA GPU accelerator models: Tesla T4, Quadro RTX8000, and A100. Each bar graph indicates the relative performance of inference operations that are completed in a set amount of time while bounded by latency constraints. The higher the bar, the higher the inference capability of the platform.

 

Figure 1   Offline scenario relative performance with default accuracy for six different benchmarks and five different configurations using R7515_T4x4 as a baseline

  

 

Figure 2   Offline scenario relative performance with high accuracy for six different benchmarks and five different configurations using R7515_T4x4 as a baseline


 

Figure 3   Server scenario relative performance with default accuracy for five different benchmarks and five different configurations using R7515_T4x4 as a baseline

 

 

Figure 4   Server scenario relative performance with high accuracy for two different benchmarks and five different configurations using R7515_T4x4 as a baseline


Figure 5   Relative cost of GPU card configurations using R7515_T4x4 as baseline and its BERT default Performance


Figure 5 shows the relative price of each GPU configuration, using the cost of the Tesla T4 as the baseline, and the corresponding price/performance. The price/performance shown is an estimate to illustrate the “bang” for the money that is spent on the GPU configurations and should not be taken as the price/performance of the entire SUT. In this case, the shorter the bar, the better.

Key Takeaways from the results

  1. Performance is almost linearly proportional to the number of GPU cards. Check out Figures 1 to 4 and compare the performance of the R7515_T4x4 and R7525_T4x8, or the R7525_A100x2 and R7525_A100x3.
  2. Performance closely tracks the number of GPU cards. The relative performance of the R7525_T4x8 is about 2.0 for most benchmarks, and it has twice the number of GPUs as the reference system; the GPU count has a significant impact on performance.
  3. The more expensive GPUs provide better price/performance. From Figure 5, the cost of the R7525_A100x3 configuration is 3x the cost of the reference configuration R7515_T4x4, but its relative price/performance is 0.61x.
  4. The price of the RTX8000 is 2.22x the price of the Tesla T4, as listed on the Dell website. The RTX8000 can be used with fewer GPU cards (three, compared to eight T4s) at a lower cost. From Figure 5, the R7525_RTX8000x3 costs 0.8333x as much as the R7525_T4x8, and it posts better price/performance and performance.
  5. Generally, Dell Technologies provides server configurations with the flexibility to deploy customer inference workloads on systems that match their requirements:
    1. The NVIDIA T4 is a low profile, lower power GPU option that is widely deployed for inference due to its superior power efficiency and economic value.
    2. With 48 GB of GDDR6 memory, the NVIDIA Quadro RTX 8000 is designed to work with memory intensive workloads like creating the most complex models, building massive architectural datasets and visualizing immense data science workloads. Dell is the only vendor that submitted results using NVIDIA Quadro RTX GPUs.
    3. NVIDIA A100-PCIe-40G is a powerful platform that is popularly used for training state-of-the-art Deep Learning models. For customers with heavy inference computational requirements and the budget to match, its initial high cost is more than offset by the better price/performance.
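The arithmetic behind the price/performance takeaway can be checked directly: relative price/performance is relative cost divided by relative performance, so a configuration that costs 3x the baseline yet posts 0.61x price/performance implies roughly 4.9x the baseline performance. A minimal sketch using the figures quoted above:

```python
# Relative price/performance = relative cost / relative performance,
# with both values normalized to the R7515_T4x4 baseline (per Figure 5).

def rel_price_perf(rel_cost, rel_perf):
    return rel_cost / rel_perf

# Implied relative performance of the R7525_A100x3, given a 3x relative
# cost and 0.61x relative price/performance from the blog's figures:
implied_perf = 3.0 / 0.61  # about 4.9x the baseline
```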

Conclusion

As shown in the charts above, the Dell EMC PowerEdge R7515 performed well in a wide range of benchmark scenarios. The benchmarks that are discussed in this blog included diverse use cases. For instance, image dataset inferencing (Object Detection using the SSD-ResNet34 model on the COCO dataset), language processing (the BERT model used on SQuAD v1.1 for machine comprehension of texts), and a recommendation engine (the DLRM model with the Criteo 1 TB clicks dataset).

Read Full Blog
  • PowerEdge
  • machine learning
  • MLPerf

MLPerf Inference v0.7 Benchmarks on PowerEdge R740 Servers

Nicholas Wakou Nicholas Wakou

Tue, 08 Dec 2020 17:45:45 -0000

|

Read Time: 0 minutes

Introduction

MLPerf (https://mlperf.org) Inference is a benchmark suite for measuring how fast Machine Learning (ML) and Deep Learning (DL) systems can process input inference data and produce results using a trained model. The benchmarks belong to a diversified set of ML use cases that are popular in the industry and provide a standard for hardware platforms to perform ML-specific tasks. Hence, good performance under these benchmarks signifies a hardware setup that is well optimized for real-world ML inferencing use cases.

System under Test (SUT)

  • Server – Dell EMC PowerEdge R740
  • GPU – NVIDIA Tesla T4
  • Framework – TensorRT™ 7.2.0.14

Dell EMC PowerEdge R740

Table 1   Dell EMC PowerEdge R740 technical specifications

Component | Description
System name | PowerEdge R740
Status | Commercially available
System type | Data center
Number of nodes | 1
Host processor model | Intel® Xeon® Gold 6248R
Host processors per node | 2
Host processor core count | 24
Host processor frequency | 3.00 GHz
Host memory capacity | 384 GB DDR4, 2933 MHz
Host storage | 3.84 TB SSD
Host accelerator | NVIDIA Tesla T4
Accelerators per node | 4

NVIDIA Tesla T4

The NVIDIA Tesla T4, based on NVIDIA’s Turing architecture, is one of the most widely used AI inference accelerators. The Tesla T4 features NVIDIA Turing Tensor cores, which enable it to accelerate all types of neural networks for images, speech, translation, and recommender systems, to name a few. Tesla T4 supports a wide variety of precisions and accelerates all major DL & ML frameworks, including TensorFlow, PyTorch, MXNet, Chainer, and Caffe2.

Table 2   NVIDIA Tesla T4 technical specifications

Component | Description
GPU architecture | NVIDIA Turing
NVIDIA Turing Tensor cores | 320
NVIDIA CUDA® cores | 2,560
Single-precision | 8.1 TFLOPS
Mixed-precision (FP16/FP32) | 65 TFLOPS
INT8 | 130 TOPS
INT4 | 260 TOPS
GPU memory | 16 GB GDDR6, 320+ GB/s
ECC | Yes
Interconnect bandwidth | 32 GB/s
System interface | x16 PCIe Gen3
Form factor | Low-profile PCIe
Thermal solution | Passive
Compute APIs | CUDA, NVIDIA TensorRT™, ONNX
Power | 70 W

MLPerf Inference v0.7

The MLPerf inference benchmark measures how fast a system can perform ML inference using a trained model with new data that is provided in various deployment scenarios. Table 3 shows seven mature models that are in the official v0.7 release. 

Table 3   MLPerf Inference Suite v0.7

Model | Reference application | Dataset
resnet50-v1.5 | vision/classification and detection | ImageNet (224 x 224)
ssd-mobilenet 300 x 300 | vision/classification and detection | COCO (300 x 300)
ssd-resnet34 1200 x 1200 | vision/classification and detection | COCO (1200 x 1200)
bert | language | squad-1.1
dlrm | recommendation | Criteo Terabyte
3d-unet | vision/medical imaging | BraTS 2019
rnnt | speech recognition | OpenSLR LibriSpeech Corpus

 The above models serve in various critical inference applications or use cases that are known as “scenarios.” Each scenario requires different metrics and demonstrates performance in a production environment. MLPerf Inference consists of four evaluation scenarios that are shown in Table 4:

  • Single-stream
  • Multi-stream
  • Server
  • Offline

Table 4   Deployment scenarios

Scenario | Sample use case | Metrics
Single-stream | Cell phone augmented reality | Latency in milliseconds
Multi-stream | Multiple camera driving assistance | Number of streams
Server | Translation sites | QPS
Offline | Photo sorting | Inputs/s

Results

The units on which Inference is measured are based on samples and queries. A sample is a unit on which inference is run, such as an image or sentence. A query is a set of samples that are issued to an inference system together. For detailed explanation of definitions, rules and constraints of MLPerf Inference see: https://github.com/mlperf/inference_policies/blob/master/inference_rules.adoc#constraints-for-the-closed-division

Default Accuracy refers to a configuration where the model infers samples with at least 99% accuracy. High Accuracy refers to a configuration where the model infers samples with 99.9% accuracy. For MLPerf Inference v0.7 result submissions, Dell EMC used Offline and Server scenarios as they are more representative of datacenter systems. Offline scenario represents use cases where inference is done as a batch job (for instance using AI for photo sorting), while server scenario represents an interactive inference operation (translation app).

MLPerf Inference results on the PowerEdge R740

Table 5   PowerEdge R740 inference results



System: Dell EMC R740 (4 x T4)

Benchmark | Accuracy | Offline (samples/s) | Server (queries/s)
3D-UNET | Default | 29 | N/A
3D-UNET | High | 29 | N/A
BERT | Default | 1,7329 | 1,349
BERT | High | 743 | 679
DLRM | Default | 131,571 | 126,015
DLRM | High | 131,571 | 126,015
ResNet50 | Default | 23,844 | 21,805
RNNT | Default | 5,875 | 4,196
SSD-ResNet34 | Default | 546 | 470

Table 5 above shows the raw performance of the R740_T4x4 SUT in samples/s for the offline scenario and queries for the server scenario. Detailed results for this and other configurations can be found at https://mlperf.org/inference-results-0-7/.

Figures 1 and 2 below show the raw inference performance of the R740_T4x4 SUT for five of the six MLPerf benchmarks that were submitted. Each bar graph indicates the relative performance of inference operations that are completed in a set amount of time while bounded by latency constraints. The higher the bar, the higher the inference capability of the platform. Figure 3 compares offline scenario performance to the server scenario, and Figure 4 compares offline performance using the default and high accuracy.

 

Figure 1   Default accuracy performance for BERT, RNNT, and SSD offline and server scenarios

 

Figure 2   Default accuracy performance for DLRM and ResNet50 offline and server scenarios 


Figure 3   Comparing offline to server scenario performance   

 

Figure 4   Comparing offline default accuracy to high accuracy performance    

 

 

Figure 5   Comparing NVIDIA Tesla T4 configurations’ offline performance using R740_T4x4 as a baseline

Figure 5 shows the relative offline performance per GPU card for Tesla T4 configurations from several submitter organizations. 

  

 

 Figure 6   Relative cost of GPU card configurations using R740_T4x4 as baseline and its BERT default Performance

Figure 6 shows the relative price of each GPU configuration, using the cost of the Tesla T4 as the baseline, and the corresponding price/performance. The price/performance shown is an estimate to illustrate the “bang” for the money that is spent on the GPU configurations and should not be taken as the price/performance of the entire SUT. In this case, the shorter the bar, the better.

Key takeaways from the results

  1. The R740_T4x4 configuration could successfully perform inference operations using six different MLPerf benchmarks for the offline scenario and five for the server scenario.
  2. Performance is relatively stable across the two datacenter-centric scenarios. Figure 3 shows that the R740_T4x4 inference performance scores for the offline and server scenarios across five different benchmarks are very close. This means that performance will not drastically change with changes in the type of input stream.
  3. It is all about accelerators. Figure 5 shows that the relative performance per GPU card of several Tesla T4 configurations is within 4% of each other. These are SUTs with different server platforms from several submitter organizations; 4% is statistically insignificant, as it could be attributed to the performance noise level of these systems.
  4. The more expensive GPUs provide better price/performance. From Figure 6, the cost of the R7525_A100x3 configuration is 3x the cost of the reference configuration R740_T4x4, but its relative price/performance is 0.61x.
  5. The price of the RTX8000 is 2.22x the price of the Tesla T4, as listed on the Dell website. The RTX8000 can be used with fewer GPU cards (three, compared to eight T4s) at a lower cost. From Figure 6, the R7525_RTX8000x3 costs 0.8333x as much as the R7525_T4x8, and it posts better price/performance.
  6. Generally, Dell Technologies provides server configurations with the flexibility to deploy customer inference workloads on systems that match their requirements.
    1. The NVIDIA T4 is a low profile, lower power GPU option that is widely deployed for inference due to its superior power efficiency and economic value.
    2. With 48 GB of GDDR6 memory, the NVIDIA Quadro RTX 8000 is designed to work with memory intensive workloads like creating the most complex models, building massive architectural datasets and visualizing immense data science workloads. Dell is the only vendor that submitted results using NVIDIA Quadro RTX GPUs.
    3. NVIDIA A100-PCIe-40G is a powerful platform that is popularly used for training state-of-the-art Deep Learning models. For customers with heavy inference computational requirements and the budget to match, its initial high cost is more than offset by the better price/performance.

Conclusion

As shown in the charts above, Dell EMC PowerEdge R740 performed well in a wide range of benchmark scenarios. The benchmarks that are discussed in this blog included diverse use cases. For instance, image dataset inferencing (Object Detection using SSD-Resnet34 model on COCO dataset), language processing (BERT model used on SQUAD v1.1 for machine comprehension of texts), and recommendation engine (DLRM model with Criteo 1 TB clicks dataset).

 

Read Full Blog
  • HCI
  • data analytics
  • Elastic Stack

The Case for Elastic Stack on HCI

Keith Quebodeaux Keith Quebodeaux

Tue, 08 Dec 2020 17:45:45 -0000

|

Read Time: 0 minutes

The Elastic Stack, also known as the “ELK Stack,” is a widely used collection of open-source software products for search, analysis, and visualization of data.  The Elastic Stack is useful for a wide range of applications, including observability (logging, metrics, APM), security, and general-purpose enterprise search.  Dell Technologies is an Elastic Technology Partner.1  This blog covers some basics of hyper-converged infrastructure (HCI), some Elastic Stack fundamentals, and the benefits of deploying the Elastic Stack on HCI. 

HCI Overview

HCI integrates the compute and storage resources from a cluster of servers using virtualization software for both CPU and disk resources to deliver flexible, scalable performance and capacity on demand.  The breadth of server offerings in the Dell PowerEdge portfolio gives system architects many options for designing the right blend of compute and storage resources.  Local resources from each server in the cluster are combined to create virtual pools of compute and storage with multiple performance tiers.

VxFlex is a Dell Technologies-developed, hypervisor-agnostic HCI platform integrated with high-performance, software-defined block storage.  VxFlex OS is the software that creates a server- and IP-based SAN from direct-attached storage as an alternative to a traditional SAN infrastructure.  Dell Technologies also offers the VxRail HCI platform for VMware-centric environments.   VxRail is the only fully integrated, pre-configured, and pre-tested VMware HCI system powered by VMware vSAN.  We show below why both HCI offerings are highly efficient and effective platforms for a truly scalable Elastic Stack deployment.

Elastic Stack Overview

The Elastic Stack is a collection of four open-source projects: Elasticsearch, Logstash, Kibana, and Beats.  Elasticsearch is an open-source, distributed, scalable, enterprise-grade search engine based on Lucene.  The Elastic Stack is an end-to-end solution for searching, analyzing, and visualizing machine data from diverse source formats. With the Elastic Stack, organizations can collect data from across the enterprise, normalize the format, and enrich the data as desired.  Platforms designed for scale-out performance running the Elastic Stack provide the ability to analyze and correlate data in near real time.

Elastic Stack on HCI

In March 2020, Dell Technologies validated the Elastic Stack running on our VxFlex family of HCI.2  We show below how the features of HCI provide distinct benefits and cost savings as an integrated solution for the Elastic Stack.  The Elastic Stack, and Elasticsearch specifically, is designed for scale-out: data nodes can be added to an Elasticsearch cluster to provide additional compute and storage resources.   HCI also uses a scale-out deployment model that allows easy, seamless horizontal scalability by adding nodes to the cluster(s).  However, unlike bare-metal deployments, HCI also scales vertically, adding resources dynamically to Elasticsearch data nodes or any other Elastic Stack role through virtualization.  VxFlex admins use their preferred hypervisor and VxFlex OS; on VxRail it is done with VMware ESXi and vSAN.  Additionally, the Elastic Stack can be deployed on Kubernetes clusters, so admins can also choose to leverage VMware Tanzu for Kubernetes management.

Virtualization has long been a strategy for achieving more efficient resource utilization and data center density.  Elasticsearch data nodes tend to have average allocations of 8-16 cores and 64 GB of RAM.   With the current ability to support up to 112 cores and 6 TB of RAM in a single 2RU Dell server, Elasticsearch is an attractive application for virtualization.  Additionally, the Elastic Stack is significantly more CPU-efficient than some alternative products, improving the cost-effectiveness of deploying Elastic with VMware or other virtualization technologies.  We recommend sizing 1 physical CPU to 1 virtual CPU (vCPU) for the Elasticsearch hot tier, along with the management and control plane resources.  While this is admittedly like the VMware guidance for some similar analytics platforms, these VMs tend to consume a significantly smaller CPU footprint per data node, and the Elastic Stack tends to take advantage of hyperthreading and resource overcommitment more effectively.  While needs vary by customer use case, our experience shows that the efficiencies in the Elastic Stack and Elastic data lifecycle management allow the Elasticsearch warm tier, Kibana, and proxy servers to be supported at 1 physical CPU to 2 vCPUs, and the cold tier at upwards of 4 vCPUs per physical CPU.
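The sizing guidance above can be sketched as a small calculator. The vCPU-to-physical-CPU ratios are the ones stated in the text (1:1 hot, 1:2 warm/Kibana/proxy, 1:4 cold); the node counts and per-node vCPU allocations in the example are hypothetical.

```python
# Physical-core estimate from the vCPU oversubscription ratios suggested
# above. Tier ratios come from the text; the example cluster is made up.

VCPU_PER_PCPU = {"hot": 1, "warm": 2, "cold": 4}  # vCPUs per physical CPU

def physical_cores(nodes):
    """nodes: list of (tier, vcpus_per_node, node_count) tuples."""
    total = 0.0
    for tier, vcpus, count in nodes:
        total += vcpus * count / VCPU_PER_PCPU[tier]
    return total

# Hypothetical cluster: 4 hot nodes @ 16 vCPU, 4 warm @ 16, 2 cold @ 8
cores = physical_cores([("hot", 16, 4), ("warm", 16, 4), ("cold", 8, 2)])
```

Under these assumptions, the hot tier needs 64 physical cores, the warm tier 32, and the cold tier 4, for 100 in total.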

Because Elasticsearch tiers data on independent data nodes, versus multiple mount points on a single data node or indexer, the multiple types and classes of software-defined storage defined for independent HCI clusters can easily be leveraged across Elasticsearch clusters to address data temperatures.  Note that Elastic does not currently recommend any non-block storage (S3, NFS, and so on) as a target for Elasticsearch, except as a target for Elasticsearch Snapshot and Restore.  (It is possible to use S3 or NFS on Isilon or ECS, for example, as a retrieval target for Logstash, but that is a subject for a later blog.)  For example, vSAN in VxRail provides Optane, NVMe, SSD, and HDD storage options.  A user can deploy their primary Elastic Stack environment, with its hot Elasticsearch data nodes, Kibana, and the Elastic Stack management and control plane, on an all-flash VxRail cluster, and then leverage a storage-dense hybrid vSAN cluster for Elasticsearch cold data.
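One common way to realize this hot/cold placement in Elasticsearch is shard allocation filtering against a custom node attribute. The sketch below shows the settings bodies as Python dicts; the attribute name `data_temp` is our assumption (nodes would carry it via `node.attr.data_temp` in elasticsearch.yml), and the exact tiering scheme is illustrative rather than the validated configuration.

```python
# Sketch of Elasticsearch shard-allocation-filtering settings that pin an
# index to nodes carrying a custom attribute. The attribute name is assumed;
# nodes would declare `node.attr.data_temp: hot` (or cold) in elasticsearch.yml.

hot_index_settings = {
    "settings": {
        "index.routing.allocation.require.data_temp": "hot",
    }
}

def demote_to_cold(settings):
    """Settings body for a PUT <index>/_settings call as data ages out of hot."""
    updated = dict(settings["settings"])
    updated["index.routing.allocation.require.data_temp"] = "cold"
    return {"settings": updated}
```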

Image 1. Example Logical Elastic Stack Architecture on HCI

Software-defined storage in HCI provides native enterprise capabilities, including data encryption and data protection.  Because VxFlex OS and vSAN provide high availability through the software-defined storage, replica shards in Elastic are not required for data protection.   Elastic shards an index into 5 shards by default for processing, but replica shards for data protection are optional.  Because we have data protection at the storage layer, we did not use replicas in our validation of VxFlex, and we saw no impact on performance.
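Since the storage layer already protects the data, replica shards can be disabled when an index is created. A sketch of the index-creation settings body follows; the 5 primary shards match the default mentioned above, and disabling replicas assumes storage-layer protection as described.

```python
# Index-creation settings with replica shards disabled, because VxFlex OS or
# vSAN already provide data protection at the storage layer. The 5 primary
# shards shown are the Elastic default noted above.

index_body = {
    "settings": {
        "number_of_shards": 5,    # default primary shard count
        "number_of_replicas": 0,  # storage layer handles data protection
    }
}
```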

HCI enables customers to expand and efficiently manage the rapid adoption of an Elastic environment with dynamic resource expansion and improved infrastructure management tools.   This allows for the rapid adoption of new use cases and new insights.  HCI reduces datacenter sprawl and associated costs and inefficiencies related to the adoption of Elastic on bare metal.  Ultimately HCI can deliver a turnkey experience that enables our customers to continuously innovate through insights derived by the Elastic Stack.  

References

  1. Elastic Technology and Cloud Partners - https://www.elastic.co/about/partners/technology
  2. Elastic Stack Solution on Dell EMC VxFlex Family - https://www.dellemc.com/en-in/collaterals/unauth/white-papers/products/converged-infrastructure/elastic-on-vxflex.pdf
  3. Elasticsearch Sizing and Capacity Planning Webinar - https://www.elastic.co/webinars/elasticsearch-sizing-and-capacity-planning

About the Author

Keith Quebodeaux is an Advisory Systems Engineer and analytics specialist with Dell Technologies Advanced Technology Solutions (ATS) organization.   He has worked in various capacities with Dell Technologies for over 20 years including managed services, converged and hyper-converged infrastructure, and business applications and analytics.   Keith is a graduate of the University of Oregon and Southern Methodist University.

Acknowledgments

I would like to gratefully acknowledge the input and assistance of Craig G., Rakshith V., and Chidambara S. for their input and review of this blog.  I would like to especially thank Phil H., Principal Engineer with Dell Technologies whose detailed and extensive advice and assistance provided clarity and focus to my meandering evangelism.  Your support was invaluable.  As with anything the faults are all my own.

Read Full Blog
  • data analytics
  • Deloitte

Dell Technologies and Deloitte DataPaaS: Data Platform as a Service

Chris Belsey Chris Belsey

Tue, 08 Dec 2020 17:45:44 -0000

|

Read Time: 0 minutes

The Dell Technologies and Deloitte alliance combines Dell Technologies' leading infrastructure, software, and services with Deloitte's ability to deliver solutions, to drive digital transformation for our mutual clients.

DataPaaS enables enterprise deployment and adoption of Deloitte best practice data analytics platforms for use cases such as Financial Services, Cyber Security, Business Analytics, IT Operations and IoT. 

Why choose Dell Technologies and Deloitte

Best-in-class capabilities: The Dell Technologies and Deloitte alliance draws on strengths from each organization with the goal of providing best-in-class technology solutions to customers.

Strong track record of success: For years Dell Technologies and Deloitte have successfully worked together to help solve enterprise customers‘ most complex infrastructure, technology, cloud strategy, and business challenges.

Strategic approach: Successful engagements with a large, diverse group of customers have demonstrated the importance of taking a strategic approach to technology, solution design, integrations, and implementation.

Dell Technologies collaborates with Deloitte to deliver data analytics at scale, allowing customers to focus on outcomes, use cases and value


Keeping up with the demands of a growing data platform can be a real challenge. Getting data on-boarded quickly, deploying and scaling infrastructure, and managing users' reporting and access demands becomes increasingly difficult. DataPaaS employs Deloitte's best-practice D8 Methodology to orchestrate the deployment, management, and adoption of an organisation-wide data platform.

  • “Splunk as a Platform” enabling data reuse and analytics across the business
  • On-premise, Cloud or Hybrid – route data to the most cost-effective option or depending on Information Governance policies
  • DataPaaS delivers a catalog of use-cases that can be deployed in minutes…not days or weeks
  • Free up and retain specialist resources - move from troubleshooting and management of the platform, to getting value out of the data in the platform
  • True DevOps: using CI/CD, spin up and tear down full environments as needed
  • Enforce and maintain consistent configuration, continuously synced to enable simple recovery
  • Data Acquisition Channel for rapid and automated data onboarding and routing
  • DataPaaS enables Data DevOps: 5x faster, at 50% of the cost, with 100% control and 8x the return on investment

Find out more

Contact us


Asia Pacific region
Stuart Hirst
Partner
Deloitte Risk Advisory Pty Ltd
shirst@deloitte.com.au 
+612 487 471 729
      @convergingdata 


United States region
Todd Wingler
Business Development Executive
Deloitte Risk and Financial Advisory
 twingler@deloitte.com
+1 480 232-8540
       @twingler


EMEA region
Nicola Esposito
Partner
Deloitte Cyber
niesposito@deloitte.es
+34 918232431
       @nicolaesposito


Chris Belsey
ISV Strategy & Alliances, Global Alliances
Dell Technologies
chris.belsey@dell.com 
+44 75 0088 0803
       @chrisbelseyemc


Byron Cheng
High Value Workloads Leader, Global Alliances
Dell Technologies
byron.cheng@dell.com
+1 949 241 6328
       @byroncheng1


Read Full Blog
  • data analytics
  • message-oriented middleware

IIoT Analytics Design: How important is MOM (message-oriented middleware)?

Philip Hummel Philip Hummel

Tue, 08 Dec 2020 17:45:45 -0000

|

Read Time: 0 minutes

Originally published on Aug 6, 2018 1:17:46 PM 

Artificial intelligence (AI) is transforming the way businesses compete in today’s marketplace. Whether it’s improving business intelligence, streamlining supply chain or operational efficiencies, or creating new products, services, or capabilities for customers, AI should be a strategic component of any company’s digital transformation.

Deep neural networks have demonstrated astonishing abilities to identify objects, detect fraudulent behaviors, predict trends, recommend products, enable enhanced customer support through chatbots, convert voice to text and translate one language to another, and produce a whole host of other benefits for companies and researchers. They can categorize and summarize images, text, and audio recordings with human-level capability, but to do so they first need to be trained.

Deep learning, the process of training a neural network, can take days, weeks, or months, and considerable effort and expertise are required to produce a neural network of sufficient quality to trust your business or research decisions to its recommendations. Most successful production systems go through many iterations of training, tuning, and testing during development. Distributed deep learning can speed up this process, reducing the total time to tune and test so that your data science team can develop the right model faster, but it requires a method for aggregating knowledge between systems.

There are several evolving methods for efficiently implementing distributed deep learning, and the way in which you distribute the training of neural networks depends on your technology environment. Whether your compute environment is container native, high performance computing (HPC), or Hadoop/Spark clusters for Big Data analytics, your time to insight can be accelerated by using distributed deep learning. In this article we are going to explain and compare systems that use a centralized or replicated parameter server approach, a peer-to-peer approach, and finally a hybrid of these two developed specifically for Hadoop distributed big data environments.

Distributed Deep Learning in Container Native Environments

Container-native environments (e.g., Kubernetes, Docker Swarm, and OpenShift) have become the standard for many DevOps teams, where rapid, in-production software updates are the norm and bursts of computation may be shifted to public clouds. Most deep learning frameworks support distributed deep learning for these environments using a parameter server-based model that allows multiple processes to look at training data simultaneously while aggregating knowledge into a single, central model.

The process of performing parameter server-based training starts with specifying the number of workers (processes that will look at training data) and parameter servers (processes that will aggregate error-reduction information, backpropagate those adjustments, and update the workers). Additional parameter servers can act as replicas for improved load balancing.

Parameter server model for distributed deep learning

Worker processes are each given a mini-batch of training data to evaluate and, upon completing that mini-batch, report the differences (gradients) between produced and expected output back to the parameter server(s). The parameter server(s) then update the model and transmit copies of the updated model back to the workers to use in the next round.

This model is ideal for container-native environments, where parameter server processes and worker processes can be naturally separated. Orchestration systems such as Kubernetes allow neural network models to be trained in container-native environments using multiple hardware resources to improve training time. Additionally, many deep learning frameworks, such as TensorFlow, PyTorch, Caffe2, and Cognitive Toolkit, support parameter server-based distributed training.
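To make the round trip concrete, here is a toy plain-Python sketch of the parameter-server pattern (illustrative only, not the API of TensorFlow or any other framework; the `gradient` and `train` names are our own). Workers compute gradients on their mini-batches of a one-parameter linear model, and a central update aggregates them:

```python
# Toy parameter-server round trip. A central "server" variable holds the
# weight of a 1-D linear model y = w * x; each worker computes the gradient
# on its own mini-batch, and the server averages them and updates the model.

def gradient(w, batch):
    # Mean-squared-error gradient for y = w * x over one mini-batch.
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

def train(worker_batches, lr=0.05, steps=200):
    w = 0.0  # parameter held by the server
    for _ in range(steps):
        # Workers evaluate their mini-batches (sequentially here; in
        # parallel in a real deployment) and report gradients back.
        grads = [gradient(w, batch) for batch in worker_batches]
        # The server aggregates the gradients, updates the model, and
        # sends the new weights to the workers for the next round.
        w -= lr * sum(grads) / len(grads)
    return w

# Synthetic data for y = 3x, split between two workers.
data = [(0.5, 1.5), (1.0, 3.0), (1.5, 4.5), (2.0, 6.0)]
print(train([data[:2], data[2:]]))  # converges toward 3.0
```

In a real framework the workers and server run as separate processes exchanging tensors over the network, but the division of labor is the same: workers only compute gradients, and only the server mutates the model.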

Distributed Deep Learning in HPC Environments

High performance computing (HPC) environments are generally built to support the execution of multi-node applications that are developed and executed using the single program, multiple data (SPMD) methodology, where data exchange is performed over high-bandwidth, low-latency networks, such as Mellanox InfiniBand and Intel OPA. These multi-node codes take advantage of these networks through the Message Passing Interface (MPI), which abstracts communications into send/receive and collective constructs.

Deep learning can be distributed with MPI using a communication pattern called Ring-AllReduce. In Ring-AllReduce each process is identical, unlike in the parameter-server model where processes are either workers or servers. The Horovod package by Uber (available for TensorFlow, Keras, and PyTorch) and the mpi_collectives contributions from Baidu (available in TensorFlow) use MPI Ring-AllReduce to exchange loss and gradient information between replicas of the neural network being trained. This peer-based approach means that all nodes in the solution are working to train the network, rather than some nodes acting solely as aggregators/distributors (as in the parameter server model). This can potentially lead to faster model convergence.

Ring-AllReduce model for distributed deep learning

The Dell EMC Ready Solutions for AI, Deep Learning with NVIDIA allows users to take advantage of high-bandwidth Mellanox InfiniBand EDR networking, fast Dell EMC Isilon storage, accelerated compute with NVIDIA V100 GPUs, and optimized TensorFlow, Keras, or PyTorch with Horovod frameworks to help produce insights faster. 

Distributed Deep Learning in Hadoop/Spark Environments

Hadoop and other Big Data platforms achieve extremely high performance for distributed processing but are not designed to support long-running, stateful applications. Several approaches exist for executing distributed training under Apache Spark. Yahoo developed TensorFlowOnSpark, accomplishing the goal with an architecture that leveraged Spark for scheduling TensorFlow operations and RDMA for direct tensor communication between servers.

BigDL is a distributed deep learning library for Apache Spark. Unlike Yahoo's TensorFlowOnSpark, BigDL not only enables distributed training - it is designed from the ground up to work on Big Data systems. To enable efficient distributed training, BigDL takes a data-parallel approach with synchronous mini-batch SGD (Stochastic Gradient Descent). Training data is partitioned into RDD samples and distributed to the workers. Model training is an iterative process: each worker first computes gradients locally, taking advantage of its locally stored partition of the training data and model to perform in-memory transformations. Then an AllReduce function schedules tasks on the workers to aggregate gradients and update the weights. Finally, a broadcast syncs the distributed copies of the model with the updated weights.
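The synchronous data-parallel scheme can be sketched in plain Python (illustrative only, not BigDL's API; `local_gradient` and `sync_sgd` are our own names). Each worker computes a local gradient on its partition, an AllReduce-style average combines them, and every model replica applies the identical update:

```python
# Data-parallel synchronous mini-batch SGD: the data is partitioned across
# workers, each worker holds its own replica of a 1-D linear model y = w * x,
# and an averaged (AllReduce-style) gradient keeps the replicas in lockstep.

def local_gradient(w, partition):
    # Mean-squared-error gradient computed on this worker's local partition.
    return sum(2 * (w * x - y) * x for x, y in partition) / len(partition)

def sync_sgd(partitions, lr=0.05, epochs=200):
    replicas = [0.0] * len(partitions)  # one model copy per worker
    for _ in range(epochs):
        # Each worker computes a gradient from its locally stored data.
        grads = [local_gradient(w, p) for w, p in zip(replicas, partitions)]
        avg = sum(grads) / len(grads)  # AllReduce: average across workers
        # Broadcast-style sync: every replica applies the identical update.
        replicas = [w - lr * avg for w in replicas]
    return replicas

# Two partitions of synthetic data for y = 2x.
parts = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (0.5, 1.0)]]
print(sync_sgd(parts))  # all replicas converge together toward w = 2.0
```

Because every replica starts from the same weights and applies the same averaged gradient, the copies never diverge - which is the property the final broadcast step in BigDL's iteration preserves.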

BigDL implementation of AllReduce functionality

The Dell EMC Ready Solutions for AI, Machine Learning with Hadoop is configured to let users take advantage of the power of distributed deep learning with Intel BigDL and Apache Spark. It supports loading models and weights from other frameworks such as TensorFlow, Caffe, and Torch to then be leveraged for training or inferencing. BigDL is a great way for users to quickly begin training neural networks using Apache Spark, widely recognized for how simple it makes data processing.

One more note on Hadoop and Spark environments: The Intel team working on BigDL has built and compiled high-level pipeline APIs, built-in deep learning models, and reference use cases into the Intel Analytics Zoo library. Analytics Zoo is based on BigDL but makes it even easier to use through high-level pipeline APIs designed to work with Spark DataFrames and built-in models for tasks like object detection and image classification.

Conclusion

Regardless of whether your preferred server infrastructure is container native, HPC clusters, or Hadoop/Spark-enabled data lakes, distributed deep learning can help your data science team develop neural network models faster. Our Dell EMC Ready Solutions for Artificial Intelligence can work in any of these environments to help jumpstart your business's AI journey. For more information on the Dell EMC Ready Solutions for Artificial Intelligence, go to dellemc.com/readyforai.


Lucas A. Wilson, Ph.D. is the Chief Data Scientist in Dell EMC's HPC & AI Innovation Lab. (Twitter: @lucasawilson)

Michael Bennett is a Senior Principal Engineer at Dell EMC working on Ready Solutions.

Read Full Blog