TPCx-HS Results on Dell EMC Hardware with AMD Milan Processors
Mon, 19 Apr 2021 17:56:26 -0000
Introduction
As part of the Dell Technologies AMD Milan launch in March 2021, Dell Technologies published eight results with the Transaction Processing Performance Council (TPC) (www.tpc.org). Six of those results are categorized as Big Data: five as TPCx-HS and one as TPCx-Big Bench. Milan is the code name for AMD EPYC third-generation processors, which represent a broad line-up of processors for cloud, enterprise, and high-performance computing workloads. TPC Big Data results are categorized by data size, known as Scale Factor (SF). The results showcase Big Data performance at the low-end SF 1 TB, while at the high end TPCx-HS crosses the "size frontier" with a 100 TB result.
Big Data technologies like Hadoop and Spark have become an important part of the enterprise IT ecosystem. The TPC Express Benchmark™ HS (TPCx-HS) was developed to provide an objective measure of hardware, operating system, and commercial Apache Hadoop File System API compatible software distributions. Also, it provides the industry with verifiable performance, price-performance, and availability metrics. The benchmark models a continuous system availability of 24 hours a day, 7 days a week.
This blog dives deeply into the results, what they mean, and why they are important for Big Data enthusiasts and professionals in Engineering, Marketing, and Sales.
System under test (SUT)
Figure 1: SUT for the SF100 TB result
All five TPCx-HS results used one PowerEdge R7515 server for the Master node and either nine or 16 PowerEdge R6515 servers for the Worker nodes. Apart from the number of Worker nodes, the only major hardware configuration difference was the storage size. The SF 100 TB result was run on a SUT in which each Worker node had 8 x 3.2 TB NVMe drives. All other results used 5 x 3.2 TB NVMe drives.
| Hardware Configuration | | | | | |
|---|---|---|---|---|---|
| Data Size | SF 100 TB | SF 3 TB | SF 1 TB | | |
| # nodes | 17 | 17 | 17 | 10 | 10 |
| Processor/Cores/Threads (P/C/T) | 17/544/1,088 | 17/544/1,088 | 17/544/1,088 | 10/320/640 | 10/320/640 |
| Framework | MapReduce | Spark | | | |
| Processor | 1 x AMD EPYC 75F3 32-core, 2.95 GHz, 256 MB L3 | | | | |
| Memory | 512 GB (8 x 64 GB RDIMM 3200 MT/s Dual Rank) | | | | |
| Network (Cluster Connectivity) | 1 x Mellanox Dual Port ConnectX-5 100 GbE QSFP28 NIC | | | | |
| Network (Remote Connectivity) | 1 x Broadcom Gigabit Ethernet BCM5720 NIC | | | | |
| Data Storage (Number of 3.2 TB NVMe) | 8 | 5 | | | |
| Software Stack | | | | | |
| Operating System | SUSE Linux Enterprise Server 12 SP5 | | | | |
| Hadoop | Cloudera Private Cloud Base 7.1.4 | | | | |
| Java | OpenJDK 64-Bit Server build 1.8.0_232-cloudera | | | | |
Table 1: SUT configuration
Benchmark workload
TPCx-HS was the first Big Data industry-standard benchmark that was designed to stress both hardware and software that is based on Apache HDFS API compatible distributions. TPCx-HS extends the workload that is defined in TeraSuite (TeraGen, TeraSort, TeraValidate) with formal rules for implementation, execution, metric, result verification, publication, and pricing. It can be used to assess a broad range of Big Data Hadoop system topologies, implementation methodologies and systems in a technically rigorous, directly comparable, and vendor-neutral manner. The current TPCx-HS specification can be found on the TPC Documentation Webpage.
The TPC requires that Express benchmarks like TPCx-HS must run the TPC-provided kit in order to publish a compliant TPC Express result. The latest TPCx-HS kit can be downloaded from www.tpc.org/tpcx-hs. The benchmark workload consists of the following modules:
- HSGen generates data at a particular Scale Factor. It is based on TeraGen.
- HSDataCheck checks the validity of the dataset and replication.
- HSSort sorts and orders the data. It is based on TeraSort.
- HSValidate validates the sorted output. It is based on TeraValidate.
The benchmark test is performed in five phases that are run from a TPCx-HS-master script. The phases must run sequentially without any overlaps.
Figure 2: TPCx-HS execution phases
The benchmark test consists of two runs, Run 1 and Run 2, which must follow the run phases shown above. Except for file system cleanup, no activities are allowed between Run 1 and Run 2. The total elapsed runtime T, in seconds is used for the TPCx-HS Performance Metric calculation. The performance run is defined as the run (Run 1 or Run 2) with the lower TPCx-HS Performance Metric. The repeatability run is defined as the run (Run 1 or Run 2) with the higher TPCx-HS Performance Metric. The reported Performance Metric is the TPCx-HS Performance Metric for the performance run.
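The run-selection rule above can be sketched in a few lines of Python (the HSph@SF values here are illustrative, not published results):

```python
# TPCx-HS reports the metric of the run with the LOWER HSph@SF score.
run_metrics = {"Run 1": 24.69, "Run 2": 24.41}  # illustrative scores

performance_run = min(run_metrics, key=run_metrics.get)    # lower metric -> reported
repeatability_run = max(run_metrics, key=run_metrics.get)  # higher metric
reported_score = run_metrics[performance_run]
```

The conservative choice of the lower-scoring run means a sponsor cannot report a lucky outlier; the SUT must repeat its performance.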
Scale factors
Scale Factor (SF) is the dataset size in relation to the minimum required size of a test dataset. For TPCx-HS, the test dataset size must be selected from a set of fixed SFs:
1 TB | 3 TB | 10 TB | 30 TB | 100 TB | 300 TB | 1000 TB | 3000 TB | 10,000 TB |
The SF 100 TB result was the first TPCx-HS result to be published at that SF, which was a major milestone for the Industry.
What is measured?
All TPC published results disclose a Primary Metric that consists of a Performance metric, Price/Performance metric, and an availability date. For TPCx-HS:
- Performance Metric (HSph@SF) reflects the throughput of the performance run at Scale Factor SF. It is derived from the elapsed time T, in seconds, that the run takes to complete all five phases shown in Figure 2: HSph@SF = SF / (T/3600).
- Price/Performance Metric ($/HSph@SF) relates the Total Cost of Ownership P needed to own and sustain the SUT to the Performance Metric it achieved: $/HSph@SF = P / HSph@SF.
- System availability date is the date by which all components used in the performance test are generally available to customers, as defined in the TPC Pricing specification.
What the metric numbers mean
Generally, the faster the performance run completes, the higher the performance score, because the elapsed run time is normalized against the Scale Factor. For Price/Performance, the lower the metric score, the better: a higher Performance score achieved on a SUT with a lower Total Cost of Ownership P yields a better price/performance metric.
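As a concrete check, the published SF 100 TB figures can be reproduced from the rounded values in Table 2 (HSph@SF is the Scale Factor divided by the performance run's elapsed time in hours; small differences from the published numbers are rounding of T):

```python
def hsph(scale_factor_tb: float, elapsed_seconds: float) -> float:
    """TPCx-HS performance metric: Scale Factor per hour of elapsed run time."""
    return scale_factor_tb / (elapsed_seconds / 3600.0)

def price_performance(tco_usd: float, hsph_score: float) -> float:
    """TPCx-HS price/performance metric: cost of ownership per HSph."""
    return tco_usd / hsph_score

# SF 100 TB result, rounded inputs from Table 2
score = hsph(100, 8225)                        # ≈ 43.8 HSph@100TB
dollars = price_performance(1_344_855, score)  # ≈ $30,700 per HSph
```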
Results
As part of the Dell Technologies AMD Milan launch, Dell Technologies published five TPCx-HS results on March 04, 2021. These results, summarized in Table 2 below, show several performance scenarios:
- Data sizes that are scaled from SF 1 TB to SF 100 TB
- Two different frameworks MapReduce and Spark
- Two different cluster sizes 10 nodes and 17 nodes
The common denominator for these results is that all SUTs used the AMD Milan EPYC 75F3 processors.
| Data Size | SF 100 TB | SF 3 TB | SF 1 TB | | |
|---|---|---|---|---|---|
| Number of nodes | 17 | 17 | 17 | 10 | 10 |
| Framework | MapReduce | Spark | | | |
| HSGen (s) | 1,604.68 | 62.32 | 33.03 | 47.14 | 41.22 |
| HSSort (s) | 5,819.10 | 205.21 | 87.76 | 140.68 | 119.28 |
| HSValidate (s) | 791.54 | 36.25 | 22.09 | 26.01 | 15.99 |
| Elapsed Run time (s) | 8,225 | 313 | 146 | 218 | 181 |
| Performance Metric (HSph) | 43.76 | 34.52 | 24.69 | 16.52 | 19.92 |
| Total Cost of Ownership (TCO) | $1,344,855 | $1,229,447 | $1,229,447 | $728,080 | $728,080 |
| Price/Performance ($/HSph) | $30,733 | $35,616 | $49,795 | $44,073 | $36,550 |
| Total Rack Units (TRU) | 20U | 20U | 20U | 13U | 13U |
| Performance/TRU | 2.19 | 1.73 | 1.23 | 1.27 | 1.53 |
Table 2: TPCx-HS results published on March 04, 2021
The table also shows that data sizes were scaled from 1 TB to 100 TB using SUTs occupying the same space as measured in Total Rack Units (TRU). Detailed results can be found at http://tpc.org/tpcx-hs/results/tpcxhs_perf_results5.asp?version=2.
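The Performance/TRU row in Table 2 is simply the performance metric divided by the rack space the SUT occupies, for example:

```python
def perf_per_tru(hsph_score: float, total_rack_units: int) -> float:
    """Space efficiency: TPCx-HS performance delivered per rack unit."""
    return hsph_score / total_rack_units

sf100tb = perf_per_tru(43.76, 20)       # SF 100 TB, 17 nodes in 20U -> ≈ 2.19
sf1tb_10node = perf_per_tru(19.92, 13)  # SF 1 TB, 10 nodes in 13U  -> ≈ 1.53
```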
Figure 3 below is a scatter chart that shows the relative performance, relative price/performance, and performance/TRU of all the TPCx-HS SF 1 TB results as of March 04, 2021. The Dell results are based on three AMD processor generations. The results in red markers are based on the first-generation AMD Naples processor. Orange markers show results that are based on the second-generation AMD Rome processor. Blue markers show results that are based on the most recent third-generation AMD Milan processors. Green markers show results from a competitor, CompA.
Figure 3: TPCx-HS SF 1 TB results by processor
The relative performance and price/performance scores use the results of the AMD Naples-based SUT as a reference. All performance results (circle marker) above the dashed line performed better than the reference SUT and those results below performed worse. Conversely, price/performance results (diamond marker) above the dashed line scored worse than those results below the line.
Figure 4 is a similar chart but shows results at SF 3 TB.
Figure 4: TPCx-HS SF 3 TB results by processor
Figure 5 is a bar chart that shows relative performance using the results of the R6415-Naples-17node-MR SUT as the reference. The bars show results that are based on AMD processors: red for AMD Naples; orange for AMD Rome; and blue for AMD Milan. The green bars are for results from competitor CompA.
Figure 5: TPCx-HS SF 1 TB relative performance results
Figure 6 is a chart that shows performance per TRU and color-coded similarly to Figure 5.
Figure 6: Performance/TRU
Key takeaways from the results
- AMD Milan-based SUTs give the best bang for the money based on TPCx-HS results. Figure 5 shows that they performed up to 2.72x better than SUTs based on earlier generation AMD processors and the competition. Figures 3 and 4 show that the price/performance is comparable to that of the reference SUT.
- Data sizes can be scaled without fear of reduction in price/performance. Table 2 shows that price/performance based on Total Cost of Ownership improves remarkably as the data sizes are scaled from SF 1 TB to SF 100 TB.
- It is worth investing in NVMe-based storage. As shown in Table 2, all the results used NVMe-based storage. This configuration enabled them to use the 1U R6515 servers which occupied less rack space without a reduction in computing resources.
- AMD Milan-based SUTs enable reduced data center footprint. Table 2 and Figure 6 show that within the same space, the workload size can be increased by a factor of 100 without loss of efficiency.
- AMD Milan-based SUTs show improved performance efficiency at scale. At SF 100 TB, more data is processed in proportionately less time.
Conclusion
With the publication of TPCx-HS results based on AMD Milan processors, Dell Technologies has become the most dominant publisher of TPC Big Data benchmarks. The results show that Dell EMC hardware platforms with third-generation AMD EPYC processors can do more work efficiently in less space, and with more value for the dollar.
Related Blog Posts
Dell Reinforces its TPCx-AI Benchmark Leadership using the 16G PowerEdge R6625 Hardware Platform at SF1000
Wed, 12 Jul 2023 18:52:17 -0000
Overview
On 06-13-2023, Dell Technologies published a TPCx-AI SF1000 result that was based on an 11 x Dell PowerEdge R6625 hardware platform powered by AMD Genoa processors. As of the publication date, Dell results held the number one slots on the Top Performance and Price/Performance tables for TPCx-AI at SF3, SF100, SF300, and SF1000. These results reinforce Dell Technologies' TPCx-AI benchmark leadership position, a testament to the great performance provided by its AI, ML, and DL solutions.
This blog presents the hardware platform that was tested, what was measured and what the results mean.
What TPCx-AI tests measure
TPCx-AI measures the end-to-end machine learning or data science platform using a diverse representative dataset scaling from 1 GB to 10 TB. The TPCx-AI benchmark assesses various aspects of AI training and inference performance, including data generation, model training, serving, scoring, and system scalability. The benchmark can be used across a wide range of different systems from edge to data center. It aims to provide a standardized and objective measure of AI performance across different platforms and configurations.
By using TPCx-AI, organizations and vendors can make informed decisions about the AI infrastructure that best suits their needs. The benchmark helps in understanding the system's capability to handle large-scale AI training workloads and can help optimize performance and resource allocation for AI tasks.
The TPCx-AI standard defines 10 use cases based on data science pipelines modeled on a retail business data center to evaluate the performance of artificial intelligence systems. The workload trains deep neural networks on large datasets using prominent machine learning frameworks such as TensorFlow. The benchmark measures:
- The total time taken to train a model for each use case to a specific level of accuracy
- The time taken for that model to be used for inference or serving
The blog, Interpreting the results of the TPCx-AI Benchmark, outlines the ten use cases, their data science models, and the benchmark phases.
System under test (SUT)
Figure 1 System Under Test (SUT).
Software versions
Table 1 Software versions
| Software | Version |
|---|---|
| Cloudera Data Platform (CDP) | 7.1.7 SP2 |
| Hadoop | 3.1.1 |
| HDFS | 3.1.1 |
| YARN | 3.1.1 |
| MR2 | 3.1.1 |
| Spark | 2.4.7 |
| ZooKeeper | 3.5.5 |
| Java | 1.8.0 |
| Python | 3.7.16 |
| Red Hat Enterprise Linux | 8.7 (Master node) |
| TPCx-AI Kit | 1.0.2 |
The result
Primary metrics
Table 2 Primary metric scores
| Primary Metric | Score |
|---|---|
| Performance (AIUCpm@1000) | 3,258.01 |
| Price/Performance (USD/AIUCpm@1000) | 267.96 |
| Availability | June 13, 2023 |
The three primary metrics in Table 2 are required for all TPC results. The top ten results, based on performance or price/performance at a particular SF category, are displayed in the respective benchmark standard's tables, categorized by metric and SF. To compare any two results, all three metrics must be disclosed. The TPC does not allow comparing TPCx-AI results from different SF categories. The blog, Interpreting the results of the TPCx-AI Benchmark, details how the performance and price/performance metrics are calculated. The availability date is the date when all the priced line items (SKUs) are available to customers; it must be within 185 days of the submission date. For the performance metric, the higher the score, the better. For price/performance, the lower, the better.
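As a quick check, dividing the one-year total system cost reported with this result by the performance score reproduces the published price/performance metric to within rounding (values as reported in Tables 2 and 3):

```python
tco_usd = 872_988  # one-year total cost of ownership for the SUT (Table 3)
aiucpm = 3258.01   # published performance score, AIUCpm@1000 (Table 2)

usd_per_aiucpm = tco_usd / aiucpm  # ≈ 268 USD per AIUCpm@1000
```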
Other metrics
Table 3 Other metrics
| Metric | Score |
|---|---|
| Total system cost | $872,988 |
| Framework | Cloudera Data Platform Private Cloud Base Edition |
| Operating system | Red Hat Enterprise Linux 8.6/8.7 |
| Scale factor | 1,000 |
| Physical storage divided by scale factor | 214.56 |
| Scale factor divided by physical memory | 0.12 |
| Main data redundancy mode | Replication 3, RAID 1 |
| Number of servers | 11 |
| Total processors, cores, and threads | 22/704/1,344 |
| Number of streams | 4 |
The metrics in Table 3 are required to be reported and disclosed in the Full Disclosure Report (FDR) and Executive Summary (ES). Except for the total system cost, these other metrics are not used in the calculation of the primary metrics but provide additional information about the system that was tested. For instance, the total system cost is the total cost of ownership (TCO) for one year. The redundancy modes provide the data protection mechanisms that were used in the configuration as required by the benchmark standard. The number of streams refers to the number of concurrent serving tests during the Throughput phase.
Numerical quantities
Benchmark run times
Table 4 Benchmark run times
| Benchmark run | Time |
|---|---|
| Benchmark start | 06-07-2023 9:35:25 PM |
| Benchmark end | 06-08-2023 3:20:10 AM |
| Benchmark duration | 5:44:45.193 |
Benchmark phase times
Table 5 Benchmark phase metrics
| Benchmark phase | Metric name | Metric value |
|---|---|---|
| Data Generation | DATAGEN | 2419.613 |
| Data Loading | TLOAD | 927.45 |
| Load Test | TLD | 927.45 |
| Power Training | TPTT | 492.143 |
| Power Serving 1 | TPST1 | 56.998 |
| Power Serving 2 | TPST2 | 57.357 |
| Power Serving | TPST | 57.357 |
| Throughput | TTT | 43.934 |
| Primary metric | AIUCpm@1000.0 | 3258.066 |
The seven benchmark phases and their metrics are explained in Interpreting the results of the TPCx-AI Benchmark; they are performed sequentially, from data generation to the throughput test. In power training, models are generated and trained for each use case sequentially, from UC1 to UC10. In power serving, the models obtained during the training phase are used to run the serving phase sequentially, one use case at a time. There are two power serving tests; the test that registers the longer time provides the TPST metric. The throughput phase runs multiple streams of serving tests concurrently. The more streams, the more the system resources are taxed. Typically, the number of streams is increased until TTT(n+1) > TTT(n), where n+1 refers to the next throughput test. The duration of the longest running stream (TTPUT) is used to calculate the throughput test metric TTT.
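The phase metrics in Table 5 roll up into the primary metric through a geometric mean. Per the TPCx-AI specification, AIUCpm@SF = (SF × N × 60) / (TLD × TPTT × TPST × TTT)^(1/4), where N = 10 is the number of use cases. A minimal Python sketch reproduces the Table 5 score:

```python
def aiucpm(sf: float, tld: float, tptt: float, tpst: float, ttt: float,
           n_use_cases: int = 10) -> float:
    """TPCx-AI primary metric: use cases per minute at Scale Factor sf."""
    geo_mean = (tld * tptt * tpst * ttt) ** 0.25  # geometric mean of 4 phase metrics
    return sf * n_use_cases * 60 / geo_mean

# Phase metrics from Table 5 (SF 1000 result)
score = aiucpm(sf=1000, tld=927.45, tptt=492.143, tpst=57.357, ttt=43.934)
# score ≈ 3258, matching the published AIUCpm@1000.0 of 3258.066
```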
Use case times and accuracy
Table 6 Use case times and accuracy
| Use case | Training (s) | Serving 1 (s) | Serving 2 (s) | Throughput (s) | Accuracy | Threshold |
|---|---|---|---|---|---|---|
| 1 | 523.703 | 51.215 | 49.736 | 56.083 | -1.00000 | -1.0 >= -1 |
| 2 | 1813.764 | 85.354 | 88.783 | 129.274 | 0.43830 | word_error_rate <= 0.5 |
| 3 | 95.795 | 12.443 | 12.811 | 13.84 | 4.57451 | mean_squared_log_error <= 5.4 |
| 4 | 59.08 | 25.475 | 25.489 | 31.016 | 0.71189 | f1_score >= 0.65 |
| 5 | 943.023 | 76.289 | 78.351 | 91.615 | 0.03347 | mean_squared_log_error <= 0.5 |
| 6 | 435.865 | 33.135 | 33.071 | 37.12 | 0.21355 | matthews_corrcoef >= 0.19 |
| 7 | 43.585 | 15.317 | 15.3 | 17.143 | 1.65306 | median_absolute_error <= 1.8 |
| 8 | 1940.283 | 338.579 | 341.811 | 372.418 | 0.74996 | accuracy_score >= 0.65 |
| 9 | 5448.735 | 703.291 | 699.631 | 745.458 | 1.00000 | accuracy_score >= 0.9 |
| 10 | 818.635 | 28.326 | 28.19 | 31.162 | 0.81691 | accuracy_score >= 0.7 |
Table 6 shows the use case run times (in seconds) for each benchmark phase and the accuracy of the model that was used. For instance, the RNN model that was generated and trained for UC2 had a word_error_rate of 0.4383, which was lower (better) than the threshold of 0.5. The XGBoost model trained for UC8 was 74.99% accurate, which was better than the 65% minimum accuracy threshold.
Figure 2 Use case time by benchmark phase
TPCx-AI SF1000 results tables
Table 7 displays the top TPCx-AI SF1000 tables as of the publication of this blog.
Table 7 SF1000 top performance table
Table 8 Top price/performance table
Table 7 and Table 8 are similar. Of the four published results at SF1000, Dell Technologies’ hardware platforms hold the number 1, number 2, and number 3 positions on both the performance and price/performance tables. The main difference between the three top results is the processor generations:
- The number 1 result used 4th generation AMD Genoa processors
- The number 2 result used 3rd generation Intel Ice Lake processors
- The number 3 result used 2nd generation Intel Cascade Lake processors
Key takeaways
- Dell dominates TPCx-AI top performance and price/performance tables at SF3, SF100, SF300, and SF1000.
- TPCx-AI performance improved greatly on newer generation Dell hardware platforms that have newer generation processors:
- There was a 60.71% performance improvement between hardware platforms powered by (14G) 2nd generation and (15G) 3rd generation processors.
- There was a 37.13% improvement between 3rd generation and (16G) 4th generation processors.
- TPCx-AI price/performance improved greatly between processor generations of the Dell 14G, 15G, and 16G hardware platforms:
- The price/performance metric dropped (improved) by 14.80% from hardware platforms powered by 2nd generation processors to those powered by 3rd generation processors.
- It dropped (improved) by a further 27.08% from 3rd generation to (16G) 4th generation processors.
- The form factor of the hardware platforms has reduced:
- The Dell 14G TPCx-AI SF1000 result used 2U servers
- The 15G and 16G results used 1U servers and scored better performance and price/performance
- Using NVMe data storage scored better price/performance metrics:
- The 14G result used hard drives
- The 15G and 16G results used more expensive NVMe data drives, and yet scored better price/performance metrics
Conclusion
This blog examined in detail the TPCx-AI performance result of the Dell 16G PE R6625 hardware platform. The result cemented Dell Technologies’ leadership positions on TPCx-AI performance and price/performance tables at SF1000, in addition to the leadership positions at SF3, SF100, and SF300. These results prove Dell Technologies’ leadership as a provider of high-performance AI, ML, and DL solutions based on verifiable performance data backed by a reputable, industry-standards performance consortium.
References
Nicholas Wakou, Nirmala Sundararajan; Interpreting the results of the TPCx-AI Benchmark; infohub.delltechnologies.com (February 2023).
TPCx-Big Bench Rocks on Dell EMC PowerEdge R7515 with AMD Milan Processors
Thu, 06 May 2021 11:05:30 -0000
Introduction
As part of the Dell Technologies AMD Milan launch in March 2021, Dell Technologies published eight results with the Transaction Processing Performance Council (TPC) (www.tpc.org). Six of those results are categorized as big data: five as TPCx-HS, and one as TPCx-Big Bench. Milan is the code name for AMD EPYC third-generation processors, which represent a broad line-up of processors for cloud, enterprise, and high-performance computing workloads. This blog is part of a series that presents these results. It also dives into why they should matter to Big Data enthusiasts and professionals in Engineering, Marketing, and Sales.
System under test (SUT)
Figure 1: SUT for TPCx-BB result
The SUT for the TPCx-BB result used 11 x Dell EMC PowerEdge R7515 servers: one NameNode server and 10 DataNode servers as shown in the Figure 1 above.
| Hardware configuration | |
|---|---|
| Data Size | SF 3000 |
| Number of nodes | 11 |
| Processor | 1 x AMD EPYC 7763 64-core, 2.45 GHz, 256 MB L3 |
| Memory | 512 GB (8 x 64 GB RDIMM 3200 MT/s Dual Rank) |
| Network (cluster connectivity) | 1 x Broadcom Dual Port 25 GbE NIC Mezzanine |
| Network (remote connectivity) | 1 x Broadcom Gigabit Ethernet BCM5720 NIC |
| Software stack | |
| Operating system | SUSE Linux Enterprise Server 12 SP5 |
| Hadoop | Cloudera Private Cloud Base 7.1.4 |
| Query execution engine | Hive on Tez |
| Java | OpenJDK 64-Bit Server build 1.8.0_232-cloudera |
Table 1: SUT configuration
TPCx-BB Overview
TPCx-Big Bench (TPCx-BB) is an application benchmark for Big Data Analytic Systems (BDAS). Three cornerstone aspects characterize Big Data systems: volume, velocity, and variety.
Volume refers to the size of the Big Bench dataset that is based on a single scale factor and is predictable and deterministic. Scale Factors are used to scale data from 1 TB to up to Petabytes of data. Velocity refers to the ability of the Big Data system to stay current through periodic refreshes, commonly known as Extraction, Transformation, and Load (ETL). Variety refers to the ability to deal with differently organized data, from unstructured to semi-structured and structured data.
TPCx-BB features 30 complex queries; query 1 to query 30. These queries are real-world and are designed along one business dimension and three technical dimensions that cover different business cases and technical perspectives: Data Source, Processing type, and Analytic technique.
Based on the McKinsey report on Big Data (Big Data: The next frontier for innovation, competition, and productivity; https://www.mckinsey.com/~/media/McKinsey/Business%20Functions/McKinsey%20Digital/Our%20Insights/Big%20data%20The%20next%20frontier%20for%20innovation/MGI_big_data_full_report.pdf), 10 queries (query 1 – query 10) were identified that fall into five main categories of a retail business: Marketing, Merchandising, Operations, Supply Chain, and New business models (price comparisons).
Data source dimension measures the type of input data the query is targeting. There are three types of input data in Big Bench: structured, semi-structured and unstructured. For example, Query 1 uses semi-structured web click streams as data source, while Query 3 performs sentiment words extraction on unstructured product reviews data.
Processing type dimension measures the type of processing appropriate for the query. This dimension covers the two common paradigms of declarative and procedural languages. In other words, some of the queries can be answered by declarative languages; others by procedural languages; and others by a mix of both.
Analytic technique dimension measures different techniques for answering business analytics questions. In general, three major categories of analytic techniques were identified: statistical analysis, data mining, and simple reporting.
The TPC requires that Express benchmarks like TPCx-BB must run the TPC-provided kit in order to publish a compliant TPC Express result. The latest TPCx-BB kit can be downloaded from TPC Documentation Webpage.
Scale Factor
TPCx-BB defines a set of discrete scaling points (scale factors) based on the approximate size of the raw data that the data generator produces, in GB. Each defined scale factor has an associated value for SF, a unit-less quantity, roughly equivalent to the number of GB of data present on the storage. Test sponsors may choose any scale factor from the defined series except SF1 which is used for Result validation only. No other scale factors may be used for a TPCx-BB Result.
1 | 1000 | 3000 | 10000 | 30000 | 100000 | 300000 | 1000000 |
Table 2: Allowable scale factors
Dell Technologies published on SF 3000 which is equivalent to 2,794 GB of raw data.
Benchmark Phases
TPCx-BB defines three phases: Load test, Power test, and Throughput test. The three tests run sequentially and are not permitted to overlap.
During the Load test, the test database used to run the other phases is built. The Power test measures the time the SUT takes to process all 30 queries, which must run sequentially in ascending order. The Throughput test runs the 30 queries using concurrent streams; each stream runs all 30 queries in a specified placement order. The default number of streams is two, but the number of concurrent streams is configurable with no maximum limit.
The results must be run as per the TPCx-BB specification in order to pass an audit before the TPC publishes them. A compliant benchmark test consists of a validation test followed by two benchmark runs, Run 1 and Run 2. A benchmark run consists of the three phases stated above: Load test, Power test, and Throughput test. The validation test performs the three benchmark phases at Scale Factor 1 and validates the output against the reference result set in the kit, ensuring that the engine the test sponsor uses can match the reference result set.
Figure 2: Benchmark execution phases
What is measured?
The benchmark measures the time that it takes for all 30 queries to be performed. All TPC published results must disclose a Primary Metric that consists of three elements: Performance metric, Price/Performance metric, and an availability date.
The performance metric is computed from metric components representing Load, Power, and Throughput tests as defined above.
| Term | Definition | Formula |
|---|---|---|
| SF | Scale Factor | |
| TLoad | Elapsed time of the Load test, in seconds | |
| TLD | Load factor | TLD = 0.1 × TLoad |
| Q(i) | Elapsed time in seconds of Query i | |
| M | Number of queries (30) | |
| TPT | Power test component, derived from the geometric mean of the elapsed times of the M queries measured in the power run | TPT = M × (Q(1) × Q(2) × … × Q(M))^(1/M) |
| TTput | Elapsed time of all streams in the Throughput test | |
| TTT | Throughput test metric | TTT = TTput / n |
| n | Number of streams in the Throughput test | |
| Performance Metric | | BBQpm@SF = (SF × 60 × M) / (TLD + √(TPT × TTT)) |

Table 3: Computation of the performance metric
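Assembling the components of Table 3, the performance metric computation can be sketched in Python. The inputs below are illustrative placeholders (the published per-query times are disclosed in the FDR), chosen so the arithmetic is easy to follow:

```python
import math

def bbqpm(sf: float, t_load: float, query_times: list, t_tput: float,
          n_streams: int) -> float:
    """TPCx-BB performance metric, assembled from the Table 3 components."""
    m = len(query_times)                          # M, the number of queries
    t_ld = 0.1 * t_load                           # load factor TLD
    geo_mean = math.prod(query_times) ** (1 / m)  # geometric mean of query times
    t_pt = m * geo_mean                           # power test component TPT
    t_tt = t_tput / n_streams                     # throughput test metric TTT
    return sf * 60 * m / (t_ld + math.sqrt(t_pt * t_tt))

# Illustrative inputs: 30 queries of 100 s each, 2 concurrent streams
score = bbqpm(sf=3000, t_load=1000, query_times=[100.0] * 30,
              t_tput=6000, n_streams=2)  # = 5,400,000 / 3,100 ≈ 1,741.9
```

Note how the geometric mean of the query times rewards consistent performance across all 30 queries rather than a few fast ones.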
The price/performance metric is computed as $/BBQpm@SF = P / BBQpm@SF, where P is the total cost of ownership of the SUT.
The system availability date is the date all components of the SUT are generally available to customers. Any reference to a TPC result must disclose all three elements of the Primary Metric.
TPCx-BB consists of application-level workloads that essentially measure the efficiency of the underlying infrastructure. A good result depends on a well optimized and tuned infrastructure, from the BIOS/OS settings through the Hadoop framework, to the application level (Hive, Spark). For this result, Cloudera Private Cloud Base (CDP 7.1.4) configuration settings were optimized (Dell EMC PowerEdge 14G Performance Characterization for Data Analytics; https://infohub.delltechnologies.com/section-assets/h17247-poweredge-14g-performance-characterization-for-data-analytics-technical-white-paper) based on the resources (CPU cores, memory, and storage) available to the cluster. Hive (SQL engine) settings for each query were also tweaked for improved query performance. Additionally, Spark Submit operator settings were adjusted for better performance of the five machine learning queries (q5, q20, q25, q26, and q28).
Results
The artifacts for this result can be found on this TPC Results Page. Table 4 below summarizes two results from the TPCx-BB SF3000 performance table: the most recent result, based on Dell EMC PowerEdge R7515 servers, and the previous best result (PrevBest) from a competing company. The PrevBest result was submitted four years earlier and is now historical.
A historical result is one that:
- Is still in an accepted state
- Has been posted:
- At least 185 days past the submission date
- At least 60 days past the availability date.
On the TPCx-BB Performance results page, historical results are not displayed by default unless the Include Historical Results option is checked. The PrevBest result is included here to show how far technology has moved and improved Big Data performance in the last four years.
| | Run 1 | Run 2 | PrevBest* |
|---|---|---|---|
| Load test (s) | 473.10 | 472.69 | 1,302.23 |
| Power test (s) | 5,612.53 | 5,630.56 | 16,294.45 |
| Throughput test (s) | 24,907.11 | 24,792.50 | 52,751.28 |
| Overall run time | 30,992.74 (8.61 hrs.) | 30,895.76 | 70,347.97 (19.54 hrs.) |
| Performance (BBQpm) | 1,544.13 | 1,547.29 | 611.31 |
| Price/Performance ($/BBQpm) | 487.85 | | 646.31 |
| Availability date | 03-15-2021 | | 12-29-2016 |
| Number of nodes | 11 | | 16 |
| Total rack units (TRU) | 23 | | 34 |
| Processor/Cores/Threads (P/C/T) | 11/704/1,408 | | 32/512/1,024 |
| Storage | 54,240 GB / 30 x NVMe + 24 SSD | | 220,480 GB / 254 HDD |

*PrevBest is a historical result.
Table 4: TPCx-BB SF3000 performance results on March 15, 2021
Figure 2 above shows the benchmark phases. Run 1 and Run 2 are run sequentially on the same SUT. The run with the lower BBQpm score, in this case Run 1, is designated the Performance run. The run with the higher score (Run 2) is the Repeatability run.
Figure 3 below is a chart showing that the Dell EMC PowerEdge R7515 result scored 2.53x better performance than the previous best result (PrevBest*, a historical result).
Fig 3: TPCx-BB SF 3000 performance result on March 15, 2021
Figure 4 is a chart that shows that price/performance improved by 24.52%.
Fig 4: TPCx-BB SF 3000 price/performance result on March 15, 2021
Key Takeaways
- Dell Technologies renews its interest in the smaller SF TPCx-BB space after over four years.
- More processing power is contained in smaller packages. The Dell EMC PowerEdge R7515 result used more processing cores in a smaller footprint (TRU) compared to the previous best result. This tendency is consistent with current industry trends.
- It pays to invest in faster storage. The Dell EMC PowerEdge R7515 result used about 54 TB of NVMe + SSD storage compared to about 220 TB of HDD storage. The faster storage better matched the faster processors enabling more efficient processing. That power contributed to the SUT posting 2.53x better performance and 24.52% better price/performance than the previous best TPCx-BB SF 3000 result. The SUT used in this result took advantage of the lower prices for NVMe/SSD storage devices.
- Real-world queries should run over 2x faster on the PowerEdge R7515, based on previous results. Table 4 shows that the Dell EMC PowerEdge R7515 result was performed in 8.61 hours compared to 19.54 hours used by the previous best result.
Conclusion
This result demonstrates that Dell Technologies has a renewed interest in smaller SFs after several years; its most recent previous results were at the larger SF 10000.
The TPCx-BB SF 3000 result showed that Dell EMC PowerEdge R7515 hardware platforms with AMD Milan processors pack blazing performance in smaller (than predecessors) rack units. These servers enable smaller data center footprints without sacrificing price or performance. This advantage coincides with observations from the other results which are part of this blog series.
Dell Technologies uses these results to provide verifiable performance data about its products and solutions to its customers. For that reason, Dell Technologies has been an active member of the TPCx-Big Bench Technical Committee. Dell Technologies continues to collaborate with other stakeholders within the industry to maintain the TPCx-BB specification.