Big data workloads running on the Hadoop framework can generate valuable insights that help organizations predict behaviors or outcomes. Decision-makers and their teams can use these insights to orchestrate transformative initiatives, such as targeted email campaigns that drive sales or customer feedback analysis that improves product quality. Current-generation PowerEdge R640 servers could allow business units and data analysts to work with large data sets more quickly than their counterparts in organizations that continue to run big data analysis workloads on previous-generation servers.
To see how the two solutions could handle real-world big data work, we ran three HiBench big data tests on them:
The cluster of current-generation Dell EMC PowerEdge R640 servers powered by 2nd Generation Intel Xeon Scalable processors completed the three workloads more quickly than the previous-generation solution. The current-generation PowerEdge R640 servers ran the LDA workload with a throughput of more than 4 MB per second—more than double the throughput of the previous-generation solution. Processing more data per second could allow more of your business units to access and use the data. The chart below shows the throughput for both solutions in each test.
The current-generation Dell EMC PowerEdge R640 solution powered by 2nd Generation Intel Xeon Scalable processors needed just over 17 minutes to process 4.5 GB of data for the LDA test. The previous-generation solution needed 36 minutes, so the current-generation solution completed the workload in less than half the time. Not only could you deliver analysis to decision makers more quickly with the PowerEdge R640, but you could use the extra time, for example, to re-run the LDA workload to ensure accuracy. The chart below shows the times to complete all three tests for both solutions.
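The throughput and completion-time figures for the LDA test are related by simple arithmetic: throughput is the data size divided by the elapsed time. The short Python sketch below illustrates that relationship using the rounded figures quoted above. It is not the report's measurement method; the function name is hypothetical, the 1,024 MB-per-GB conversion is an assumption, and "just over 17 minutes" is rounded down to 17, so the results are approximations only.

```python
def throughput_mb_per_s(data_gb: float, minutes: float, mb_per_gb: int = 1024) -> float:
    """Average throughput in MB/s for a job that processes data_gb in the given minutes.

    mb_per_gb is an assumption (1024); pass 1000 for decimal gigabytes.
    """
    return data_gb * mb_per_gb / (minutes * 60)


DATA_GB = 4.5  # LDA data size quoted in the text

# Completion times quoted in the text, in minutes.
current = throughput_mb_per_s(DATA_GB, 17)   # "just over 17 minutes", rounded down
previous = throughput_mb_per_s(DATA_GB, 36)

print(f"current-generation:  {current:.2f} MB/s")
print(f"previous-generation: {previous:.2f} MB/s")
print(f"throughput ratio:    {current / previous:.2f}x")
```

With these rounded inputs, the current-generation figure comes out above 4 MB per second and a little more than twice the previous-generation figure, which is consistent with the claims in the text.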
Intel HiBench 7.1 is a big data benchmark suite for Apache Hadoop. Some tools in the suite are synthetic micro-benchmarks while others are real-world Hadoop applications. The output of the tools can demonstrate a solution’s processing speed, throughput, bandwidth, CPU utilization, data access patterns, and other metrics as they relate to processing big data workloads.
For more information on HiBench, visit https://github.com/Intel-bigdata/HiBench.
2 “Intel-bigdata/HiBench,” accessed November 5, 2019, https://github.com/Intel-bigdata/HiBench.