Speeding time to insight: The Dell PowerEdge C6620 with Dell PERC 12 RAID controller for Apache Cassandra big data
Thu, 21 Sep 2023 22:56:22 -0000
The new PowerEdge C6620 delivered better performance—both higher throughput and lower latency—than a previous-generation PowerEdge C6520 with PERC 11
Overview
Every day, individuals and organizations generate massive quantities of data, from text messages to location data to information from sensors on factory floors and beyond. This rapid proliferation of data offers enormous opportunities: If businesses can extract insights from that data, they can use it to improve their operations, grow their customer base, and provide a better experience to those customers. That task is not simple, however. Much of this data is unstructured, meaning that it comes in many formats that traditional data models, such as SQL databases, cannot process. Processing and analyzing unstructured data may require different methods, such as utilizing a NoSQL database like Apache® Cassandra®. Organizations can use NoSQL databases to store, mine, and analyze unstructured data in its many forms and gain actionable information. To efficiently analyze such large quantities of data, however, they need a powerful computing solution running the database system. Investing in newer server solutions with updated processing, storage, and networking components can offer greater performance and enable companies to get to those vital insights faster. To highlight the advantages of moving from an older server solution to a new one for big data workloads, we tested Apache Cassandra performance on a new Dell™ PowerEdge™ C6620 with a Broadcom®-based Dell PowerEdge RAID Controller (PERC) 12 and an older Dell PowerEdge C6520 with Dell PERC 11. On multiple performance metrics, the newer Dell PowerEdge C6620 with PERC 12 delivered stronger performance than its predecessor, offering businesses the chance to increase the value of their data and realize its benefits more quickly.
About the Dell PowerEdge C6620 server
The PowerEdge C6620 is part of Dell’s modular PowerEdge C-Series infrastructure; Dell says it is “designed for compute-intensive workloads” but also “ideal for IOPS-heavy workloads.”1 It features up to two 4th Generation Intel® Xeon® Scalable processors, with up to 56 cores per processor; offers memory speeds of up to 4,800 MT/s; and supports up to 16 NVMe® drives for workload acceleration. Optional liquid cooling is also available. To learn more about the Dell PowerEdge C6620, visit https://www.dell.com/en-us/shop/enterprise-products/c6620-two-socket-server-node-intel/spd/poweredge-c6620.
Testing the Dell PowerEdge C6620 with Broadcom-based PERC 12
If you’re still relying on servers you purchased several years ago, it can be helpful to understand exactly how much you could gain by upgrading to a newer solution. We designed our testing to quantify the benefits of upgrading from older to latest-generation servers for organizations relying on Cassandra workloads for critical operations.
Our configurations
To set up our test environment, we installed VMware® vSphere® 8 on both servers. We then configured a separate infrastructure server with VMware ESXi™ and VMware vCenter® to manage the servers and to host client VMs that ran our test workload against our databases. The Dell PowerEdge C6620 server with Broadcom-based PERC 12 used two Dell U.2 Gen4 NVMe® 3.84TB drives, while the Dell PowerEdge C6520 server with PERC 11 used six 960GB mixed-use SAS 12Gbps SFF drives. (See Table 1 for more details of our configuration.)
Table 1: System configurations we used in our testing. Source: Principled Technologies.
Server configuration information | Dell PowerEdge C6520 | Dell PowerEdge C6620 |
Processors | 2x Intel Xeon Gold 6330, 28 cores, 2GHz | 2x Intel Xeon Platinum 8452Y, 36 cores, 2GHz |
Storage controller | PERC H750 Adapter, 8GB cache | PERC H965i Adapter, 8GB cache |
Disks | 6x 960GB Toshiba PX05SVB096Y (12Gb SAS SSDs) | 2x 3.84TB Dell Enterprise NVMe v2 AGN RI U.2 (NVMe SSDs) |
Total memory in system (GB) | 512 | 512 |
Operating system and version | VMware ESXi 8.0.0, build 20513097 | VMware ESXi 8.0.0, build 20513097 |
On each server, we created a Cassandra gold VM and cloned it five times to create a total of six VMs, which we joined in a cluster configuration. We then used the Yahoo Cloud Serving Benchmark (YCSB) to create a 100GB database across the six VMs to take advantage of the distributed database functionality of Cassandra, ran YCSB workload B for 30 minutes, and recorded the results. In the results we highlight below, we provide two perspectives on the performance of each setup: the total throughput and the average read and write latency. Both results reflect the performance across all six VMs.
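The report does not state the YCSB record count behind its 100GB database. As a rough, hypothetical sizing sketch, the Python below assumes YCSB's CoreWorkload defaults of 10 fields of 100 bytes per record and decimal gigabytes, and it ignores keys, Cassandra replication, and on-disk overhead. It also expresses the 30-minute run as maxexecutiontime, which YCSB reads in seconds. These are illustrative values, not the settings the testers used.

```python
# Rough YCSB sizing sketch. ASSUMPTIONS (not stated in the report):
#   - YCSB CoreWorkload defaults: fieldcount=10, fieldlength=100 bytes
#   - "100GB" means 100 * 10**9 bytes of raw field data
#   - keys, Cassandra replication, compression, and on-disk overhead are ignored


def estimate_recordcount(target_bytes: int, fieldcount: int = 10,
                         fieldlength: int = 100) -> int:
    """Number of records whose raw field data adds up to roughly target_bytes."""
    bytes_per_record = fieldcount * fieldlength
    return target_bytes // bytes_per_record


def render_properties(props: dict) -> str:
    """Render settings as key=value lines, the format YCSB property files use."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


TARGET_BYTES = 100 * 10**9   # 100GB database (decimal units assumed)
RUN_SECONDS = 30 * 60        # 30-minute run; maxexecutiontime is given in seconds

props = {
    "recordcount": estimate_recordcount(TARGET_BYTES),
    "fieldcount": 10,
    "fieldlength": 100,
    "maxexecutiontime": RUN_SECONDS,
}
print(render_properties(props))
```

Under these assumptions the sketch prints recordcount=100000000 and maxexecutiontime=1800; a different record size or replication factor would change the count proportionally.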
Why YCSB?
YCSB is an industry-standard benchmark for NoSQL databases. In 2010, a group from Yahoo! Research created it with “the goal of facilitating performance comparisons of the new generation of cloud data serving systems.”2 It is open source, meaning that anyone can access and modify the source code. In a recent interview, contributors to the YCSB open-source community note that it “is rather largely accepted by users” and “represents a series of scenarios that can be abstracted from the real world.”3 Apache Cassandra was one of the first four databases that the YCSB creators tested with the benchmark in 2010, and YCSB remains a good fit for testing Cassandra performance today.4
YCSB functions by letting users create a database populated with synthetic data on their database system of choice. Users can then run a pre-defined or customized workload against the database to gauge system performance. YCSB offers six core workloads, each of which represents a different type of database work. Our testing used the read-intensive workload B. This workload is 95 percent reads (pulling data from a database) and 5 percent writes (adding to or changing data in a database). YCSB gives one application example as photo tagging, where a user might occasionally add a tag to a photo (write) but will mostly search a library of tagged photos (read).5 A solution that offers higher performance on YCSB workload B is likely to improve performance on other read-intensive workloads, such as data analysis. We chose this workload to focus on reading and analyzing a database.
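Workload B's 95/5 split is probabilistic: each operation the YCSB client issues is drawn at random according to the readproportion and updateproportion settings (0.95 and 0.05 in YCSB's workloadb file). The sketch below mimics that weighted draw with Python's random.choices; it is a stand-in for illustration, not YCSB's own Java generator.

```python
import random
from collections import Counter

# Operation mix mirroring workload B's readproportion=0.95 and updateproportion=0.05.
WORKLOAD_B = {"read": 0.95, "update": 0.05}


def sample_operations(mix: dict, n: int, seed: int = 42) -> Counter:
    """Draw n operation types at random, weighted by the workload mix."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same sample
    ops = rng.choices(list(mix), weights=list(mix.values()), k=n)
    return Counter(ops)


N = 100_000
counts = sample_operations(WORKLOAD_B, N)
for op, count in counts.items():
    print(f"{op}: {count / N:.3f}")
```

Over 100,000 draws the observed shares land very close to 0.95 and 0.05, which is why a long benchmark run reflects the nominal mix.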
See higher throughput and lower latency with the Dell PowerEdge C6620 with Broadcom-based PERC 12
Our testing with YCSB yielded three metrics: read latency, update (or write) latency, and throughput (measured in operations per second). The Dell PowerEdge C6620 with Broadcom-based PERC 12 offered stronger performance than the PowerEdge C6520 with PERC 11 on all three metrics, indicating that an upgrade can help speed your Cassandra workloads.
On the first and second metrics, read latency and update latency, the Dell PowerEdge C6620 was significantly faster than its previous-generation counterpart. Read latency measures the delay between the application requesting a piece of data and the database system delivering it; update latency measures the delay between the application changing or adding a piece of data and the database system completing the action. The shorter these delays, the faster a solution will be at completing user-facing requests, such as retrieving a customer’s buying history when a store manager searches for it, and larger workloads, such as running analysis on a set of tens of thousands of data points.
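To make these metrics concrete, here is a minimal Python sketch that times each read and update individually (latency) and divides completed operations by elapsed time (throughput). An in-memory dict is a hypothetical stand-in for a Cassandra client session, so the absolute numbers mean nothing; only the way the metrics are computed carries over. Latencies are averaged in microseconds, the unit YCSB uses in its AverageLatency(us) summary lines.

```python
import random
import time
from collections import defaultdict


def run_timed_workload(store: dict, n_ops: int, read_share: float = 0.95, seed: int = 1):
    """Run a read/update mix against a dict, timing every operation.

    Returns (average latency in microseconds per operation type, ops per second).
    The dict is a hypothetical stand-in for a real database client.
    """
    rng = random.Random(seed)
    keys = list(store)
    samples = defaultdict(list)              # op type -> latencies in seconds
    run_start = time.perf_counter()
    for _ in range(n_ops):
        key = rng.choice(keys)
        op = "READ" if rng.random() < read_share else "UPDATE"
        op_start = time.perf_counter()       # latency clock starts at the request
        if op == "READ":
            _ = store[key]
        else:
            store[key] = "updated"
        samples[op].append(time.perf_counter() - op_start)  # ...and stops at completion
    elapsed = time.perf_counter() - run_start
    avg_us = {op: 1e6 * sum(vals) / len(vals) for op, vals in samples.items()}
    return avg_us, n_ops / elapsed           # throughput: completed ops per second


store = {f"user{i}": "value" for i in range(1_000)}
avg_latency_us, throughput = run_timed_workload(store, 20_000)
print(avg_latency_us, f"{throughput:,.0f} ops/s")
```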
On the surface, the differences in latency between the two solutions are very small: 0.49 milliseconds for read latency and 0.57 milliseconds for update latency. On a single operation, a delay of less than a millisecond would be impossible for a human to notice. But the database system isn’t handling just one operation; it’s handling thousands or millions of operations at once. Our YCSB testing, for example, set the maxexecutiontime variable (or how long the benchmark should run) to 30 minutes. This means that at the Dell PowerEdge C6620 server’s rate of 249,210 operations per second (which we show in Figure 3), it executed over 400 million operations during the 30-minute test. As tiny differences in latency scale up across that many operations, they become very significant indeed.
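The arithmetic behind that claim can be checked directly. The throughput, run length, and latency gaps below come from the report; the 95/5 read/update split is workload B's expected mix, and applying the C6620's operation count to both latency gaps is a simplifying assumption. Because many requests run concurrently, the summed latency is total client waiting time, not wall-clock time saved.

```python
# Figures from the report
THROUGHPUT_OPS_PER_S = 249_210
RUN_SECONDS = 30 * 60
READ_GAP_MS = 0.49     # C6620 read latency advantage over the C6520
UPDATE_GAP_MS = 0.57   # C6620 update latency advantage over the C6520

total_ops = THROUGHPUT_OPS_PER_S * RUN_SECONDS

# Workload B's expected mix (assumption: exact 95/5 split)
expected_reads = total_ops * 0.95
expected_updates = total_ops * 0.05

# Summed per-request latency avoided, converted from milliseconds to hours.
# Requests overlap in time, so this is NOT elapsed time saved.
saved_hours = (expected_reads * READ_GAP_MS
               + expected_updates * UPDATE_GAP_MS) / 1000 / 3600

print(f"operations in 30 minutes: {total_ops:,}")        # 448,578,000
print(f"summed latency avoided: {saved_hours:,.1f} hours")
```

Roughly 448.6 million operations accumulate about 61.6 hours of combined per-request waiting time from sub-millisecond gaps alone.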
About the Dell PERC 12 RAID controller
The Dell PowerEdge C6620 we tested features the PERC 12, which offers a single front controller with full RAID support for both NVMe and SAS.6 It offers a 3,200MHz cache memory speed and a 16-lane host bus, and it supports RAID levels 0, 1, 5, 6, 10, 50, and 60.7
The Dell PERC 12 is based on the Broadcom SAS4116W series chip. According to Broadcom, “this eighth-generation SAS RAID-on-Chip (ROC) is based on the industry-leading Fusion-MPT architecture and features Tri-Mode SerDes technology that enables a seamless operation of up to 16-wide direct-connect NVMe, SAS or SATA storage devices from any system design.…The Tri-Mode ROC device with 16-wide PCIe Gen 4.0 lanes provides SAS data transfer rates of 22.5, 12, 6Gb/s per lane and 6Gb/s SATA data transfer rates per lane. The high-port count ROC helps eliminate storage bottlenecks with support of x8, x4, x2, and x1 PCI Express® lanes and complies with the PCIe 4.0 specification, offering up to 6 million IOPS (random reads) and up to 900,000 IOPS in RAID (random writes).”8 To learn more about the Dell PERC 12, visit https://infohub.delltechnologies.com/p/dell-poweredge-raid-controller-12/.
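As a back-of-the-envelope check on the quoted figures, the sketch below compares the raw bandwidth of a 16-lane PCIe 4.0 link (16 GT/s per lane, 128b/130b encoding) with the data rate that 6 million random reads per second would require. The 4 KiB I/O size is an assumption, since the quote gives none, and the link figure ignores packet and protocol overhead, so usable bandwidth is somewhat lower.

```python
# PCIe 4.0 signaling: 16 GT/s per lane with 128b/130b line encoding.
TRANSFERS_PER_S = 16e9
ENCODING_EFFICIENCY = 128 / 130
LANES = 16

# Bytes per second per lane in one direction, expressed in GB/s (10**9 bytes)
lane_GBps = TRANSFERS_PER_S * ENCODING_EFFICIENCY / 8 / 1e9
link_GBps = lane_GBps * LANES

# ASSUMPTION: 4 KiB random reads; Broadcom's figure does not state the I/O size.
IOPS = 6_000_000
IO_BYTES = 4096
io_GBps = IOPS * IO_BYTES / 1e9

print(f"x16 PCIe 4.0 link: ~{link_GBps:.1f} GB/s per direction")  # ~31.5
print(f"6M x 4 KiB reads:  ~{io_GBps:.1f} GB/s")                  # ~24.6
```

Under that assumption, the peak random-read rate fits within the host link, which is consistent with a 16-lane Gen 4 design.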
On the third metric, throughput, the Dell PowerEdge C6620 delivered 1.25 times as many operations per second as the previous-generation PowerEdge C6520. This increase in throughput is what we would expect to see based on the lower latencies: If a system is able to process operations faster (i.e., with lower latency), it will also boost how many operations the system can handle in a given time (i.e., better throughput). With greater throughput, depending on what read-intensive workloads your organization is running, you might see faster video streaming, quicker recommendations for customers, or an increase in the speed of users pulling up data.
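The relationship this paragraph describes is Little's law: the average number of requests in flight equals throughput multiplied by average latency. For an idealized closed-loop benchmark in which each client thread keeps one request outstanding, throughput is therefore thread count divided by latency. The thread count and latencies below are hypothetical, chosen only to show how a 20 percent latency reduction yields 1.25 times the throughput; the report does not publish those values.

```python
def closed_loop_throughput(outstanding_requests: int, mean_latency_s: float) -> float:
    """Little's law (L = throughput * W) solved for throughput."""
    return outstanding_requests / mean_latency_s


CLIENT_THREADS = 256  # hypothetical: one outstanding request per client thread

old = closed_loop_throughput(CLIENT_THREADS, 0.0020)  # hypothetical 2.0 ms latency
new = closed_loop_throughput(CLIENT_THREADS, 0.0016)  # hypothetical 1.6 ms latency

print(f"{old:,.0f} -> {new:,.0f} ops/s ({new / old:.2f}x)")
```

With concurrency held fixed, the throughput ratio is simply the inverse of the latency ratio; real systems deviate once a resource such as CPU or storage saturates.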
NoSQL databases and Cassandra in today’s business landscape
For this study, we tested with Apache Cassandra, a widely used NoSQL database system. NoSQL, or non-relational, databases are a category of database system that store and query data that do not have a traditional data structure. Traditional SQL databases organize data in a column-row format for finding or creating relationships across the data. To store data in a SQL database, all data in each table must have the same structure and fit a pre-defined schema, with every row in each table including the same columns and formats every time. NoSQL databases, however, can organize data more dynamically. They can deal with data from documents, graphs, key-values, and more. This flexibility lets people use them to analyze documents or data that don’t follow identical structuring formats. For organizations that need to store and analyze unstructured data—which may include data from Internet of Things (IoT) applications, audio, video, text files, social media posts, and more—a NoSQL database is a great option.
There are many types of NoSQL database systems; Apache Cassandra is a type of key-value and wide-column store. Key-value databases store each item as essentially two fields: one is the key, and the other is the value. The value can be any type of data (text, numbers, etc.). For example, in a key-value database, some keys might map to dates, others to numbers, and so on. A wide-column database, which Cassandra uses, is a two-dimensional key-value database, where instead of mapping to just one value, each key can map to several columns of values.
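A conceptual Python sketch of the difference, using plain dictionaries with made-up keys and values: a key-value store maps each key to a single value of any type, while a wide-column store maps each row key to a set of named columns that can vary from row to row. This models the idea only, not how Cassandra stores data on disk.

```python
# Key-value: each key maps to exactly one value, and values can differ in type.
key_value = {
    "last_login:alice": "2023-06-23",  # a date string
    "login_count:alice": 42,           # a number
}

# Wide-column: each row key maps to several named columns, and rows
# need not share the same set of columns.
wide_column = {
    "alice": {"email": "alice@example.com", "city": "Durham"},
    "bob": {"email": "bob@example.com", "phone": "555-0100", "tier": "gold"},
}


def get_column(table: dict, row_key: str, column: str, default=None):
    """Look up one column of one row; absent rows or columns return default."""
    return table.get(row_key, {}).get(column, default)


print(key_value["login_count:alice"])            # 42
print(get_column(wide_column, "bob", "tier"))    # gold
print(get_column(wide_column, "alice", "tier"))  # None
```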
Apache Cassandra is a distributed database, meaning that it can run on multiple nodes while acting as a single entity. This makes it resilient and highly scalable. Its scalability, combined with the flexibility afforded by its hybrid key-value/tabular model, allows it to handle many types of big data work very well. Cassandra is also open-source and free, a compelling benefit for organizations seeking to save on licensing fees.
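Cassandra spreads data across nodes while presenting a single database by using consistent hashing on a token ring. The sketch below is a simplified illustration under stated assumptions: one token per node (no virtual nodes), MD5 in place of Cassandra's default Murmur3 partitioner, a replication factor of 3, and rack-unaware placement on the next nodes clockwise. The six node names are hypothetical, echoing the six-VM cluster in this test.

```python
import bisect
import hashlib


class TokenRing:
    """Consistent-hashing sketch: one token per node, replicas placed clockwise."""

    def __init__(self, nodes, replication_factor=3):
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication_factor must be between 1 and the node count")
        self.replication_factor = replication_factor
        step = 2**64 // len(nodes)  # spread tokens evenly over a 64-bit range
        self.ring = [(i * step, node) for i, node in enumerate(nodes)]  # sorted by token
        self.tokens = [token for token, _ in self.ring]

    @staticmethod
    def token_for(key: str) -> int:
        """Hash a partition key onto the 64-bit ring (MD5 as a stand-in hash)."""
        return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")

    def replicas(self, key: str) -> list:
        """Owner (first token >= the key's token, wrapping around) plus the
        next replication_factor - 1 nodes clockwise."""
        start = bisect.bisect_left(self.tokens, self.token_for(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1]
                for i in range(self.replication_factor)]


ring = TokenRing([f"cassandra-vm{i}" for i in range(1, 7)])  # six hypothetical nodes
print(ring.replicas("user42"))
```

Because every node can compute the same mapping, any node can route a request for a key, and losing one node still leaves two copies of its data in this configuration.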
The flexibility of Cassandra makes it suitable for a very large range of use cases. For example, Instagram uses Cassandra to support its content feed, Spotify uses it to store playlist metadata, and Intuit uses it as part of their largest production clusters supporting TurboTax.9,10,11 Common uses of Cassandra include:
- Analysis of customer data for personalization and recommendation, such as in ecommerce environments and content sharing or streaming websites
- Storage and analysis of IoT data, such as data gathered from mobile and wearable devices, environmental sensors, and edge devices
- Fraud detection, especially for financial organizations
- Messaging, such as for organizations’ internal messaging platforms
We chose to test with Cassandra in part because so many organizations rely on it for everyday operations. Approximately 90 percent of Fortune 100 companies use Apache Cassandra in some capacity.12 If your organization uses Cassandra or is considering doing so, to get the most value from it, you will want to ensure that the solution backing your implementation offers high performance. As our testing highlights, the Dell PowerEdge C6620 with Broadcom-based PERC 12 can deliver just that.
Dell PowerEdge servers: A proven history of strong Apache Cassandra performance
In this study, we tested the Apache Cassandra performance of a new Dell PowerEdge C6620 server compared to a previous-generation Dell PowerEdge C6520 server, but this isn’t the first time we’ve seen strong Cassandra performance on a latest-generation Dell server. In 2019, we tested Apache Cassandra performance on a 14th generation PowerEdge C-series server, the Dell EMC PowerEdge C6420. Pitted against an older modular solution of HPE ProLiant XL170r Gen9 server nodes, the PowerEdge C6420 accomplished double the amount of work in the same amount of rack space.13 Two years prior, in 2017, we assessed a different product line from the 14th generation of PowerEdge servers (the Dell EMC PowerEdge FC640 server) and found that it delivered dramatically more throughput and consistently lower latency than a legacy solution of PowerEdge R710 servers.14
Conclusion
The vast amounts of unstructured data that people and organizations generate daily have the potential to bring incredible value to companies that can utilize it quickly and correctly. Buried in the data are insights about consumer preferences, product performance, environmental trends, and more—but to access those insights at the speed of business, you need high-performing NoSQL databases. Aging servers may be holding you back from the full value of your data.
We found that the new Dell PowerEdge C6620 with Broadcom-based PERC 12 RAID controller can speed read-intensive Apache Cassandra database workloads compared to an older server solution. Faster read and update latencies and higher throughput, as we saw the PowerEdge C6620 deliver, can speed the retrieval, processing, and analysis of your unstructured data, enabling you to more effectively extract its value. To more fully utilize your data to inform your everyday business operations, consider the Dell PowerEdge C6620 with Broadcom-based PERC 12 RAID controller.
- “PowerEdge C6620,” accessed June 23, 2023, https://www.delltechnologies.com/asset/en-us/products/servers/technical-support/poweredge-c6620-spec-sheet.pdf.
- Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears, “Benchmarking Cloud Serving Systems with YCSB,” accessed June 23, 2023, https://courses.cs.duke.edu/fall13/compsci590.4/838-CloudPapers/ycsb.pdf.
- “The Ultimate YCSB Benchmark Guide (2021),” accessed June 23, 2023, https://benchant.com/blog/ycsb.
- Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears, “Benchmarking Cloud Serving Systems with YCSB,” accessed June 23, 2023, https://courses.cs.duke.edu/fall13/compsci590.4/838-CloudPapers/ycsb.pdf.
- “brianfrankcooper/YCSB,” accessed June 23, 2023, https://github.com/brianfrankcooper/YCSB/blob/master/doc/coreworkloads.html.
- “Dell PowerEdge RAID Controller 12 User’s Guide PERC H965i Adapter, PERC H965i Front, and PERC H965i MX,” accessed June 27, 2023, https://www.dell.com/support/manuals/en-us/perc-h965i-front/perc12/dell-technologies-poweredge-raid-controller-12?guid=guid-5889415d-b297-43a0-9197-113a56c33c79&lang=en-us.
- “SAS4116W 24G SAS Tri-Mode RAID-on-Chip (ROC),” accessed June 27, 2023, https://www.broadcom.com/products/storage/raid-on-chip/sas-4116w.
- “SAS4116W 24G SAS Tri-Mode RAID-on-Chip (ROC).”
- Instagram Engineering, “Open-sourcing a 10x reduction in Apache Cassandra tail latency,” accessed June 27, 2023, https://instagram-engineering.com/open-sourcing-a-10x-reduction-in-apache-cassandra-tail-latency-d64f86b43589.
- Kinshuk Mishra and Matt Brown, “Personalization at Spotify using Cassandra,” accessed June 27, 2023, https://engineering.atspotify.com/2015/01/personalization-at-spotify-using-cassandra/.
- Denson Pokta, “Pronto! Intuit Releases First Open Source Cassandra Cluster Manager,” accessed June 27, 2023, https://thenewstack.io/pronto-intuit-releases-first-open-source-cassandra-cluster-manager/.
- Jeff Carpenter, “How the world caught up with Apache Cassandra,” accessed June 27, 2023, https://techcrunch.com/sponsor/datastax/how-the-world-caught-up-with-apache-cassandra/.
- “Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Apache Cassandra database analysis,” accessed June 23, 2023, https://www.principledtechnologies.com/Dell/Power-Edge-C6420-Apache-Cassandra-1019-v2.pdf.
- “Update your private cloud with 14th generation Dell EMC PowerEdge FC640 servers and do more work in less space,” accessed June 23, 2023, https://www.principledtechnologies.com/Dell/PowerEdge_FX2s_FC640_Apache_Cassandra_1117.pdf.
Apache Cassandra performance advantages of the new Dell PowerEdge C6620 with Dell PERC 12 RAID controller
Thu, 21 Sep 2023 23:14:16 -0000
The PowerEdge C6620 with PERC 12 delivered lower latency and higher throughput than an HPE ProLiant XL170r Gen9 server with an HPE Smart Array P440ar controller
Overview
Today’s businesses both generate and take in enormous quantities of data as part of their daily operations. Smartphones, computers, servers, and the activities of people online are the sources of some of this data, but more and more of the data also come from a wide variety of other places, such as weather sensors, streaming video cameras, wearable devices, and onboard computers in vehicles, to name just a few examples. One estimate suggests that the number of connected Internet of Things (IoT) devices will reach over 29 billion by 2030.1 With an ever-increasing mountain of data, much of it from non-traditional sources, organizations need a way to extract value from the noise. NoSQL database systems such as Apache Cassandra can help organizations store, process, and analyze this data to glean useful insights. To be most effective, however, the database system should run on a high-performing computing platform that can complete big data workloads quickly and get insights into decision makers’ hands fast. We assessed the ability of two platforms to handle Cassandra workloads. The first was the new Dell™ PowerEdge™ C6620 with Broadcom®-based Dell PowerEdge RAID Controller (PERC) 12, which companies might choose if they’re upgrading to new servers to better handle their big data needs. The second was the older HPE ProLiant XL170r Gen9 server with an HPE Smart Array P440ar controller, which represents a server that organizations might already have in their data centers. The Dell and Broadcom solution provided higher throughput and lower latencies in our testing, meaning that it completed more big data work in the same amount of time as the older HPE solution. With a strong big data solution, businesses can put their data to work and use it to optimize processes, cut costs, improve customer experience, and grow their offerings. 
This report explores how and why running Apache Cassandra as a big data system on the Dell PowerEdge C6620 server with PERC 12 might be that solution for you.
About the Dell PowerEdge C6620 server
The PowerEdge C6620 is part of Dell’s modular PowerEdge C-Series infrastructure; Dell says it is “designed for compute-intensive workloads” but also “ideal for IOPS-heavy workloads.”2 It features up to two 4th Generation Intel® Xeon® Scalable processors, with up to 56 cores per processor; offers memory speeds of up to 4,800 MT/s; and supports up to 16 NVMe® drives for workload acceleration. Optional liquid cooling is also available.
To learn more about the Dell PowerEdge C6620, visit https://www.dell.com/en-us/shop/enterprise-products/c6620-two-socket-server-node-intel/spd/poweredge-c6620.
Assessing Cassandra performance on the Dell PowerEdge C6620 with Broadcom-based PERC 12
Upgrading to new servers is a big decision. You know that newer, more modern technology is likely to offer performance improvements, but exactly what will those benefits look like, and how much more will the new systems be able to handle? Our testing quantifies the performance boost you might see on your Cassandra workloads by moving from HPE ProLiant XL170r Gen9 servers with Smart Array P440ar controllers to new Dell PowerEdge C6620 servers with Broadcom-based PERC 12.
Our configurations
For our test environment, we installed VMware® vSphere® 8 on both servers before configuring a separate infrastructure server with VMware ESXi™ and VMware vCenter®. We used this infrastructure server to manage the servers and to host client VMs that ran our test workload against our databases. The Dell PowerEdge C6620 server with PERC 12 used two Dell U.2 Gen4 NVMe® 3.84TB drives. The HPE ProLiant XL170r Gen9 server with HPE Smart Array P440ar controller used six 960GB mixed-use SAS 12Gbps drives. Table 1 highlights more details of our configuration.
On each server, we created a Cassandra gold VM and cloned it five times to create a total of six VMs, which we joined in a cluster configuration. We then used the Yahoo Cloud Serving Benchmark (YCSB) to create a 100GB database across the six VMs to take advantage of the distributed database functionality of Cassandra, ran YCSB workload B for 30 minutes, and recorded the results. In the results we highlight below, we provide two perspectives on the performance of each setup: the total throughput and the average read and write latency. Both results reflect the performance across all six VMs.
Why YCSB?
YCSB is an industry-standard benchmark for NoSQL databases. In 2010, a group from Yahoo! Research created it with “the goal of facilitating performance comparisons of the new generation of cloud data serving systems.”3 It is open source, meaning that anyone can access and modify the source code. In a recent interview, contributors to the YCSB open-source community note that it “is rather largely accepted by users” and “represents a series of scenarios that can be abstracted from the real world.”4 Apache Cassandra was one of the first four databases that the YCSB creators tested with the benchmark in 2010, and YCSB remains a good fit for testing Cassandra performance today.5
YCSB functions by letting users create a database populated with synthetic data on their database system of choice. Users can then run a pre-defined or customized workload against the database to gauge system performance. YCSB offers six core workloads, each of which represents a different type of database work. Our testing used the read-intensive workload B. This workload is 95 percent reads (pulling data from a database) and 5 percent writes (adding to or changing data in a database). YCSB gives one application example as photo tagging, where a user might occasionally add a tag to a photo (write) but will mostly search a library of tagged photos (read).6 A solution that offers higher performance on YCSB workload B is likely to improve performance on other read-intensive workloads, such as data analysis. We chose this workload to focus on reading and analyzing a database.
Upgrade to the Dell PowerEdge C6620 with Broadcom-based PERC 12 for lower latencies and better throughput
In our testing with YCSB, the Dell PowerEdge C6620 with PERC 12 offered better performance on all three metrics we measured: read latency, update (or write) latency, and throughput (measured in operations per second). The improvements were large, meaning that replacing your HPE ProLiant XL170r Gen9 servers with new PowerEdge C6620 servers could enable your organization to handle substantially more Cassandra work.
The first metrics we examined were read latency, which measures the delay between the application requesting a piece of data and the database system delivering it, and update latency, which measures the delay between the application changing or adding a piece of data and the database system completing the action. The Dell PowerEdge C6620 with PERC 12 was much faster on both types of latency, with the largest advantage on update latency. There, it offered 60.2 percent lower latency (1.97 milliseconds less) than the HPE ProLiant XL170r Gen9 server with Smart Array P440ar controller.
It may seem like a sub-two-millisecond delay is inconsequential; if you were loading a webpage or pulling up a video, you wouldn’t notice a two-millisecond difference. The significance of this advantage, however, comes from the enormous number of operations that the database system must perform before it can deliver usable results. For this testing in YCSB, we set the maxexecutiontime variable (or how long the benchmark should run) to 30 minutes. At a rate of 249,210 operations per second (see Figure 3), the Dell PowerEdge C6620 with Broadcom-based PERC 12 executed over 400 million operations during the 30-minute test. So, while a difference of one or two milliseconds might not mean much on a single operation, on 400 million operations, the benefit of the faster solution becomes clear.
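Because the report gives the update-latency advantage two ways, as a percentage (60.2 percent) and as an absolute gap (1.97 milliseconds), the two figures together imply approximate absolute update latencies for each server. A short Python sketch of that arithmetic is below; since both inputs are rounded, treat the results as estimates rather than measured values.

```python
# Figures from the report
REDUCTION = 0.602   # Dell update latency is 60.2 percent lower than HPE's
GAP_MS = 1.97       # ...which is 1.97 ms less in absolute terms

# If dell = hpe * (1 - REDUCTION) and hpe - dell = GAP_MS,
# then hpe * REDUCTION = GAP_MS, so hpe = GAP_MS / REDUCTION.
hpe_update_ms = GAP_MS / REDUCTION
dell_update_ms = hpe_update_ms - GAP_MS

print(f"HPE update latency: about {hpe_update_ms:.2f} ms")    # about 3.27 ms
print(f"Dell update latency: about {dell_update_ms:.2f} ms")  # about 1.30 ms
```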
About the Dell PERC 12 RAID controller
The Dell PowerEdge C6620 we tested features the PERC 12, which offers a single front controller with full RAID support for both NVMe and SAS.7 It offers a 3,200MHz cache memory speed and a 16-lane host bus, and it supports RAID levels 0, 1, 5, 6, 10, 50, and 60.8
The Dell PERC 12 is based on the Broadcom SAS4116W series chip. According to Broadcom, “this eighth-generation SAS RAID-on-Chip (ROC) is based on the industry-leading Fusion-MPT architecture and features Tri-Mode SerDes technology that enables a seamless operation of up to 16-wide direct-connect NVMe, SAS or SATA storage devices from any system design.…The Tri-Mode ROC device with 16-wide PCIe Gen 4.0 lanes provides SAS data transfer rates of 22.5, 12, 6Gb/s per lane and 6Gb/s SATA data transfer rates per lane. The high-port count ROC helps eliminate storage bottlenecks with support of x8, x4, x2, and x1 PCI Express® lanes and complies with the PCIe 4.0 specification, offering up to 6 million IOPS (random reads) and up to 900,000 IOPS in RAID (random writes).”9 To learn more about the Dell PERC 12, visit https://infohub.delltechnologies.com/p/dell-poweredge-raid-controller-12/.
With these lower latencies, a solution will be quicker to handle interactions with the Cassandra database, which might include anything from pulling up X-ray images in a hospital to analyzing a large set of data on an ecommerce business’s customer preferences.
The Dell PowerEdge C6620 also offered an enormous advantage on throughput, delivering over twice the operations per second of the HPE ProLiant XL170r Gen9. Given the lower latencies we saw, this is unsurprising: because the PowerEdge C6620 could process operations faster (with lower latency), we would expect it to also be able to handle more operations in a given time. Depending on the read-intensive workloads you’re running, this increase in throughput could translate to quicker load times for your customers or faster data analysis, among other possibilities.
NoSQL databases and Cassandra in today’s business landscape
For this study, we tested with Apache Cassandra, a widely used NoSQL database system. NoSQL, or non-relational, databases are a category of database system that store and query data that do not have a traditional data structure. Traditional SQL databases organize data in a column-row format for finding or creating relationships across the data. To store data in a SQL database, all data in each table must have the same structure and fit a pre-defined schema, with every row in each table including the same columns and formats every time. NoSQL databases, however, can organize data more dynamically. They can deal with data from documents, graphs, key-values, and more. This flexibility lets people use them to analyze documents or data that don’t follow identical structuring formats. For organizations that need to store and analyze unstructured data—which may include data from Internet of Things (IoT) applications, audio, video, text files, social media posts, and more—a NoSQL database is a great option.
There are many types of NoSQL database systems; Apache Cassandra is a type of key-value and wide-column store. Key-value databases store each item as essentially two fields: one is the key, and the other is the value. The value can be any type of data (text, numbers, etc.). For example, in a key-value database, some keys might map to dates, others to numbers, and so on. A wide-column database, which Cassandra uses, is a two-dimensional key-value database, where instead of mapping to just one value, each key can map to several columns of values.
Apache Cassandra is a distributed database, meaning that it can run on multiple nodes while acting as a single entity. This makes it resilient and highly scalable. Its scalability, combined with the flexibility afforded by its hybrid key-value/tabular model, allows it to handle many types of big data work very well. Cassandra is also open-source and free, a compelling benefit for organizations seeking to save on licensing fees.
The flexibility of Cassandra makes it suitable for a very large range of use cases. For example, Instagram uses Cassandra to support its content feed, Spotify uses it to store playlist metadata, and Intuit uses it as part of their largest production clusters supporting TurboTax.10,11,12 Common uses of Cassandra include:
- Analysis of customer data for personalization and recommendation, such as in ecommerce environments and content sharing or streaming websites
- Storage and analysis of IoT data, such as data gathered from mobile and wearable devices, environmental sensors, and edge devices
- Fraud detection, especially for financial organizations
- Messaging, such as for organizations’ internal messaging platforms
We chose to test with Cassandra in part because so many organizations rely on it for everyday operations. Approximately 90 percent of Fortune 100 companies use Apache Cassandra in some capacity.13 If your organization uses Cassandra or is considering doing so, to get the most value from it, you will want to ensure that the solution backing your implementation offers high performance. As our testing highlights, the Dell PowerEdge C6620 with Broadcom-based PERC 12 can deliver just that.
Dell PowerEdge servers: A proven history of strong Apache Cassandra performance
In this study, we tested the Apache Cassandra performance of a new Dell PowerEdge C6620 server compared to an HPE ProLiant XL170r Gen9 server, but this isn’t the first time we’ve seen strong Cassandra performance on a latest-generation Dell server. In 2019, we tested Apache Cassandra performance on a 14th generation PowerEdge C-series server, the Dell EMC PowerEdge C6420. Pitted against an older modular solution of HPE ProLiant XL170r Gen9 server nodes, the PowerEdge C6420 accomplished double the amount of work in the same amount of rack space.14 Two years prior, in 2017, we assessed a different product line from the 14th generation of PowerEdge servers (the Dell EMC PowerEdge FC640 server) and found that it delivered dramatically more throughput and consistently lower latency than a legacy solution of PowerEdge R710 servers.15
Conclusion
Data proliferation today is rapid, and its growth shows no signs of stopping. For businesses that can take advantage of that data, there is tremendous potential value. One recent McKinsey study notes that “companies that are using data-driven B2B sales-growth engines report above-market growth and EBITDA increases in the range of 15 to 25 percent.”16 With data flooding in so quickly and in so many different forms, however, companies need high-performing big data solutions to have a chance at utilizing that data effectively.
We tested the performance of two platforms with a read-intensive big data workload on the Apache Cassandra database system to assess which might be better suited to deliver the insights decision makers need quickly. Compared to a previous-generation Dell PowerEdge C6520 with Dell PERC 11, the new Dell PowerEdge C6620 with Broadcom-based PERC 12 RAID controller delivered higher throughput and lower read and update latencies. This improvement in performance can help you glean more value from your unstructured data more quickly. If you’re watching your stores of unstructured data grow but are still leaning on older servers for your critical Cassandra workloads, it may be time for an upgrade.
- Lionel Sujay Vailshery, “Number of Internet of Things (IoT) connected devices worldwide from 2019 to 2021, with forecasts from 2022 to 2030,” accessed July 13, 2023, https://www.statista.com/statistics/1183457/iot-connected-devices-worldwide/.
- “PowerEdge C6620,” accessed June 23, 2023, https://www.delltechnologies.com/asset/en-us/products/servers/technical-support/poweredge-c6620-spec-sheet.pdf.
- Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears, “Benchmarking Cloud Serving Systems with YCSB,” accessed June 23, 2023, https://courses.cs.duke.edu/fall13/compsci590.4/838-CloudPapers/ycsb.pdf.
- “The Ultimate YCSB Benchmark Guide (2021),” accessed June 23, 2023, https://benchant.com/blog/ycsb.
- Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears, “Benchmarking Cloud Serving Systems with YCSB,” accessed June 23, 2023, https://courses.cs.duke.edu/fall13/compsci590.4/838-CloudPapers/ycsb.pdf.
- “brianfrankcooper/YCSB,” accessed June 23, 2023, https://github.com/brianfrankcooper/YCSB/blob/master/doc/coreworkloads.html.
- “Dell PowerEdge RAID Controller 12 User’s Guide PERC H965i Adapter, PERC H965i Front, and PERC H965i MX,” accessed June 27, 2023, https://www.dell.com/support/manuals/en-us/perc-h965i-front/perc12/dell-technologies-poweredge-raid-controller-12?guid=guid-5889415d-b297-43a0-9197-113a56c33c79&lang=en-us.
- “SAS4116W 24G SAS Tri-Mode RAID-on-Chip (ROC),” accessed June 27, 2023, https://www.broadcom.com/products/storage/raid-on-chip/sas-4116w.
- “SAS4116W 24G SAS Tri-Mode RAID-on-Chip (ROC).”
- Instagram Engineering, “Open-sourcing a 10x reduction in Apache Cassandra tail latency,” accessed June 27, 2023, https://instagram-engineering.com/open-sourcing-a-10x-reduction-in-apache-cassandra-tail-latency-d64f86b43589.
- Kinshuk Mishra and Matt Brown, “Personalization at Spotify using Cassandra,” accessed June 27, 2023, https://engineering.atspotify.com/2015/01/personalization-at-spotify-using-cassandra/.
- Denson Pokta, “Pronto! Intuit Releases First Open Source Cassandra Cluster Manager,” accessed June 27, 2023, https://thenewstack.io/pronto-intuit-releases-first-open-source-cassandra-cluster-manager/.
- Jeff Carpenter, “How the world caught up with Apache Cassandra,” accessed June 27, 2023, https://techcrunch.com/sponsor/datastax/how-the-world-caught-up-with-apache-cassandra/.
- “Move your private cloud to Dell EMC PowerEdge C6420 server nodes and boost Apache Cassandra database analysis,” accessed June 23, 2023, https://www.principledtechnologies.com/Dell/Power-Edge-C6420-Apache-Cassandra-1019-v2.pdf.
- “Update your private cloud with 14th generation Dell EMC PowerEdge FC640 servers and do more work in less space,” accessed June 23, 2023, https://www.principledtechnologies.com/Dell/PowerEdge_FX2s_FC640_Apache_Cassandra_1117.pdf.
- Jochen Böringer, Alexander Dierks, Isabel Huber, and Dennis Spillecke, “Insights to impact: Creating and sustaining data-driven commercial growth,” accessed July 13, 2023, https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/insights-to-impact-creating-and-sustaining-data-driven-commercial-growth.
BIOS Settings for Optimized Performance on Next-Generation Dell PowerEdge Servers
Thu, 02 Nov 2023 17:45:05 -0000
Summary
Dell PowerEdge servers provide a wide range of tunable parameters to allow customers to achieve top performance. The information in this paper outlines the tunable parameters available in the latest generation of PowerEdge servers (for example, R660, R760, MX760, and C6620) and provides recommended settings for different workloads.
Figure 1. PowerEdge R660
Figure 2. PowerEdge R760
The following tables provide the BIOS setting recommendations for the latest generation of PowerEdge servers.
Table 1. BIOS setting recommendations—System profile settings
| System setup screen | Setting | Default | Recommended setting for performance | Recommended setting for low latency, Stream, and MLC environments | Recommended |
|---|---|---|---|---|---|
| System profile settings | System Profile | Performance Per Watt [1] | Performance Optimized | First select Performance Optimized and then select Custom [1] | Custom |
| System profile settings | CPU Power Management | System DBPM | Maximum Performance | Maximum Performance | Maximum Performance |
| System profile settings | Memory Frequency | Maximum Performance | Maximum Performance | Maximum Performance | Maximum Performance |
| System profile settings | Turbo Boost [2] | Enabled | Enabled | Enabled | Enabled |
| System profile settings | C1E | Enabled | Disabled | Disabled | Disabled |
| System profile settings | C States | Enabled | Disabled | Disabled | Autonomous or Disabled [6] |
| System profile settings | Monitor/Mwait | Enabled | Enabled | Disabled [3] | Enabled |
| System profile settings | Memory Patrol Scrub | Standard | Standard [4] | Standard/Disabled [4] | Disabled |
| System profile settings | Memory Refresh Rate | 1x | 1x | 1x | 1x |
| System profile settings | Uncore Frequency | Dynamic | Maximum [5] | Maximum [5] | Dynamic |
| System profile settings | Energy Efficient Policy | Balanced Performance | Performance | Performance | Performance |
| System profile settings | CPU Interconnect Bus Link Power Management | Enabled | Disabled | Disabled | Disabled |
| System profile settings | PCI ASPM L1 Link Power Management | Enabled | Disabled | Disabled | Disabled |
[1] Depends on how the system was ordered. The defaults of the other System Profile settings are driven by this choice and may differ from the examples listed. Select Performance Optimized first, and then select Custom to load the optimal profile defaults for further modification.
[2] SST Turbo Boost Technology behaves substantially better in latency-sensitive environments than in previous generations, but specific Turbo residency cannot be guaranteed under all workload conditions. Evaluate Turbo Boost Technology in your own environment to choose the setting most appropriate for your workload, and consider the Dell Controlled Turbo option at the same time.
[3] Disable Monitor/Mwait only together with Logical Processor. Disabling Monitor/Mwait prevents the Linux intel_idle driver from enforcing C-states.
[4] Test in your own environment to determine whether disabling Memory Patrol Scrub is helpful.
[5] Selecting Dynamic can provide more TDP headroom, at the expense of a dynamic (rather than maximum) uncore frequency. The optimal setting is workload dependent.
[6] Autonomous on air-cooled systems; Disabled on liquid-cooled systems.
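To make Table 1 easier to apply, the following Python sketch encodes its Default and "Recommended setting for performance" columns as dictionaries and lists the settings that would have to change. The keys are the table's display names, not the attribute names that iDRAC, racadm, or Redfish expose, and actual defaults vary with how a server was ordered (footnote [1]), so treat `DEFAULTS` as an example rather than a reading from real hardware.

```python
"""Compare Table 1 defaults against its performance recommendations.

Keys are the table's display names; they are NOT the attribute
identifiers used by iDRAC, racadm, or Redfish.
"""

# Table 1, "Default" column (real defaults depend on how the system was ordered).
DEFAULTS = {
    "System Profile": "Performance Per Watt",
    "CPU Power Management": "System DBPM",
    "Memory Frequency": "Maximum Performance",
    "Turbo Boost": "Enabled",
    "C1E": "Enabled",
    "C States": "Enabled",
    "Monitor/Mwait": "Enabled",
    "Memory Patrol Scrub": "Standard",
    "Memory Refresh Rate": "1x",
    "Uncore Frequency": "Dynamic",
    "Energy Efficient Policy": "Balanced Performance",
    "CPU Interconnect Bus Link Power Management": "Enabled",
    "PCI ASPM L1 Link Power Management": "Enabled",
}

# Table 1, "Recommended setting for performance" column.
PERFORMANCE = {
    "System Profile": "Performance Optimized",
    "CPU Power Management": "Maximum Performance",
    "Memory Frequency": "Maximum Performance",
    "Turbo Boost": "Enabled",
    "C1E": "Disabled",
    "C States": "Disabled",
    "Monitor/Mwait": "Enabled",
    "Memory Patrol Scrub": "Standard",
    "Memory Refresh Rate": "1x",
    "Uncore Frequency": "Maximum",
    "Energy Efficient Policy": "Performance",
    "CPU Interconnect Bus Link Power Management": "Disabled",
    "PCI ASPM L1 Link Power Management": "Disabled",
}


def settings_to_change(current, target):
    """Return (setting, current_value, target_value) for every mismatch."""
    return [
        (name, current.get(name), wanted)
        for name, wanted in target.items()
        if current.get(name) != wanted
    ]


if __name__ == "__main__":
    for name, have, want in settings_to_change(DEFAULTS, PERFORMANCE):
        print(f"{name}: {have} -> {want}")
```

Run against these two columns, it reports eight changes (for example, `C1E: Enabled -> Disabled`); Memory Frequency, Turbo Boost, Monitor/Mwait, Memory Patrol Scrub, and Memory Refresh Rate already match.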
Table 2. BIOS setting recommendations—Memory, processor, and iDRAC settings
| System setup screen | Setting | Default | Recommended setting for performance | Recommended setting for low latency, Stream, and MLC environments | Recommended |
|---|---|---|---|---|---|
| Memory settings | Memory Operating Mode | Optimizer | Optimizer [1] | Optimizer [1] | Optimizer [1] |
| Memory settings | Memory Node Interleave | Disabled | Disabled | Disabled | Disabled |
| Memory settings | DIMM Self Healing | Enabled | Disabled | Disabled | Disabled |
| Memory settings | ADDDC setting | Disabled [2] | Disabled [2] | Disabled [2] | Disabled [2] |
| Memory settings | Memory Training | Fast | Fast | Fast | Fast |
| Memory settings | Correctable Error Logging | Enabled | Disabled | Disabled | Disabled |
| Processor settings | Logical Processor | Enabled | Disabled [3] | Disabled [3] | Enabled |
| Processor settings | Virtualization Technology | Enabled | Disabled | Disabled | Disabled |
| Processor settings | CPU Interconnect Speed | Maximum Data Rate | Maximum Data Rate | Maximum Data Rate | Maximum Data Rate |
| Processor settings | Adjacent Cache Line Prefetch | Enabled | Enabled | Enabled | Enabled |
| Processor settings | Hardware Prefetcher | Enabled | Enabled | Enabled | Enabled |
| Processor settings | DCU Streamer Prefetcher | Enabled | Enabled | Disabled | Disabled |
| Processor settings | DCU IP Prefetcher | Enabled | Enabled | Enabled | Enabled |
| Processor settings | Sub NUMA Cluster | Disabled | SNC 2 | SNC 4 on XCC; SNC 2 on MCC | SNC 4 on XCC; SNC 2 on MCC |
| Processor settings | Dell Controlled Turbo | Disabled | Disabled | Enabled [4] | Disabled |
| Processor settings | Dell Controlled Turbo Optimizer mode | Disabled | Enabled [5] | Enabled [5] | Enabled [5] |
| Processor settings | XPT Prefetch | Enabled | Disabled | Disabled | Enabled |
| Processor settings | UPI Prefetch | Enabled | Disabled | Disabled | Enabled |
| Processor settings | LLC Prefetch | Disabled | Enabled | Disabled | Disabled |
| Processor settings | DeadLine LLC Alloc | Enabled | Enabled | Enabled | Disabled |
| Processor settings | Directory AtoS | Disabled | Disabled | Disabled | Disabled |
| Processor settings | Dynamic SST Perf Profile | Disabled | Disabled | Enabled | Disabled |
| Processor settings | SST-Perf-Profile | Operating Point 1 | Operating Point 1 | Operating Point ? [6] | Operating Point 1 |
| iDRAC settings | Thermal Profile | Default | Maximum Performance | Maximum Performance | Maximum Performance |
[1] Use Optimizer mode for memory-bandwidth-sensitive workloads; Fault Resilient Mode can reduce memory bandwidth by up to 33%.
[2] Only available when x4 DIMMs are installed in the system.
[3] Logical Processor (Hyper-Threading) tends to benefit throughput-oriented workloads such as SPEC CPU2017 INT_rate and FP_rate. Many HPC workloads disable this option. It benefits SPEC FP_rate only if the thread count scales to the total logical processor count.
[4] Dell Controlled Turbo helps keep core frequency at the maximum all-core Turbo frequency, which reduces jitter. Disable it if Turbo Boost is disabled.
[5] This option is available only on liquid-cooled systems.
[6] Depends on whether your program is sensitive to base and Turbo frequency. This setting reduces the CPU core count and provides higher base and Turbo frequencies.
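The footnotes to Tables 1 and 2 also describe dependencies between settings. As a sketch, the checks below encode four of them (Table 1 footnote [3] and Table 2 footnotes [2], [4], and [5]) as simple rules over a planned configuration. Keys reuse the tables' display names; `cooling` and `dimm_width` are hypothetical stand-ins for hardware facts that a real tool would have to read from the system inventory, and the rules are only as complete as the footnotes they paraphrase.

```python
"""Check a planned BIOS configuration against dependencies in the table footnotes.

Keys are the tables' display names plus two hypothetical hardware facts,
"cooling" ("air" or "liquid") and "dimm_width" (for example "x4" or "x8").
"""


def check_dependencies(cfg):
    """Return human-readable problems; an empty list means no conflict was found."""
    problems = []

    # Table 1, footnote [3]: disable Monitor/Mwait only together with Logical Processor.
    if cfg.get("Monitor/Mwait") == "Disabled" and cfg.get("Logical Processor") != "Disabled":
        problems.append("Monitor/Mwait is Disabled but Logical Processor is not")

    # Table 2, footnote [4]: disable Dell Controlled Turbo when Turbo Boost is disabled.
    if cfg.get("Turbo Boost") == "Disabled" and cfg.get("Dell Controlled Turbo") == "Enabled":
        problems.append("Dell Controlled Turbo is Enabled but Turbo Boost is Disabled")

    # Table 2, footnote [5]: Optimizer mode is available only on liquid-cooled systems.
    if (cfg.get("Dell Controlled Turbo Optimizer mode") == "Enabled"
            and cfg.get("cooling") != "liquid"):
        problems.append("Dell Controlled Turbo Optimizer mode requires a liquid-cooled system")

    # Table 2, footnote [2]: ADDDC is available only when x4 DIMMs are installed.
    if cfg.get("ADDDC setting") == "Enabled" and cfg.get("dimm_width") != "x4":
        problems.append("ADDDC setting requires x4 DIMMs")

    return problems


if __name__ == "__main__":
    # Low-latency columns of Tables 1 and 2 on a hypothetical air-cooled system.
    planned = {
        "Monitor/Mwait": "Disabled",
        "Logical Processor": "Disabled",
        "Turbo Boost": "Enabled",
        "Dell Controlled Turbo": "Enabled",
        "Dell Controlled Turbo Optimizer mode": "Enabled",
        "ADDDC setting": "Disabled",
        "cooling": "air",
        "dimm_width": "x8",
    }
    for problem in check_dependencies(planned):
        print(problem)
```

With the example plan, only the Optimizer mode rule fires, because that option is listed as liquid-cooled only; every other low-latency choice satisfies its footnote.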
iDRAC recommendations
- In thermally challenged environments, increase fan speed through the iDRAC Thermal section.
- Remove all power capping in performance-sensitive environments.
BIOS settings glossary
- System Profile: (Default=Performance Per Watt)—It can be difficult to set each individual power/performance feature for a specific environment. For this reason, a menu option is provided to help customers optimize the system for goals such as minimum power usage and acoustic levels, maximum efficiency, Energy Star optimization, or maximum performance.
  - Performance Per Watt DAPC (Dell Advanced Power Control)—This mode uses Dell presets to maximize performance/watt efficiency with a bias toward power savings. It provides the best features for reducing power and increasing performance in applications where maximum bus speeds are not critical. It is expected to be the favored mode for SPECpower testing. "Efficiency–Favor Power" mode maintains backward compatibility with systems that included the preset operating modes before Energy Star for servers was released.
  - Performance Per Watt OS—This mode optimizes performance/watt efficiency with a bias toward performance. It is the favored mode for Energy Star. This mode differs slightly from Performance Per Watt DAPC mode: no bus speeds are derated as they are in Performance Per Watt DAPC mode, leaving the operating system in control of those changes.
  - Performance—This mode maximizes the absolute performance of the system without regard for power consumption. Fan speed and heat output, in addition to power consumption, might increase. System efficiency might decrease, but absolute performance might increase, depending on the workload that is running.
  - Custom—Custom mode allows the user to individually modify any of the low-level settings that are preset and unchangeable in the other preset modes.
- C-States—C-states reduce CPU idle power. There are three options for this setting:
  - Enabled: When "Enabled" is selected, the operating system initiates the C-state transitions. Some operating system software might defeat the ACPI mapping (for example, the intel_idle driver).
  - Autonomous: When "Autonomous" is selected, HALT and C1 requests are converted to C6 requests in hardware.
  - Disable: When "Disable" is selected, only C0 and C1 are used by the operating system. C1 is enabled automatically when an OS auto-halts.
- C1 Enhanced Mode—Enabling C1E (C1 enhanced) state can save power by halting CPU cores that are idle.
- Turbo Mode—Enabling turbo mode can boost overall CPU performance when not all CPU cores are fully utilized. In turbo mode, a CPU core can run above its rated frequency for a short period of time.
- Hyper-Threading—Enabling Hyper-Threading lets the operating system address two virtual or logical cores for each physical core. Workloads can be shared between virtual or logical cores when possible. The main function of Hyper-Threading is to increase the number of independent instructions in the pipeline so that processor resources are used more efficiently.
- Execute Disable Bit—The execute disable bit allows memory to be marked as executable or non-executable when used with a supporting operating system. This can improve system security by configuring the processor to raise an error to the operating system when code attempts to run in non-executable memory.
- DCA (Direct Cache Access)—DCA-capable I/O devices, such as network controllers, can place data directly into the CPU cache, which improves response time.
- Power/Performance Bias—Power/performance bias determines how aggressively the CPU will be power managed and placed into turbo. With "Platform Controlled," the system controls the setting. Selecting "OS Controlled" allows the operating system to control it.
- Per Core P-state—When per-core P-states are enabled, each physical CPU core can operate at separate frequencies. If disabled, all cores in a package will operate at the highest resolved frequency of all active threads.
- CPU Frequency Limits—The maximum turbo frequency can be restricted with turbo limiting to a frequency that is between the maximum turbo frequency and the rated frequency for the CPU installed.
- Energy Efficient Turbo—When energy efficient turbo is enabled, the CPU's optimal turbo frequency will be tuned dynamically based on CPU utilization.
- Uncore Frequency Scaling—When enabled, the CPU uncore will dynamically change speed based on the workload.
- MONITOR/MWAIT—MONITOR/MWAIT instructions are used to engage C-states.
- Sub-NUMA Cluster (SNC)—SNC breaks up the last level cache (LLC) into disjoint clusters based on address range, with each cluster bound to a subset of the memory controllers in the system. SNC improves average latency to the LLC and memory. SNC is a replacement for the cluster on die (COD) feature found in previous processor families. For a multi-socketed system, all SNC clusters are mapped to unique NUMA domains. (See also IMC interleaving.) Values for this BIOS option can be:
  - Disabled: The LLC is treated as one cluster when this option is disabled.
  - Enabled: Uses LLC capacity more efficiently and reduces latency due to core/IMC proximity. This might provide performance improvement on NUMA-aware operating systems.
- Snoop Preference—Select the appropriate snoop mode based on the workload. There are two snoop modes:
  - HS w. Directory + OSB + HitME cache: Best overall for most workloads (default setting)
  - Home Snoop: Best for bandwidth-sensitive workloads
- XPT Prefetcher—XPT prefetch is a mechanism that enables a read request that is being sent to the last level cache to speculatively issue a copy of that read to the memory controller prefetcher.
- UPI Prefetcher—UPI prefetch is a mechanism to start the memory read early on the DDR bus. The UPI receive path spawns a memory read to the memory controller prefetcher.
- Patrol Scrub—Patrol scrub is a memory RAS feature that runs a background memory scrub against all DIMMs. This feature can negatively affect performance.
- DCU Streamer Prefetcher—DCU (Level 1 Data Cache) streamer prefetcher is an L1 data cache prefetcher. Lightly threaded applications and some benchmarks can benefit from having the DCU streamer prefetcher enabled. Default setting is Enabled.
- LLC Dead Line Allocation—In some Intel CPU caching schemes, mid-level cache (MLC) evictions are filled into the last level cache (LLC). When a line is evicted from the MLC to the LLC, the core can flag the evicted MLC line as "dead," meaning that it is not likely to be read again. When this option is disabled, dead lines are dropped and never fill the LLC. Values for this BIOS option can be:
  - Disabled: Saves space in the LLC by never filling MLC dead lines into the LLC.
  - Enabled: Opportunistically fills MLC dead lines into the LLC if space is available.
- Adjacent Cache Prefetch—Lightly threaded applications and some benchmarks can benefit from having the adjacent cache line prefetch enabled. Default is Enabled.
- Intel Virtualization Technology—Intel Virtualization Technology allows a platform to run multiple operating systems and applications in independent partitions, so that one computer system can function as multiple virtual systems. Default is Enabled.
- Hardware Prefetcher—Lightly threaded applications and some benchmarks can benefit from having the hardware prefetcher enabled. Default is Enabled.
- Trusted Execution Technology—Enable Intel Trusted Execution Technology (Intel TXT). Default is Disabled.
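After changing these settings, it can help to confirm what the operating system actually sees. The Python sketch below assumes a Linux host and reads standard sysfs files: the NUMA node directories under `/sys/devices/system/node` (the Sub-NUMA Cluster entry above states that each SNC cluster maps to its own NUMA domain) and the cpuidle driver and states under `/sys/devices/system/cpu`. The `expected_numa_nodes` helper is an illustration that assumes DDR-only memory; HBM or CXL memory can add nodes. On VMs or containers some paths may be missing, in which case the functions return empty results.

```python
"""Inspect, from Linux, how SNC and C-state BIOS choices surface to the OS."""
import glob
import os
import re


def _read(path):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def expected_numa_nodes(sockets, snc_clusters):
    """NUMA domains expected if every SNC cluster is its own domain.

    Pass snc_clusters=1 (or 0) when SNC is disabled: the LLC is then one
    cluster per socket. Assumes DDR-only memory (no HBM or CXL nodes).
    """
    return sockets * max(1, snc_clusters)


def numa_node_count(root="/sys/devices/system/node"):
    """Count the node0, node1, ... directories the kernel exposes (0 if absent)."""
    paths = glob.glob(os.path.join(root, "node[0-9]*"))
    return sum(1 for p in paths if re.fullmatch(r"node\d+", os.path.basename(p)))


def idle_driver(root="/sys/devices/system/cpu"):
    """Active cpuidle driver, for example intel_idle or acpi_idle (None if unreadable)."""
    return _read(os.path.join(root, "cpuidle", "current_driver"))


def idle_states(cpu=0, root="/sys/devices/system/cpu"):
    """List (state name, disabled_by_os) for each cpuidle state of one logical CPU."""
    pattern = os.path.join(root, f"cpu{cpu}", "cpuidle", "state[0-9]*")
    state_dirs = sorted(glob.glob(pattern),
                        key=lambda p: int(re.sub(r"\D", "", os.path.basename(p))))
    states = []
    for state_dir in state_dirs:
        name = _read(os.path.join(state_dir, "name"))
        if name is not None:
            # "disable" reads 1 when the OS (not the BIOS) has turned this state off.
            states.append((name, _read(os.path.join(state_dir, "disable")) == "1"))
    return states


if __name__ == "__main__":
    print("cpuidle driver:", idle_driver())
    # Example: two sockets with SNC 2, as in the performance column of Table 2.
    print("NUMA nodes expected:", expected_numa_nodes(sockets=2, snc_clusters=2))
    print("NUMA nodes visible: ", numa_node_count())
    for name, disabled in idle_states():
        print(f"  {name}" + (" (disabled by OS)" if disabled else ""))
```

If deeper states such as C6 are still listed after C States is set to Disabled, the reported driver is worth checking first: footnote [3] of Table 1 notes that the Linux intel_idle driver can enforce C-states unless Monitor/Mwait is also disabled.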