How PowerFlex Transforms Big Data with VMware Tanzu Greenplum
Wed, 13 Apr 2022 13:16:23 -0000|
Read Time: 0 minutes
Quick! The word has just come down. There is a new initiative that requires a massively parallel processing (MPP) database, and you are in charge of implementing it. What are you going to do? Luckily, you know the answer. You also just discovered that the Dell PowerFlex Solutions team has you covered with a solutions guide for VMware Tanzu Greenplum.
What is in the solutions guide and how will it help with an MPP database? This blog provides the answer. We look at what Greenplum is and how to leverage Dell PowerFlex for both the storage and compute resources in Greenplum.
Infrastructure flexibility: PowerFlex
If you have read my other blogs or are familiar with PowerFlex, you know it has powerful transmorphic properties. For example, PowerFlex nodes sometimes function as both storage and compute, like hyperconverged infrastructure (HCI). At other times, PowerFlex functions as a storage-only (SO) node or a compute-only (CO) node. Even more interesting, these node types can be mixed and matched in the same environment to meet the needs of the organization and the workloads that they run.
This transmorphic property of PowerFlex is helpful in a Greenplum deployment, especially with the configuration described in the solutions guide. Because the deployment is built on open-source PostgreSQL, it is optimized for the needs of an MPP database, like Greenplum. PowerFlex can deliver the compute performance necessary to support massive data IO with its CO nodes. The PowerFlex infrastructure can also support workloads running on CO nodes or nodes that combine compute and storage (hybrid nodes). By leveraging the malleable nature of PowerFlex, no additional silos are needed in the data center, and it may even help remove existing ones.
The architecture used in the solutions guide consists of 12 CO nodes and 10 SO nodes. The CO nodes have VMware ESXi installed on them, with Greenplum instances deployed on top. There are 10 segments and one director deployed for the Greenplum environment. The 12th CO node is used for redundancy.
The storage tier uses the 10 SO nodes to deliver 12 volumes backed by SSDs. This configuration creates a high speed, highly redundant storage system that is needed for Greenplum. Also, two protection domains are used to provide both primary and mirror storage for the Greenplum instances. Greenplum mirrors the volumes between those protection domains, adding an additional level of protection to the environment, as shown in the following figure:
By using this fluid and composable architecture, the components can be scaled independently of one another, allowing for storage to be increased either independently or together with compute. Administrators can use this configuration to optimize usage and deliver appropriate resources as needed without creating silos in the environment.
Testing and validation with Greenplum: we have you covered
The solutions guide not only describes how to build a Greenplum environment, it also addresses testing, which many administrators want to perform before they finish a build. The guide covers performing basic validations with FIO and gpcheckperf. In the simplest terms, these tools ensure that IO, memory, and network performance are acceptable. The FIO tests that were run for the guide showed that the HBA was fully saturated, maximizing both read and write operations. The gpcheckperf testing showed a performance of 14,283.62 MB/sec for write workloads.
Wouldn’t you feel better if a Greenplum environment was tested with a real-world dataset? That is, taking it beyond just the minimum, maximum, and average numbers? The great news is that the architecture was tested that way! Our Dell Digital team has developed an internal test suite running static benchmarked data. This test suite is used at Dell Technologies across new Greenplum environments as the gold standard for new deployments.
In this test design, all the datasets and queries are static. This scenario allows for a consistent measurement of the environment from one run to the next. It also provides a baseline of an environment that can be used over time to see how its performance has changed -- for example, if the environment sped up or slowed down following a software update.
Massive performance with real data
So how did the architecture fare? It did very well! When 182 parallel complex queries were run simultaneously to stress the system, it took just under 12 minutes for the test to run. In that time, the environment had a read bandwidth of 40 GB/s and a write bandwidth of 10 GB/s. These results are using actual production-based queries from the Dell Digital team workload. These results are close to saturating the network bandwidth for the environment, which indicates that there are no storage bottlenecks.
The design covered in this solution guide goes beyond simply verifying that the environment can handle the workload; it also shows how the configuration can maintain performance during ongoing operations.
Maintaining performance with snapshots
One of the key areas that we tested was the impact of snapshots on performance. Snapshots are a frequent operation in data centers and are used to create test copies of data as well as a source for backups. For this reason, consider the impact of snapshots on MPP databases when looking at an environment, not just how fast the database performs when it is first deployed.
In our testing, we used the native snapshot capabilities of PowerFlex to measure the impact that snapshots have on performance. Using PowerFlex snapshots provides significant flexibility in data protection and cloning operations that are commonly performed in data centers.
We found that when the first storage-consistent snapshot of the database volumes was taken, the test took 45 seconds longer to complete than initial tests. This result was because it was the first snapshot of the volumes. Follow-on snapshots during testing resulted in minimal impact to the environment. This minimal impact is significant for MPP databases in which performance is important. (Of course, performance can vary with each deployment.)
We hope that these findings help administrators who are building a Greenplum environment feel more at ease. You not only have a solution guide to refer to as you architect the environment, you can be confident that it was built on best-in-class infrastructure and validated using common testing tools and real-world queries.
The bottom line
Now that you know the assignment is coming to build an MPP database using VMware Tanzu Greenplum -- are you up to the challenge?
If you are, be sure to read the solution guide. If you need additional guidance on building your Greenplum environment on PowerFlex, be sure to reach out to your Dell representative.
Related Blog Posts
PowerFlex and Amazon: Destination EKS Anywhere
Thu, 13 Jan 2022 23:01:50 -0000|
Read Time: 0 minutes
Welcome to your destination. Today Dell Technologies is pleased to share that Amazon Elastic Kubernetes Service (Amazon EKS) Anywhere has been validated on Dell PowerFlex software-defined infrastructure. Amazon EKS Anywhere is a new deployment option for Amazon EKS that enables customers to easily create and operate Kubernetes clusters on-premises while allowing for easy connectivity and portability to Amazon AWS environments. PowerFlex helps customers deliver a flexible deployment solution that scales as needs change with smooth, painless node-by-node expandability, inclusive of compute and storage, in a unified fabric architecture.
Dell Technologies collaborates with a broad ecosystem of public cloud providers to help our customers support multi-cloud environments that help place the right data and applications where it makes the most sense for them. Deploying Amazon EKS Anywhere on Dell Technologies infrastructure streamlines application development and delivery by allowing organizations to easily create and manage on premises Kubernetes clusters.
Across nearly all industries, IT organizations are moving to a more developer-oriented model that requires automated processes, rapid resource delivery, and reliable infrastructure. To drive operational simplicity through Kubernetes orchestration, Amazon EKS Anywhere helps customers automate cluster management, reduce support costs, and eliminate the redundant effort of using multiple open source or 3rd party tools to manage Kubernetes clusters. The combination of automated Kubernetes cluster management with intelligent, automated infrastructure quickly brings organizations to the next stop in their IT Journey, allowing them to provide infrastructure as code and empower their DevOps teams to be the innovation engine for their businesses.
Let us explore Amazon EKS Anywhere on PowerFlex and how it helps you move towards a more developer-oriented model. First, let’s look at the requirements for Amazon EKS Anywhere.
To deploy Amazon EKS Anywhere we will need a PowerFlex environment running VMware vSphere 7.0 or higher. Specifically, our validation used vSphere 7.0.2. We will also need to ensure we have sufficient capacity to deploy 8 to 10 Amazon EKS VMs. Additionally, we will need a network in the vSphere workload cluster with a DHCP service. This network is what the workload VMs will connect to. There are also a few Internet locations that the Amazon EKS administrative VM will need to reach, so that the manifests, OVAs, and Amazon EKS distro can be downloaded. Initial deployments can start with as few as four PowerFlex nodes and grow to meet the expansion needs of storage, compute, or both for scalability of over 1,000 nodes.
The logical view of the Amazon EKS Anywhere environment on PowerFlex is illustrated below.
There are two types of templates used for the workloads: a Bottlerocket template and an Ubuntu image. The Bottlerocket template is a customized image from Amazon that is specific to Amazon EKS Anywhere. The Ubuntu template was used for our validation.
Note: Bottlerocket is a Linux-based open-source operating system that is purpose-built by Amazon. It focuses on security and maintainability, and provides a reliable, consistent, and safe platform for container-based workloads. Amazon EKS managed node groups with Bottlerocket support enable you to leverage the simplicity of managed node provisioning and lifecycle management features, while using the latest best practices for running containers in production. You can run your Kubernetes workloads on Bottlerocket nodes and benefit from enhanced security, higher cluster utilization, and less operational overhead. https://aws.amazon.com/blogs/containers/amazon-eks-adds-native-support-for-bottlerocket-in-managed-node-groups/
After the Amazon EKS admin VM is deployed, a command is issued on the Amazon EKS admin VM. This deploys the workload clusters and creates associated CRD instances on the workload cluster. This illustrates the ease of container deployment with Amazon EKS Anywhere. A single instance was prepped, then with some built-in scripting and commands, the system can direct the complex deployment. This greatly simplifies the process when compared to a traditional Kubernetes deployment.
At this point, the deployment can be tested. Amazon provides a test workload that can be used to validate the environment. You can find the details on testing on the Amazon EKS Anywhere documentation site.
The design that was validated was more versatile than a typical Amazon EKS Anywhere deployment. Instead of using the standard VMware CNS-CSI storage provider, this PowerFlex validation uses the Dell PowerFlex CSI plugin. This makes it possible to take direct advantage of PowerFlex’s storage capabilities. With the CSI plugin, it is possible to extend volumes through Amazon EKS, as well as snapshot and restore volumes.
This allows IT departments to move toward developer-oriented processes. Developers can work with storage natively. There are no additional tools to learn and no need to perform operations outside the development environment. This can be a time savings benefit to developer-oriented IT departments.
Beyond storage control in Amazon EKS Anywhere, the results of these operations can be viewed in the PowerFlex management interface. This provides an end-to-end view of the environment and allows traditional IT administrators to use familiar tools to manage and monitor their environment. This makes it easy for the entire IT organization’s journey to move towards a more developer centric environment.
By leveraging Amazon EKS Anywhere on PowerFlex, organizations get on-premises Kubernetes operational tooling that’s consistent with Amazon EKS. Organizations are able to leverage the Amazon EKS console to view all of their Kubernetes clusters (including Amazon EKS Anywhere clusters) running anywhere, through the Amazon EKS Connector. This brings together both the data center and cloud, simplifying the management of both.
In this journey, we have seen that Amazon EKS Anywhere has been validated on Dell PowerFlex, shown how they work together, and enable expanded storage capabilities for developers inside of Amazon EKS Anywhere. It also allows you to use familiar tools in managing the environment. To find out more about Amazon EKS anywhere on PowerFlex, talk with your Dell representative.
Author: Tony Foster, Sr. Technical Marketing Engineer
Dell EMC PowerFlex and VMware Cloud Foundation for High Performance Applications
Thu, 25 Jun 2020 13:10:33 -0000|
Read Time: 0 minutes
The world in 2020 has shown all industries that innovation is necessary to thrive in all conditions. VMware Cloud Foundation (VCF) hybrid cloud platform was crafted by innovators who realize the biggest asset our customers have is their information technology and the data that runs the business. The VCF offering takes the complexity out of operationalizing infrastructure to enable greater elasticity, growth, and simplification through improved automation. VCF enables options available using on-premises and multi-cloud deployments to address ever changing enterprise needs.
VMware included design factors that anticipated customers’ use of varying storage options in the flexibility of implementing VCF. VMware vSAN is the standard for VCF hyperconverged infrastructure (HCI) deployments and is directly integrated into vSphere and VCF. For those circumstances where workloads or customer resource usage require alternative storage methods, VMware built flexibility into the VCF storage offering. Just as we see a wide variety in desktop computing devices, one size doesn't fit all applies to the enterprise storage products as well. Dell Technologies’ PowerFlex (formerly VxFlex) provides a software-defined mechanism to add a combination of compute and storage with scale out flexibility. As customers look to software-defined operational constructs for agility, PowerFlex provides an adjustable means to add the right balance of storage resources while enabling non-disruptive additions without painful migrations as demands increase.
Joining the Dell Technologies Cloud family as a validated design, Dell EMC PowerFlex helps customers simplify their path to hybrid cloud by combining the power of Dell EMC infrastructure with VMware Cloud Foundation software as supplemental storage. As a high-performance, scale out, software-defined block storage product, PowerFlex provides a combination of storage and compute in a unified fabric that's well equipped to service particularly challenging workloads. The scalability of compute and/or storage in a modular architecture provides an asymmetrical (2-layer) option to add capacity to either compute or storage independently. PowerFlex makes it possible to transform from a traditional three-tier architecture to a modern data center without any trade-offs between performance, resilience or future expansion.
PowerFlex significantly reduces operational and infrastructure complexity, empowering organizations to move faster by delivering flexibility, elasticity, and simplicity with predictable performance and resiliency at scale for deployments. PowerFlex Manager is a key element of our engineered systems providing a full lifecycle administration experience for PowerFlex from day 0 through expansions and upgrades which is independent, but complementary to the full stack life cycle management available through VCF via SDDC Manager. A cornerstone value proposition of VCF is administering the lifecycle management of OS upgrades, vSphere updates, vRealize monitoring, automation and NSX administration. PowerFlex manager works in parallel with VCF to deliver a comprehensive lifecycle experience for the physical ingredients and for the PowerFlex software-define storage layer. PowerFlex also offers a vRealize Operations plug-in for a unified monitoring capability from VMware vRealize Suite which is included in most VCF editions. From a storage management perspective, PowerFlex utilizes a management system that complements VCF and VMware vSphere by working within the appropriate vCenter management constructs. PowerFlex Manager provides the administration of PowerFlex storage functions, while VCF and vCenter manages the allocation of LUNs to provisioned VMFS file systems to provide data stores for the provisioned workloads.
PowerFlex systems enables customers to scale from a small environment to enterprise scale with over a thousand nodes. In addition, it provides enterprise grade data protection, multi-tenant capabilities, and add-on enterprise features such as QoS, thin provisioning, compression and snapshots. PowerFlex systems deliver the performance and time-to-value required to meet the demands of the modern enterprise data center.
Does Supplemental Storage Mean Slow or Light Workload Use Cases?
PowerFlex provides a Dell Technologies validated design as a supplemental storage platform for VCF, unlocking the value of PowerFlex to be realized by customers within the VCF environment. By providing sub-millisecond latency, high IOPS and high throughput with linearity as nodes join the fabric, the result is a very predictable scaling profile that accelerates the VCF vision within the datacenter.
PowerFlex, as a part of VCF, can help solve for even the most demanding of applications. Using the supplemental capabilities to service workloads with the highest of efficiency provides a best of class performance experience. Some illustrative examples of demanding application workloads validated with PowerFlex, independent of VCF, include the following:
SAP HANA certified for PowerFlex integrated rack in both 4-socket and 2-socket offerings (certification details). Highly efficient in hosting up to six production HANA instances per 4-socket server. Our capabilities outperform external competitors by hosting 2x the capacity. The Configuration and Deployment Best Practices for SAP HANA white paper provides details. While this white paper illustrates a single layer architecture, even better performance characteristics are achievable using the VCF aligned 2-layer architectural implementation of PowerFlex.
Oracle RAC & Microsoft SQL
Flexibility to run compute and storage on separate hardware results in significant reduction of database licensing cost.
- Oracle RAC Solution (white paper) – Get over 1 Million IOPs with less than 1ms latency with Oracle 12c RAC database transactions in just six nodes delivering 33GB/sec throughput (5.6GB/sec per node).
- Oracle 19c RAC TPC-C achieving more than 10 Million TPMs in eight nodes (white paper).
- MS SQL 2019 Solution (white paper) or MS SQL 2019 Big Data Cluster with Kubernetes (white paper) delivering approximately 9 Million SQL Server transactions (TPMs) with less than 1ms latency using just five storage nodes.
Validated/certified by SAS for running SAS mixed analytics workloads (white paper) providing an average throughput of 210 MBs per core (40% greater than their recommended 150 MB/sec needed for certification).
The validated solution (white paper) with Elastic provides customers with the required high-performance, scalable, block-based IO with flexible deployment options in multiple operating environments (Windows, Linux, Virtualized/Bare Metal). Elastic validated the efficiency of PowerFlex using only three compute and 4 storage nodes to deliver ~1 billion indexing events measured by Elastic’s Rally benchmarking tool.
The validated PowerFlex solution for Epic delivers 6x9’s availability and high performance for critical the EPIC hyperspace workloads while simultaneously enabling hosting the VDI with the operational and analytical databases for a completely integrated infrastructure option.
For customers deploying Kubernetes container-based database deployments like Cassandra, PowerFlex provides 300,000 operations/second for 10 million operations (Read intensive operations) with avg read latency of 1ms on just eight nodes.
PowerFlex gives Dell Technologies the ability to help customers address diverse infrastructure needs. For more information on all of the Dell Technologies storage options with Cloud Validated Designs for VMware Cloud Foundation, please view our white paper. The implementation guide for using PowerFlex for supplemental storage provides the simple steps to provide complementary storage options for VCF deployments. For more information on the PowerFlex product family and workload solutions, please see the product page here. The PowerFlex White Paper - Technical Overview also provides a comprehensive perspective how organizations can begin changing the way they think about a modern data center architecture. Please contact your local Dell sales representative for more information.
Other pre-tested Dell Technologies Storage products validated for VMware Cloud Foundation that provide the capabilities to independently scale storage and compute include the offerings below. You can find more details in the Dell Technologies Cloud Validated Designs document.