The future of Cloud-Native infrastructure is Resilient and Flexible
Mon, 13 Dec 2021 18:40:31 -0000
Next generation infrastructures to support Cloud-Native workloads must be resilient and flexible to satisfy workload requirements while also reducing the management burden on IT staffers.
While much of the emphasis on the benefits of Cloud-Native infrastructure is focused on speed and agility from development to deployment, the rise of stateful containerized applications will force organizations to take resiliency, storage performance, and data services more seriously. In the Voice of the Enterprise: DevOps, Workloads & Projects 2020 study, 56% of organizations reported that more than 50% of their applications are stateful, and this share will rise as more production workloads run on containers.
The need for persistent storage also raises the stakes for data protection capabilities such as snapshots, replication, backup and disaster recovery. Even when it comes to non-mission critical and non-business critical workloads such as test/dev, organizations have minimal tolerance for downtime or data loss. The rising customer expectations for resiliency will only increase pressure on organizations to implement storage systems with rich data protection capabilities and the ability to automate the deployment of these features based on the importance of a particular workload.
Data placement and optimization continue to be key concerns in large scale environments, and it is important for next generation systems to provide intelligent load balancing to position data across nodes in a manner that makes optimal use of resources. These data placement capabilities need to be automated, since many of these operations will occur in the background when workloads are not as active.
Though it is tempting to go with a clean sheet approach when designing next generation infrastructures for emerging Cloud-Native workloads, workloads that are branded as “legacy” do not disappear, even if they are not top of mind in planning discussions. In interactions with organizations building out Cloud-Native infrastructures, it is far more common for them to be running their containerized workloads on top of or inside of VMs today, as opposed to building a new silo of infrastructure for Cloud-Native.
Just as VMs have not completely displaced workloads running on non-virtualized physical systems, we are still a long way from seeing all of the applications currently running in VMs shifting over completely to containers. Infrastructures which have the flexibility to provide compute and storage resources for physical, virtualized, and containerized workloads simultaneously will be necessary for many years.
For more information, please read the 451 Research Special Report:
Author: Henry Baltazar
Copyright © 2021 S&P Global Market Intelligence.
The content of this artifact is for educational purposes only. 451 Research, S&P Global Market Intelligence does not endorse any companies, technologies, products, services, or solutions.
Related Blog Posts
Deploying Microsoft SQL Server Big Data Clusters on Kubernetes platform using PowerFlex
Wed, 15 Dec 2021 12:20:15 -0000
Microsoft SQL Server 2019 introduced a groundbreaking data platform with SQL Server 2019 Big Data Clusters (BDC). Microsoft SQL Server Big Data Clusters are designed to solve the big data challenge faced by most organizations today. You can use SQL Server BDC to organize and analyze large volumes of data, and you can also combine high-value relational data with big data. This blog post describes the deployment of SQL Server BDC on a Kubernetes platform using Dell EMC PowerFlex software-defined storage.
Dell EMC PowerFlex (previously VxFlex OS) is the software foundation of PowerFlex software-defined storage. It is a unified compute, storage, and networking solution that delivers a scale-out block storage service designed for flexibility, elasticity, and simplicity, with predictable high performance and resiliency at scale.
The PowerFlex platform is available in multiple consumption options to help customers meet their project and data center requirements. PowerFlex appliance and PowerFlex rack provide customers comprehensive IT Operations Management (ITOM) and life cycle management (LCM) of the entire infrastructure stack in addition to sophisticated high-performance, scalable, resilient storage services. PowerFlex appliance and PowerFlex rack are the preferred and proactively marketed consumption options. PowerFlex is also available on VxFlex Ready Nodes for those customers who are interested in software-defined compliant hardware without the ITOM and LCM capabilities.
PowerFlex software-defined storage with unified compute and networking offers flexible deployment architectures to meet specific deployment and architectural requirements. PowerFlex can be deployed in a two-layer architecture for asymmetrical scaling of compute and storage (“right-sizing” capacities), in a single-layer (HCI) architecture, or in a mixed architecture.
Microsoft SQL Server Big Data Clusters Overview
Microsoft SQL Server Big Data Clusters are designed to address big data challenges in a unique way. BDC solves many traditional challenges in building big-data and data-lake environments.
SQL Server Big Data Clusters is an additional feature of Microsoft SQL Server 2019. You can query external data sources, store big data in HDFS managed by SQL Server, or query data from multiple external data sources using the cluster.
For more information, see the Microsoft page SQL Server Big Data Clusters partners.
You can use SQL Server Big Data Clusters to deploy scalable clusters of SQL Server, Apache Spark™, and Hadoop Distributed File System (HDFS) as containers running on Kubernetes.
For an overview of Microsoft SQL Server 2019 Big Data Clusters, see Microsoft’s Introducing SQL Server Big Data Clusters and on GitHub, see Workshop: SQL Server Big Data Clusters - Architecture.
Deploying Kubernetes Platform on PowerFlex
For this test, PowerFlex 3.6.0 is built in a two-layer configuration with six Compute Only (CO) nodes and eight Storage Only (SO) nodes. We used PowerFlex Manager to automatically provision the PowerFlex cluster with CO nodes on VMware vSphere 7.0 U2, and SO nodes with Red Hat Enterprise Linux 8.2.
The following figure shows the logical architecture of SQL Server BDC on Kubernetes platform with PowerFlex.
Figure 1: Logical architecture of SQL BDC on PowerFlex
From the storage perspective, we created a single protection domain from eight PowerFlex nodes for SQL BDC. Then we created a single storage pool using all the SSDs installed in each node that is a member of the protection domain.
After we deployed the PowerFlex cluster, we created eleven virtual machines on the six identical CO nodes with Ubuntu 20.04 on them, as shown in the following table.
Table 1: Virtual machines for CO nodes
2 x Intel Gold 6242R, 20 cores (identical for all six CO nodes)
We manually installed the SDC component of PowerFlex on the worker nodes of Kubernetes. We then configured a Kubernetes cluster (v 1.20) on the virtual machines with three master nodes and eight worker nodes:
$ kubectl get nodes
NAME   STATUS   ROLES                  AGE   VERSION
k8m1   Ready    control-plane,master   10d   v1.20.10
k8m2   Ready    control-plane,master   10d   v1.20.10
k8m3   Ready    control-plane,master   10d   v1.20.10
k8w1   Ready    <none>                 10d   v1.20.10
k8w2   Ready    <none>                 10d   v1.20.10
k8w3   Ready    <none>                 10d   v1.20.10
k8w4   Ready    <none>                 10d   v1.20.10
k8w5   Ready    <none>                 10d   v1.20.10
k8w6   Ready    <none>                 10d   v1.20.10
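A quick way to confirm node roles is to tally the ROLES column of that output. The following is a minimal sketch rather than part of the original procedure: the helper name count_node_roles is hypothetical, and it is fed the listing as reproduced in this post so that it runs without a cluster.

```shell
#!/usr/bin/env bash
# Tally control-plane and worker nodes from `kubectl get nodes` style output.
# count_node_roles is a hypothetical helper name, not a kubectl feature.
count_node_roles() {
  awk '
    $1 == "NAME" { next }                        # skip the header row if present
    NF < 3       { next }                        # skip blank or malformed lines
    $3 ~ /control-plane|master/ { cp++; next }   # ROLES column marks control-plane nodes
    { workers++ }                                # everything else is counted as a worker
    END { printf "control-plane=%d workers=%d\n", cp, workers }
  '
}

# Sample input: the node listing as reproduced in this post.
count_node_roles <<'EOF'
NAME STATUS ROLES AGE VERSION
k8m1 Ready control-plane,master 10d v1.20.10
k8m2 Ready control-plane,master 10d v1.20.10
k8m3 Ready control-plane,master 10d v1.20.10
k8w1 Ready <none> 10d v1.20.10
k8w2 Ready <none> 10d v1.20.10
k8w3 Ready <none> 10d v1.20.10
k8w4 Ready <none> 10d v1.20.10
k8w5 Ready <none> 10d v1.20.10
k8w6 Ready <none> 10d v1.20.10
EOF
```

Against a live cluster, the same function could read from kubectl get nodes --no-headers instead of the embedded sample.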
Dell EMC storage solutions provide CSI plugins that allow customers to deliver persistent storage for container-based applications at scale. The combination of the Kubernetes orchestration system and the Dell EMC PowerFlex CSI plugin enables easy provisioning of containers and persistent storage.
In this solution, after we installed the Kubernetes cluster, we deployed version 2.0 of the PowerFlex CSI driver to provide persistent volumes for the SQL BDC workload.
For more information about PowerFlex CSI supported features, see Dell CSI Driver Documentation.
For more information about PowerFlex CSI installation using Helm charts, see PowerFlex CSI Documentation.
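Once the driver is installed, persistent volumes are typically requested through a StorageClass that points at a PowerFlex storage pool. The sketch below prints such a manifest. The class name powerflex-bdc, the pool name, and the system ID are placeholders, and the provisioner and parameter names reflect the driver's sample storage classes as we understand them, so treat them as assumptions and check them against the Dell CSI documentation for your driver version.

```shell
#!/usr/bin/env bash
# Print a StorageClass manifest for the PowerFlex CSI driver.
# Arguments: storage pool name, PowerFlex system ID (both placeholders in this sketch).
# The provisioner and parameter names are assumptions to verify against the Dell CSI docs.
powerflex_storage_class() {
  local pool="$1" system_id="$2"
  cat <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: powerflex-bdc
provisioner: csi-vxflexos.dellemc.com
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  storagepool: ${pool}
  systemID: ${system_id}
EOF
}

# Hypothetical pool name and system ID, for illustration only.
powerflex_storage_class "bdc-pool" "0000000000000000"
```

To create the class, the output could be piped to kubectl apply -f - after the placeholders are replaced with real values.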
Deploying Microsoft SQL Server BDC on Kubernetes Platform
When the Kubernetes cluster with CSI was ready, we installed the Azure Data CLI (azdata) on the client machine. To create base configuration files for deployment, see deploying Big Data Clusters on Kubernetes. For this solution, we used kubeadm-dev-test as the source for the configuration template.
Initially, using kubectl, each node is labelled to ensure that the pods start on the correct node:
$ kubectl label node k8w1 mssql-cluster=bdc mssql-resource=bdc-master --overwrite=true
$ kubectl label node k8w2 mssql-cluster=bdc mssql-resource=bdc-compute-pool --overwrite=true
$ kubectl label node k8w3 mssql-cluster=bdc mssql-resource=bdc-compute-pool --overwrite=true
$ kubectl label node k8w4 mssql-cluster=bdc mssql-resource=bdc-compute-pool --overwrite=true
$ kubectl label node k8w5 mssql-cluster=bdc mssql-resource=bdc-compute-pool --overwrite=true
$ kubectl label node k8w6 mssql-cluster=bdc mssql-resource=bdc-compute-pool --overwrite=true
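The six label commands above can also be generated from a single node-to-role map, which keeps the mapping in one place if nodes are added later. This is a sketch under stated assumptions: the KUBECTL variable is an illustrative convention that defaults to printing the commands (a dry run) rather than running them.

```shell
#!/usr/bin/env bash
# Emit the label commands shown above from one node-to-role map.
# KUBECTL defaults to "echo kubectl" so the script only prints the commands;
# set KUBECTL=kubectl to apply them. The variable is an illustrative convention.
KUBECTL="${KUBECTL:-echo kubectl}"

label_bdc_nodes() {
  local entry node role
  # Each entry is "<node>:<mssql-resource value>", copied from the commands above.
  for entry in \
    k8w1:bdc-master \
    k8w2:bdc-compute-pool k8w3:bdc-compute-pool k8w4:bdc-compute-pool \
    k8w5:bdc-compute-pool k8w6:bdc-compute-pool
  do
    node="${entry%%:*}"   # text before the first colon
    role="${entry#*:}"    # text after the first colon
    $KUBECTL label node "$node" mssql-cluster=bdc "mssql-resource=$role" --overwrite=true
  done
}

label_bdc_nodes
```

Afterward, kubectl get nodes --show-labels can be used to confirm that each node carries the expected labels.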
To accelerate the deployment of BDC, we recommend that you use an offline installation method from a local private registry. While this means some extra work in creating and configuring a registry, it eliminates the network load of every BDC host pulling container images from the Microsoft repository. Instead, they are pulled once. On the host that acts as a private registry, install Docker and enable the Docker repository.
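Populating the private registry amounts to a pull, tag, and push for each BDC image. The sketch below prints those commands for two example images. The target registry address is hypothetical, the image names are examples only (the full list is in the Microsoft offline deployment documentation), and the tag is the one named later in this post. DOCKER defaults to a dry run that prints commands instead of executing them.

```shell
#!/usr/bin/env bash
# Mirror BDC images from the Microsoft registry into a local private registry.
# TARGET_REGISTRY is a hypothetical address; the image names below are examples only.
SOURCE_REGISTRY="mcr.microsoft.com"
SOURCE_REPO="mssql/bdc"
TARGET_REGISTRY="${TARGET_REGISTRY:-registry.example.local:5000}"
TARGET_REPO="${TARGET_REPO:-mssql/bdc}"
IMAGE_TAG="${IMAGE_TAG:-2019-CU12-ubuntu-16.04}"   # tag as named in this post
DOCKER="${DOCKER:-echo docker}"                    # dry run; set DOCKER=docker to execute

mirror_image() {
  local image="$1"
  local src="$SOURCE_REGISTRY/$SOURCE_REPO/$image:$IMAGE_TAG"
  local dst="$TARGET_REGISTRY/$TARGET_REPO/$image:$IMAGE_TAG"
  # Pull once from the source, retag for the private registry, then push.
  $DOCKER pull "$src" &&
    $DOCKER tag "$src" "$dst" &&
    $DOCKER push "$dst"
}

for image in mssql-controller mssql-hadoop; do
  mirror_image "$image"
done
```

The BDC deployment configuration then needs to point at the private registry and repository so that cluster hosts pull from it rather than from the Microsoft registry.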
We modified the BDC configuration from the default settings to use cluster resources and address the workload requirements. For complete instructions about modifying these settings, see the Customize deployments section on the Microsoft BDC website. To scale out the BDC resource pools, we adjusted the number of replicas to use the resources of the cluster.
The values shown in the following table are adjusted in the bdc.json file.
Table 2: Cluster resources
Apache Knox Gateway
Spark service resource configuration
Keeps track of nodes within the cluster
The configuration values for running Spark and Apache Hadoop YARN are also adjusted to the compute resources available per node. In this configuration, sizing is based on 768 GB of RAM and 72 virtual CPU cores available per PowerFlex CO node. Most of this configuration is estimated and adjusted based on the workload. In this scenario, we assumed that the worker nodes were dedicated to running Spark workloads. If the worker nodes are performing other operations or other workloads, you may need to adjust these values. You can also override Spark values as job parameters.
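To make the sizing reasoning concrete, the following sketch works through a common rule-of-thumb calculation starting from the 72 virtual CPU cores and 768 GB of RAM per node mentioned above. The reserved amounts, the five cores per executor, and the roughly 10 percent memory overhead are generic starting-point assumptions, not the settings used in this test.

```shell
#!/usr/bin/env bash
# Rule-of-thumb Spark executor sizing for one node.
# Reserve values and cores-per-executor are illustrative assumptions,
# not the values used in the test described in this post.
size_executors() {
  local node_cores="$1" node_mem_gb="$2"
  local reserved_cores=2       # assumed: left for the OS and cluster services
  local reserved_mem_gb=64     # assumed: left for the OS and cluster services
  local cores_per_executor=5   # assumed: a common starting point

  local executors=$(( (node_cores - reserved_cores) / cores_per_executor ))
  local container_gb=$(( (node_mem_gb - reserved_mem_gb) / executors ))
  # Split each container into heap plus roughly 10 percent overhead (integer math).
  local heap_gb=$(( container_gb * 100 / 110 ))
  local overhead_gb=$(( container_gb - heap_gb ))

  echo "executors_per_node=$executors executor_cores=$cores_per_executor executor_memory=${heap_gb}g memory_overhead=${overhead_gb}g"
}

size_executors 72 768
```

With these assumptions, the calculation yields 14 executors per node with 5 cores, 45 GB of heap, and 5 GB of overhead each. Actual values should be tuned to the workload, as noted above.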
For further guidance about configuration settings for Apache Spark and Apache Hadoop in Big Data Clusters, see Configure Apache Spark & Apache Hadoop in the SQL Server BDC documentation section.
The following table highlights the Spark settings that are used on the SQL Server BDC cluster.
Table 3: Spark settings
The SQL Server BDC 2019 CU12 release notes state that Kubernetes API 1.20 is supported. Therefore, for this test, the image that was deployed on the SQL master pod was 2019-CU12-ubuntu-16.04. We provisioned 20 TB of storage for the SQL master pod, with 10 TB as log space.
Because the test involved running the TPC-DS workload, we provisioned a total of 60 TB of space for five storage pods.
Validating SQL Server BDC on PowerFlex
To validate the configuration of the Big Data Cluster that is running on PowerFlex and to test its scalability, we ran the TPC-DS workload on the cluster using the Databricks® TPC-DS Spark SQL kit. The toolkit allows you to submit an entire TPC-DS workload as a Spark job that generates the test dataset and runs a series of analytics queries across it. Because this workload runs entirely inside the storage pool of the SQL Server Big Data Cluster, the environment was scaled to run the recommended maximum of five storage pods.
We assigned one storage pod to each worker node in the Kubernetes environment as shown in the following figure.
Figure 2: Pod placement across worker nodes
In this solution, the Spark SQL TPC-DS workload simulates a database environment that models several applicable aspects of a decision support system, including queries and data maintenance. Characterized by high CPU and I/O load, a decision support workload stresses the SQL Server BDC cluster configuration to extract maximum operational efficiencies in areas of CPU, memory, and I/O utilization. The standard result is measured by the query response time and the query throughput.
A Spark JAR file is uploaded into a specified directory in HDFS, for example, /tpcds. The Spark job is then submitted with curl through the Livy server that is part of the Microsoft SQL Server Big Data Cluster.
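A Livy batch submission is an HTTP POST of a small JSON document that names the JAR, the main class, and its arguments. The sketch below builds such a request. The gateway address, JAR file name, class name, and argument layout are all hypothetical placeholders, and the URL path assumes the Livy endpoint exposed through the BDC gateway. By default the function only prints the request instead of sending it.

```shell
#!/usr/bin/env bash
# Build and optionally send a Livy batch request for a TPC-DS run.
# GATEWAY, the JAR name, the class name, and the argument layout are hypothetical.
GATEWAY="${GATEWAY:-https://bdc-gateway.example.local:30443}"
LIVY_URL="$GATEWAY/gateway/default/livy/v1/batches"

livy_batch_payload() {
  local scale_factor="$1"   # dataset scale factor, for example 1000 for 1 TB
  printf '{"name": "tpcds-%s", "file": "%s", "className": "%s", "args": ["%s"]}\n' \
    "$scale_factor" "/tpcds/tpcds-kit.jar" "com.example.tpcds.Main" "$scale_factor"
}

submit_tpcds_job() {
  local payload
  payload="$(livy_batch_payload "$1")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: show what would be sent.
    printf 'POST %s\n%s\n' "$LIVY_URL" "$payload"
  else
    # -k skips TLS verification, which is only reasonable in a lab environment.
    curl -k -u "$BDC_USER:$BDC_PASSWORD" -X POST \
      -H 'Content-Type: application/json' \
      -d "$payload" "$LIVY_URL"
  fi
}

submit_tpcds_job 1000
```

Setting DRY_RUN=0 together with BDC_USER and BDC_PASSWORD (also illustrative variable names) would send the request. Livy responds with a batch ID that can be polled for job state.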
Using the Databricks TPC-DS Spark SQL kit, the workload is run as Spark jobs for the 1 TB, 5 TB, 10 TB, and 30 TB workloads. For each workload, only the size of the dataset is changed.
The parameters used for each job are specified in the following table.
Table 4: Job parameters
We set the TPC-DS scale factor for each dataset in the curl command. The data was populated directly into the HDFS storage pool of the SQL Server Big Data Cluster.
The following figure shows the time taken to generate data at each scale factor. The data generation time also includes the post data analysis process that calculates the table statistics.
Figure 3: TPC-DS data generation
After the load, we ran the TPC-DS workload to validate the Spark SQL performance and scalability with 99 predefined user queries. The queries are characterized by different user patterns.
The following figure shows the performance and scalability test results. The results demonstrate that Microsoft SQL Server Big Data Cluster on PowerFlex scales linearly across the different dataset sizes. This shows the ability of PowerFlex to provide consistent and predictable performance for different types of Spark SQL workloads.
Figure 4: TPC-DS test results
The following figure shows a Grafana dashboard captured during the 30 TB run of the TPC-DS test. It shows that a read bandwidth of 15 GB/s was achieved during the tests.
Figure 5: Grafana dashboard
In this minimal lab environment, there were no storage bottlenecks for the TPC-DS data load and query execution. CPU utilization on the worker nodes reached close to 90 percent, indicating that more powerful nodes could enhance the performance.
Running SQL Server Big Data Clusters on PowerFlex is a straightforward way to get started with modernized big data workloads running on Kubernetes. This solution allows you to run modern containerized workloads using the existing IT infrastructure and processes. Big Data Clusters allows Big Data scientists to innovate and build with the agility of Kubernetes, while IT administrators manage the secure workloads in their familiar vSphere environment.
In this solution, Microsoft SQL Server Big Data Clusters are deployed on PowerFlex, which simplifies the operation of cloud-native workloads and can scale without compromise. IT administrators can implement policies for namespaces and manage access and quota allocation for application-focused management. Application-focused management helps you build a developer-ready infrastructure with enterprise-grade Kubernetes, which provides advanced governance, reliability, and security.
Microsoft SQL Server Big Data Clusters are also used with Spark SQL TPC-DS workloads with the optimized parameters. The test results show that Microsoft SQL Server Big Data Clusters deployed in a PowerFlex environment can provide a strong analytics platform for Big Data solutions in addition to data warehousing type operations.
If you want to discover more, contact your Dell representative.
SAP and Dell Transforming Data at the Edge, Cloud, and Core
Wed, 01 Dec 2021 10:02:49 -0000
Organizations understand that to thrive in the digital economy, they must derive better insights from the data they own—and do this in a more automated, reliable way. The faster this process, the faster the time to insights.
SAP, supported by Dell Technologies and Red Hat, organized the SAP Business Technology Platform (BTP) Packathon, engaging the APJ partner community to design and develop industry solutions of high business value by leveraging BTP services that were hosted between the SAP cloud and the Dell Technologies Sydney Customer Solution Center (CSC).
SAP Data Intelligence and SAP HANA infrastructure were hosted at the Dell Technologies Sydney CSC, running on the Red Hat operating system and Red Hat OpenShift Cluster 4.6. These systems were integrated with SAP Data Warehouse Cloud, SAP Analytics Cloud, S/4HANA systems, and partner environments (hosting external systems and data sources).
The following image shows a high-level overview of the BTP Packathon architecture:
This is a testimony to the role Dell plays as customer infrastructure strategy evolves: at the edge, in the cloud, or at the core.
Enterprises want to become smarter when it comes to their operations, their cloud strategy, and their approach to extracting value from an ever‑increasing volume of data.
For businesses running SAP, the key lies in adopting SAP HANA and S/4HANA applications with a cloud‑smart strategy. SAP S/4HANA and SAP Intelligent Technologies address this integrative approach by bringing transactional data together with Big Data and the Internet of Things (IoT).
That means fueling business processes with insights from data across all information sources: at the edge; in core SAP environments; from IoT, data lakes, and Hadoop repositories; and in the cloud.
When exploring the edge, customers typically look to achieve one or more of these objectives:
- Minimize latency and maximize availability for distributed operations
- Build an agile cloud-native application development platform
- Minimize costs of edge data management, transport, and associated network bandwidth
- Run artificial intelligence applications closer to machines or devices
Deploying artificial intelligence (AI) at the edge opens a whole new world of possibilities. Edge deployments with AI can deliver real-time, actionable insights at the point of decision while incurring lower latency and costs than by transferring data back and forth between the data center and the cloud. Limited staffing and harsh environments can make it difficult to turn the vision of the intelligent edge into a reality, however. This is where the Dell Technologies and SAP end-to-end approach comes in. To help empower you at the edge, Dell Technologies is taking a three-pronged approach:
- Designing and launching specific edge solutions that target current edge implementations and use cases
- Optimizing our portfolio for the edge, building a foundation by delivering necessary capabilities across our portfolio
- Integrating our technologies into the Dell Technologies Edge Platform, creating edge-specific technology that is purpose-built for the platform
In a recent joint podcast, Dell Technologies and SAP discussed some of the key components of the intelligent enterprise: cloud, edge, data analytics, and automation. See Dell and SAP: Powering the Intelligent Enterprise.
Cloud—Complementing RISE with SAP
Dell Technologies is now at the very center of SAP’s transformation strategy to RISE and S/4HANA. Our customers can consume SAP S/4HANA “as a service” with all the benefits of the Dell Technologies APEX model. RISE with SAP powered by APEX simplifies the path to the SAP Intelligent Enterprise with an on-premises cloud experience in either the customer’s own data center or a co-location environment.
A turnkey cloud subscription offering with Dell Technologies APEX, which is available through SAP, reduces the risk of implementation and outages and frees up resources so that customers can focus on the business outcomes that SAP S/4HANA provides.
The subscription helps customers leverage cloud economies and capabilities while keeping their SAP software landscape and data securely in their own data center or co-location—for data sovereignty concerns, for latency or application entanglement reasons, or because they simply lack access to hyperscalers.
APEX offers the ease and scale of cloud delivered as-a-service with simplicity, agility, and control for our customers. Create your own on-demand environment with infrastructure and services you customize to order. Deploy a pay-per-use consumption model or an enterprise-scale managed utility. See Introducing RISE with SAP S/4HANA Cloud, powered by APEX - Dell Technologies.
Core—Choosing the right platform
According to Gartner, by 2022, 80 percent of SAP HANA deployments will continue to be on-premises. IDC and Gartner survey results and deployments show a preference for on-premises and hybrid deployment models for SAP applications running on the SAP HANA database and for edge-based applications. Organizations prefer a cloud-like experience, with features such as simplicity, flexibility, quick turnaround for faster innovation, and pay-for-use at their fingertips.
One of the most critical decisions a customer makes when planning an SAP S/4HANA migration is selecting the type of deployment option which best suits their business needs: on-premises; private cloud; public cloud; or hybrid cloud. As they evaluate the options, the total cost of ownership (TCO) is top of mind.
Based on business requirements and global surveys, a top priority for our customers is migrating some of their SAP landscapes to the cloud while keeping their crown jewels on-premises. Some customers are also repatriating workloads from public cloud to on-premises, primarily for reasons such as data residency and regulatory requirements, cybersecurity, and overhead costs.
The following document presents the result of a three-year TCO comparison of an SAP HANA database and S/4HANA application server landscape that is deployed on-premises, running on Dell EMC VxRail hyperconverged infrastructure, against one that is deployed on an Amazon Web Services (AWS) cloud computing platform: The Total Cost of Ownership of On-Premises SAP on Dell EMC VxRail HCI Versus Amazon Web Services Public Cloud | Dell Technologies Info Hub
The SAP, Dell, and Red Hat BTP platform and services are available for customer and partner proof of concepts, demos, events, and industry-aligned solution developments. Contact your SAP, Dell, or Red Hat account manager for more information.