SQL Server in containers: Dell EMC CSI plug-in—It's about manageability!
Mon, 30 Mar 2020 18:46:49 -0000
A picture can be worth a thousand words. However, not every slide in a presentation is self-explanatory, and sometimes even the speaker notes don’t provide enough real estate to cover the full meaning of the content. That happened to me recently with this slide in a technical presentation that I created:
The unanswered question was: what does this sentence mean? “Get fixes and upgrades faster as Dell EMC’s plug-in doesn’t require Kubernetes updates and upgrades!” I wrote this blog to give more background and details about that statement. Before we can get to that, let’s discuss the value that the CSI plug-in has for customers using XtremIO X2 and VxRack FLEX. The Container Storage Interface (CSI) is a standard that Dell EMC and other storage providers use to give container orchestration systems a common way to expose storage services to containers. Thus, the CSI plug-in enables orchestration between containers and storage via Kubernetes. Other orchestration systems such as Mesos, Docker, and Cloud Foundry also use the same CSI specification for managing containers and storage together.
The CSI plug-in has another advantage for both orchestration systems (like Kubernetes) and the storage providers. For example, Kubernetes development can progress independently without requiring storage vendors to check code into the core Kubernetes repository. Similarly, the storage vendors update the CSI plug-in only when required and not with every update or upgrade of Kubernetes. Overall there is less complexity for both Kubernetes developers and storage vendors because the CSI plug-in simplifies the integration between the orchestration and storage layers. Thus, the CSI plug-in enables faster fixes and upgrades by Dell EMC to work with Kubernetes. I hope that answers the question from above. You can also take a look at this Kubernetes blog that goes into greater detail: Introducing Container Storage Interface (CSI) Alpha for Kubernetes.
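The decoupling described above can be pictured with a small sketch. It is not the CSI specification and not Dell EMC's plug-in: `StorageDriver`, `VendorDriverV1`, `VendorDriverV2`, and `Orchestrator` are hypothetical names invented for illustration. The only point is that the orchestrator is written against an interface, so a vendor can ship a newer driver without any change to the orchestrator.

```python
from typing import Protocol


class StorageDriver(Protocol):
    """Hypothetical stand-in for the interface an orchestrator codes against."""

    def create_volume(self, name: str, size_gib: int) -> str: ...


class VendorDriverV1:
    """Hypothetical vendor driver, shipped separately from the orchestrator."""

    def create_volume(self, name: str, size_gib: int) -> str:
        return f"v1:{name}:{size_gib}GiB"


class VendorDriverV2:
    """A later vendor release (for example, with a fix); same interface."""

    def create_volume(self, name: str, size_gib: int) -> str:
        return f"v2:{name}:{size_gib}GiB"


class Orchestrator:
    """Knows only the interface, never the vendor's implementation."""

    def __init__(self, driver: StorageDriver) -> None:
        self.driver = driver

    def provision(self, name: str, size_gib: int) -> str:
        return self.driver.create_volume(name, size_gib)


# Swapping the driver requires no change to the Orchestrator class.
before = Orchestrator(VendorDriverV1()).provision("sqldata", 100)
after = Orchestrator(VendorDriverV2()).provision("sqldata", 100)
print(before)
print(after)
```

In this toy model the vendor release and the orchestrator release are independent, which is the property the slide's sentence relies on.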
We also recently wrote a white paper about SQL Server Containers that provides an overview of how the XtremIO X2 features available with our CSI plug-in can be used with SQL Server 2019 Linux containers. Here is a shortcut to the CSI plug-in overview in the paper. With the CSI plug-in, the Kubernetes administrator can:
- Dynamically provision and decommission volumes
- Attach and detach volumes from a host node
- Mount and unmount a volume from a host node
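One way to read the three pairs of operations in the list above is as a small state machine. The operation names come from the list; the four states and the strict ordering between them are assumptions of this toy model, not the documented behavior of CSI or of the Dell EMC plug-in.

```python
from enum import Enum


class VolumeState(Enum):
    ABSENT = "absent"            # no volume exists on the array
    PROVISIONED = "provisioned"  # volume exists but is not attached
    ATTACHED = "attached"        # volume is attached to a host node
    MOUNTED = "mounted"          # volume is mounted on the host node


# operation -> (state it requires, state it produces)
TRANSITIONS = {
    "provision": (VolumeState.ABSENT, VolumeState.PROVISIONED),
    "attach": (VolumeState.PROVISIONED, VolumeState.ATTACHED),
    "mount": (VolumeState.ATTACHED, VolumeState.MOUNTED),
    "unmount": (VolumeState.MOUNTED, VolumeState.ATTACHED),
    "detach": (VolumeState.ATTACHED, VolumeState.PROVISIONED),
    "decommission": (VolumeState.PROVISIONED, VolumeState.ABSENT),
}


def apply_operation(state: VolumeState, operation: str) -> VolumeState:
    """Return the next state, or raise if the operation is out of order."""
    required, produced = TRANSITIONS[operation]
    if state is not required:
        raise ValueError(f"cannot {operation} a volume that is {state.value}")
    return produced


def run_lifecycle(operations: list[str]) -> VolumeState:
    """Apply the operations in order, starting with no volume at all."""
    state = VolumeState.ABSENT
    for operation in operations:
        state = apply_operation(state, operation)
    return state


full_cycle = ["provision", "attach", "mount", "unmount", "detach", "decommission"]
print(run_lifecycle(full_cycle).value)
```

A full cycle ends where it started, with no volume left behind, and an out-of-order request such as mounting a volume that was never provisioned is rejected.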
The Kubernetes administrator can even use the XtremIO X2 snapshot capabilities to provision a copy of the SQL Server. It’s these capabilities that really make automation and orchestration of SQL Server containers easier and faster. Want to learn more? The SQL Server Containers white paper is the right starting place because it takes you through the technology and shows how the XtremIO X2 CSI plug-in with Kubernetes and Docker can address traditional challenges.
Please rate this blog and provide us with ideas for future solutions. Thanks!
Related Blog Posts
New Dell EMC Ready Solution powers SQL Server, the complete performance platform
Mon, 30 Mar 2020 18:46:49 -0000
Working on the new Dell EMC Ready Solution for SQL Server was like going from 0 to 60 mph in under 3 seconds. The exhilaration of being pushed into the seat as the road roars past in a blur is absolute fun. That’s what the combination of Dell EMC PowerEdge R840 servers and the new Dell EMC XtremIO X2 storage array did for us in our recent tests.
The classic challenge with most database infrastructures is diminishing performance over time. To use an analogy, it’s like gradually increasing the load a supercar must pull until its 0-to-60 time just isn’t impressive anymore. In the case of databases, the load is input/output operations per second (IOPS). As IOPS increase, response times can slow and database performance suffers. What is interesting is how this performance problem happens over time. As more databases are gradually added to an infrastructure, response times slow by a fraction at a time. These incremental hits on performance can condition application users to accept slower performance—until one day someone says, “Performance was good two years ago but today it’s slow.”
When reading about supercars, we usually learn about their 0-to-60 mph time and their top speed. While the top speed is interesting, how many supercars have you seen race by at 200+ mph? Top speeds apply to databases too. Perhaps you have read a third-party study that devoted a massive hardware infrastructure to one database, thereby showing big performance numbers. If only we had the budget to do that for all our databases, right? Top speeds are fun, but scalability is more realistic as most infrastructures will be required to support multiple databases.
Dell EMC Labs took the performance scalability approach in testing the new SQL Server architecture. Our goals were aggressive: Run 8 virtualized databases per server for a total of 16 databases running in parallel, with a focus on generating significant load while maintaining fast response times. To make the scalability tests more interesting, 8 virtualized databases used Windows Server Datacenter on one server and the other 8 databases used Red Hat Enterprise Linux on another server. Figure 1 shows the two PowerEdge R840 servers and the 8-to-1 consolidation ratio (on each server) achieved in the tests.
Figure 1: PowerEdge R840 servers
Quest Benchmark Factory was used to create the same TPC-E OLTP workload across all 16 virtualized databases. On the storage side, XtremIO X2 was used to accelerate all database I/O. The XtremIO X2 configuration included two X-Brick modules, each with 36 flash drives for a total of 72. According to the XtremIO X2 specification sheet, a 72-drive configuration can achieve 220,000 IOPS at 0.5 milliseconds (ms) of latency with a mixture of 70 percent reads and 30 percent writes using 8K blocks. Figure 2 shows the two-X-Brick configuration of the X2 array with some of the key features that make the all-flash system ideal for SQL Server databases.
Figure 2: XtremIO X2
Before we review the performance findings, let’s talk about IOPS and latency. IOPS is a measure that defines the load on a storage system. This measurement has greater context if we understand the maximum recommended IOPS for a storage system for a specific configuration. For example, 16 databases running in parallel don’t represent a significant load if they are only generating 20,000 IOPS. However, if the same databases generated 200,000 IOPS, as they did on the XtremIO X2 array that we used in our tests, then that’s a significant workload. Thus, IOPS are important in understanding the load on a storage system.
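The comparison in the paragraph above is simple arithmetic: express the measured IOPS as a fraction of the rated IOPS. The sketch below uses the 220,000 IOPS spec-sheet figure and the two example loads from the text; the function name `load_fraction` is made up for illustration.

```python
RATED_IOPS = 220_000  # spec-sheet figure quoted above for the 72-drive configuration


def load_fraction(measured_iops: int, rated_iops: int = RATED_IOPS) -> float:
    """Return the measured load as a fraction of the rated IOPS."""
    return measured_iops / rated_iops


for iops in (20_000, 200_000):
    print(f"{iops:>7,} IOPS = {load_fraction(iops):.0%} of rated")
```

At 20,000 IOPS the array is at roughly 9 percent of its rated load, while at 200,000 IOPS it is at roughly 91 percent, which is why only the second figure counts as a significant workload.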
Response time and latency are used interchangeably in this blog and refer to the amount of time used to respond to a request to read or write data. Latency is our 0-to-60 metric that tells us how fast the storage system responds to a request. Just like with supercars, the lower the time, the faster the car and the storage system. Our goal was to determine if average read and write latencies remained under 0.5 ms.
Looking at IOPS and latency together brings us to our overall test objective. Can this SQL Server solution remain fast (low latency) under a heavy IOPS load? To answer this question is to understand if the database solution can scale. Scalability is the capability of the database infrastructure to handle increased workload with minimal impact to performance. The greater the scalability of the database solution, the more workload it can support and the greater return on investment it provides to customers. So, for our tests to be meaningful we must show a significant load; otherwise, the database system has not been challenged in terms of scalability.
We broke the achievable IOPS barrier of 220,000 IOPS by more than 55,000 IOPS! In large part, the PowerEdge R840 servers enabled the SQL Server databases to really push the OLTP workload to the XtremIO X2 array. We were able to simulate overloading the system by placing a load that is greater than recommended. In one respect we were impressed that XtremIO X2 supported more than 275,000 IOPS, but then we were concerned that there might have been a trade-off with performance.
The average latency for all physical reads and writes was under 0.5 ms. So not only did the SQL Server solution generate a large database workload, the XtremIO X2 storage system maintained consistently fast latencies throughout the tests. The test results show that this database solution was designed for performance scalability: The system maintained performance under a large workload across 16 databases. Figure 3 summarizes the test findings.
Figure 3: Summary of test findings
The capability to scale without having to invest in more infrastructure provides greater value to customers. Would I recommend pushing the new SQL Server solution past its limits like Dell EMC Labs did in testing for scalability? No. Running database tests involves achieving a steady state of performance that is uncharacteristic of real-world production databases. Production databases have peak processing times that must be planned for so that the business does not experience any performance issues. Dell EMC has SQL Server experts who can design the Ready Solution for different workloads. In my opinion, one of the key strengths of this solution is that each physical component can be sized to address database requirements. For example, the number of servers might need to be increased while no additional investment is necessary on XtremIO X2, which saves the business money.
If I were to address just one other topic, I would pick the space savings achieved with a 1 TB SQL Server database. In Figure 4, test results show a 3.52-to-1 data reduction ratio, which translates to a 71.5 percent space savings for a 1 TB database on the XtremIO X2 array. Always-on inline data reduction saves space by writing only unique blocks and then compressing those blocks to storage. The value of inline data reduction is the resulting ability to consolidate more databases to the XtremIO X2 array.
Figure 4: XtremIO X2 inline data reduction
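The link between a data reduction ratio and a space savings percentage can be worked out directly. The sketch below assumes the usual definition, savings = 1 - 1/ratio; that formula is an assumption about how the figure was derived, and the function names are made up for illustration.

```python
def space_savings(reduction_ratio: float) -> float:
    """Fraction of space saved for an N-to-1 data reduction ratio."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return 1 - 1 / reduction_ratio


def physical_size_tb(logical_tb: float, reduction_ratio: float) -> float:
    """Space actually consumed on the array after data reduction."""
    return logical_tb / reduction_ratio


ratio = 3.52  # ratio reported in the test results
print(f"space savings: {space_savings(ratio):.1%}")
print(f"1 TB logical -> {physical_size_tb(1.0, ratio):.3f} TB physical")
```

Under that assumption, a 3.52-to-1 ratio works out to about 71.6 percent, in line with the reported 71.5 percent once rounding of the ratio is allowed for, and a 1 TB database occupies a little under 0.3 TB on the array.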
Are you interested in learning how SQL Server performed on Windows Server Datacenter edition and Red Hat Enterprise Linux Server? I recommend reading the design guide for Dell EMC XtremIO X2 with PowerEdge R840 servers. The validation and use case section of that guide takes the reader through all the performance findings. Or schedule a meeting with your local Microsoft expert at Dell EMC to explore the solution.
Why Ready Solutions for Microsoft SQL?
The Ready Solutions for Microsoft SQL Server team at Dell EMC is a group of SQL Server experts who are passionate about building database solutions. All of our solutions are fully integrated, validated, and tested. Figure 5 shows how we approach developing database solutions. Many of us have been on the customer or consulting side of the business, and these priorities reflect our passion to develop specialized database solutions that are faster and more reliable.
Figure 5: Our database solutions development approach
I hope you enjoyed this blog. If you have any questions, please contact me.
Manage and analyze humongous amounts of data with SQL Server 2019 Big Data Cluster
Thu, 07 May 2020 18:50:24 -0000
A collection of facts and statistics for reference or analysis is called data, and, in a way, the term “big data” simply means a very large amount of data. The big data concept has been around for many years, and the volume of data is growing like never before, which is why data is a hugely valued asset in this connected world. Effective big data management enables an organization to locate valuable information with ease, regardless of how large or unstructured the data is. The data is collected from various sources including system logs, social media sites, and call detail records.
The four V's associated with big data are Volume, Variety, Velocity, and Veracity:
- Volume is about the size—how much data you have.
- Variety means that the data is very different: you have very different types of data structures.
- Velocity is about how fast the data is getting to you.
- Veracity, the final V, is a difficult one. The issue with big data is that it is very unreliable.
SQL Server Big Data Clusters make it easy to manage this complex assortment of data.
You can use SQL Server 2019 to create a secure, hybrid, machine learning architecture starting with preparing data, training a machine learning model, operationalizing your model, and using it for scoring. SQL Server Big Data Clusters make it easy to unite high-value relational data with high-volume big data.
Big Data Clusters bring together multiple instances of SQL Server with Spark and HDFS, making it much easier to unite relational and big data and use them in reports, predictive models, applications, and AI.
In addition, using PolyBase, you can connect to many different external data sources such as MongoDB, Oracle, Teradata, SAP HANA, and more. Hence, SQL Server 2019 Big Data Cluster is a scalable, performant, and maintainable SQL platform, data warehouse, data lake, and data science platform that doesn’t require compromising between cloud and on-premises. Components include:
- The controller provides management and security for the cluster. It contains the control service, the configuration store, and other cluster-level services such as Kibana, Grafana, and Elasticsearch.
- The compute pool provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute pool are divided into SQL compute instances for specific processing tasks.
- The data pool is used for data persistence and caching. The data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server Big Data Cluster data marts are persisted in the data pool.
- The storage pool consists of storage pool pods comprising SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server Big Data Cluster are members of an HDFS cluster.
Following is the reference architecture of SQL Server 2019 on Big Data Cluster:
Big data analysis
Data analytics is the science of examining raw data to uncover underlying information. The primary goal is to ensure that the resulting information is of high data quality and accessible for business intelligence as well as big data analytics applications. Big Data Clusters make machine learning easier and more accurate by handling the four Vs of big data:
- Volume
  - The impact of the Vs on analytics: The greater the volume of data processed by a machine learning algorithm, the more accurate the predictions will be.
  - How a Big Data Cluster helps: Increases the data volume available for AI by capturing data in scalable, inexpensive big data storage in HDFS and by integrating data from multiple sources using PolyBase connectors.
- Variety
  - The impact of the Vs on analytics: The greater the variety of different sources of data, the more accurate the predictions will be.
  - How a Big Data Cluster helps: Increases the number of varieties of data available for AI by integrating multiple data sources through the PolyBase connectors.
- Velocity
  - The impact of the Vs on analytics: Real-time predictions depend on up-to-date data flowing quickly through the data processing pipelines.
  - How a Big Data Cluster helps: Increases the velocity of data to enable AI by using elastic compute and caching to speed up queries.
- Veracity
  - The impact of the Vs on analytics: Accurate machine learning depends on the quality of the data going into the model training.
  - How a Big Data Cluster helps: Increases the veracity of data available for AI by sharing data rather than copying or moving it, because copying and moving data introduce data latency and data quality issues. SQL Server and Spark can both read and write into the same data files in HDFS.
Azure Data Studio is the tool that data engineers, data scientists, and DBAs use to manage databases and write queries. Cluster admins use the admin portal, which runs as a pod inside the same namespace as the whole cluster and provides information such as the status of all pods and the overall storage capacity.
Azure Data Studio is a cross-platform management tool for Microsoft databases. It’s like SQL Server Management Studio on top of the popular VS Code editor engine, a rich T-SQL editor with IntelliSense and plug-in support. Currently, it’s the easiest way to connect to the different SQL Server 2019 endpoints (SQL, HDFS, and Spark). To do so, you need to install Data Studio and the SQL Server 2019 extension.
If you have a Kubernetes infrastructure, you can deploy a single-server cluster with a single command and have the cluster running in about 30 minutes.
If you want to install SQL Server 2019 Big Data Cluster on your on-premises Kubernetes cluster, you can find an official deployment guide for Big Data Clusters on Minikube in Microsoft docs.
Planning is everything and good planning will get a lot of problems out of the way, especially if you are thinking about streaming data and real-time analytics.
When it comes to technology, organizations have many different types of big data management solutions to choose from. Dell Technologies solutions for SQL Server help organizations achieve some of the key benefits of SQL Server 2019 Big Data Clusters:
- Insights to everyone: Access to management services, an admin portal, and integrated security in Azure Data Studio, which makes it easy to manage and create a unified development and administration experience for big data and SQL Server users
- Enriched data: Data enriched using advanced analytics and artificial intelligence that’s built into the platform
- Overall data intelligence:
  - Unified access to all data with unparalleled performance
  - Easily and securely manage data (big/small)
  - Build intelligent apps and AI with all data
- Management of any data, any size, anywhere: Simplified management and analysis through unified deployment, governance, and tooling
- Easy deployment and management using a Kubernetes-based big data solution built into SQL Server
To make better decisions and to gain insights from data, large, small, and medium-size enterprises use big data analysis. For information about how the SQL solutions team at Dell helps customers store, analyze, and protect data with Microsoft SQL Server 2019 on Big Data Cluster technologies, see the following links: