
Big Solutions on Dell EMC VxRail with SQL 2019 Big Data Cluster

Mon, 17 Aug 2020 18:31:31 -0000


Vic Dery

The growing volume and variety of data that organizations must ingest, manage, and analyze have been the driving force behind Microsoft SQL Server 2019 Big Data Clusters (BDC). SQL Server 2019 BDC deploys scalable clusters of SQL Server, Spark, and HDFS (Hadoop Distributed File System) containers running on Kubernetes.

We recently deployed and tested SQL Server 2019 BDC on Dell EMC VxRail hyperconverged infrastructure to demonstrate how VxRail delivers the performance, scalability, and flexibility needed to bring these multiple workloads together.

The Dell EMC VxRail platform was selected for its ability to incorporate compute, storage, virtualization, and management in one platform offering. The key feature of the VxRail HCI is the integration of vSphere, vSAN, and VxRail HCI System Software for an efficient and reliable deployment and operations experience. The use of VxRail with SQL Server 2019 BDC makes it easy to unite relational data with big data.

The testing demonstrates the advantages of using VxRail with SQL Server 2019 BDC for analytic application development. It also shows how Docker, Kubernetes, and the vSphere Container Storage Interface (CSI) driver accelerate the application development life cycle when they are used with VxRail. The lab environment for development and testing used four VxRail E560F nodes supported by the vSphere CSI driver. With this solution, developers can provision SQL Server BDC in containerized environments without the complexities of traditional methods for installing databases and provisioning storage.
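As a rough illustration of how storage is provisioned through the vSphere CSI driver, the sketch below builds a Kubernetes StorageClass manifest in Python and prints it as JSON, which kubectl accepts. The provisioner name csi.vsphere.vmware.com and the storagepolicyname parameter come from the vSphere CSI driver; the class name bdc-vsan and the storage policy name are assumptions chosen for illustration and are not taken from the white paper.

```python
import json


def vsphere_storage_class(name, storage_policy):
    """Build a StorageClass manifest backed by the vSphere CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Provisioner name registered by the vSphere CSI driver.
        "provisioner": "csi.vsphere.vmware.com",
        # Assumption: the vSAN storage policy name below is illustrative.
        "parameters": {"storagepolicyname": storage_policy},
    }


# Hypothetical class name and policy; substitute the values from your cluster.
manifest = vsphere_storage_class("bdc-vsan", "vSAN Default Storage Policy")
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and applying it with kubectl apply -f would register the class, after which persistent volume claims that name it are satisfied from vSAN storage.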

Our white paper, Microsoft SQL Server 2019 Big Data Cluster on Dell EMC VxRail, shows the power of implementing SQL Server 2019 BDC technologies on VxRail. Integrating SQL Server 2019 RDBMS, SQL Server BDC, MongoDB, and Oracle RDBMS helps to create a unified data analytics application. Using VxRail enhances the ability of SQL Server 2019 to scale out storage and compute clusters while embracing the virtualization techniques from VMware. This SQL Server 2019 BDC solution also benefits from the simplicity of a complete yet flexible, validated Dell EMC VxRail platform with Kubernetes management and storage integration.

The solution demonstrates the combined value of the following technologies:

VxRail E560F – All-flash performance
Large tables stored on a scaled-out HDFS storage cluster that is hosted by BDC
Smaller related data tables that are hosted on SQL Server, MongoDB, and Oracle databases
Distributed queries that are enabled by the PolyBase capability in SQL Server 2019 to process Transact-SQL queries that access external data in SQL Server, Oracle, Teradata, and MongoDB
Red Hat Enterprise Linux
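To make the PolyBase item above more concrete, here is a minimal sketch that generates the Transact-SQL CREATE EXTERNAL DATA SOURCE statement for each connector. The location prefixes (sqlserver, oracle, teradata, mongodb) are the ones PolyBase uses in SQL Server 2019; the host names, ports, data source names, and credential names are hypothetical placeholders, not values from the tested environment.

```python
# Location prefixes used by the PolyBase connectors in SQL Server 2019.
CONNECTOR_PREFIXES = {"sqlserver", "oracle", "teradata", "mongodb"}


def external_data_source_ddl(name, kind, host, port, credential):
    """Return the T-SQL that registers an external data source for PolyBase."""
    if kind not in CONNECTOR_PREFIXES:
        raise ValueError(f"unsupported connector: {kind}")
    return (
        f"CREATE EXTERNAL DATA SOURCE [{name}]\n"
        f"WITH (LOCATION = '{kind}://{host}:{port}', "
        f"CREDENTIAL = [{credential}]);"
    )


# Hypothetical hosts and credentials, for illustration only.
sources = [
    ("SqlOrders", "sqlserver", "sql.example.internal", 1433, "SqlCred"),
    ("OracleSales", "oracle", "oracle.example.internal", 1521, "OracleCred"),
    ("MongoCatalog", "mongodb", "mongo.example.internal", 27017, "MongoCred"),
]
for source in sources:
    print(external_data_source_ddl(*source))
    print()
```

Each statement assumes that a database scoped credential with the given name already exists. External tables defined on top of these data sources can then be queried like local tables.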

Big Data Cluster Services

This diagram shows how the pools are built. It also details the benefits of the Kubernetes features for container orchestration at scale, including:

Autoscaling, replication, and recovery of containers
Inter-container communication within a pod, such as IP sharing
A single entity—a pod—for creating and managing multiple containers
A container resource usage and performance analysis agent, cAdvisor
Network pluggable architecture
Load balancing
Health check service
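Several of the features listed above can be seen in a single pod definition. The sketch below is a generic Kubernetes example, not one of the pods that BDC itself deploys: it builds a manifest for a pod with two containers that share one IP address, plus a liveness probe that acts as a health check. The pod name, container names, and images are assumptions chosen for illustration.

```python
import json


def two_container_pod(name):
    """Build a Pod manifest with two containers that share one network identity."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {
            "containers": [
                {
                    # Illustrative main container that serves HTTP on port 80.
                    "name": "web",
                    "image": "nginx:1.25",
                    "ports": [{"containerPort": 80}],
                    # Health check: the kubelet restarts the container
                    # if this HTTP probe keeps failing.
                    "livenessProbe": {
                        "httpGet": {"path": "/", "port": 80},
                        "periodSeconds": 10,
                    },
                },
                {
                    # Sidecar reaches the web container over localhost
                    # because both containers share the pod's IP address.
                    "name": "sidecar",
                    "image": "busybox:1.36",
                    "command": [
                        "sh",
                        "-c",
                        "while true; do wget -q -O /dev/null "
                        "http://localhost:80/; sleep 30; done",
                    ],
                },
            ]
        },
    }


pod = two_container_pod("demo-pod")
print(json.dumps(pod, indent=2))
```

Replication and recovery are usually layered on top of a pod template like this one through a controller such as a Deployment or StatefulSet.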

This white paper, Microsoft SQL Server 2019 Big Data Cluster on Dell EMC VxRail, addresses big data storage, the tools for handling big data, and the details around testing with TPC-H. When we tested data virtualization with PolyBase, the queries were successful, running without error and returning the results that joined all four data sources.
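The kind of query used in such a test can be sketched as follows. This is an illustration only, not the query from the white paper: the table names are standard TPC-H tables, but the schema names (hdfs, sqlsrv, ora, mongo) and the placement of each table on a particular source are assumptions.

```python
def build_federated_join(fact, joins, group_by):
    """Return a T-SQL SELECT that joins a large table to related tables.

    fact:     (table, alias) for the large table
    joins:    list of (table, alias, on_clause) for the related tables
    group_by: column to group and count by
    """
    fact_table, fact_alias = fact
    lines = [
        f"SELECT {group_by}, COUNT(*) AS line_count",
        f"FROM {fact_table} AS {fact_alias}",
    ]
    for table, alias, on_clause in joins:
        lines.append(f"JOIN {table} AS {alias} ON {on_clause}")
    lines.append(f"GROUP BY {group_by};")
    return "\n".join(lines)


# Assumption: each schema maps to external tables on a different source.
query = build_federated_join(
    fact=("hdfs.lineitem", "l"),
    joins=[
        ("sqlsrv.orders", "o", "o.o_orderkey = l.l_orderkey"),
        ("ora.customer", "c", "c.c_custkey = o.o_custkey"),
        ("mongo.nation", "n", "n.n_nationkey = c.c_nationkey"),
    ],
    group_by="n.n_name",
)
print(query)
```

Because PolyBase exposes each source as an external table, the statement reads like an ordinary join even though the rows come from four different systems.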

Because data virtualization does not involve physically copying or moving the data, the data is available to business users in real time, and BDC simplifies and centralizes access to and analysis of the organization’s data sphere. It also enables IT to manage the solution by consolidating big data and data virtualization on one platform with a proven set of tools.

Success starts with the right foundation:

SQL Server 2019 BDC is a compelling new way to utilize SQL Server to bring high-value relational data and high-volume big data together on a unified, scalable data platform. All of this can be deployed with VxRail, enabling enterprises to experience the power of PolyBase to virtualize their data stores, create data lakes, and create scalable data marts in a unified, secure environment without needing to implement slow and costly Extract, Transform, and Load (ETL) pipelines. This makes data-driven applications and analysis more responsive and productive. SQL Server 2019 BDC and Dell EMC VxRail provide a complete unified data platform to deliver intelligent applications that can help make any organization more successful.

Read the full paper to learn more about how Dell EMC VxRail with SQL 2019 Big Data Clusters can:

Bring high-value relational data and high-volume big data together on a single, scalable platform.
Incorporate intelligent features and get insights from more of your data, including data stored beyond SQL Server in Hadoop, Oracle, Teradata, and MongoDB.
Support and enhance your database management and data-driven apps with advanced analytics using Hadoop and Spark.

Additional VxRail & SQL resources:

Microsoft SQL Server 2019 Big Data Cluster on Dell EMC VxRail

Microsoft SQL Server on VMware Cloud Foundation on Dell EMC VxRail

SQL on VxRail Solution brief

Key Benefits of Running Microsoft SQL Server on Dell EMC hyperconverged infrastructure (HCI) - Whitepaper

Key benefits of running Microsoft SQL Server on Dell EMC Hyperconverged Infrastructure (HCI) - Infographic

Architecting Microsoft SQL Server on VMware vSphere

Author: Vic Dery, Senior Principal Engineer, VxRail Technical Marketing

@VxVicTX


Big Data Cluster services:

Service	Description
Controller	The controller provides management and security for the cluster. It contains the control service, the configuration store, and other cluster-level services such as Kibana, Grafana, and Elasticsearch.
Compute pool	The compute pool provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute pool are divided into SQL compute instances for specific processing tasks.
Data pool	The data pool is used for data persistence and caching. The data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server Big Data Cluster data marts are persisted in the data pool.
Storage pool	The storage pool consists of storage pool pods comprising SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server Big Data Cluster are members of an HDFS cluster.

Characteristic	The impact of the Vs on analytics	How a Big Data Cluster helps
Volume	The greater the volume of data processed by a machine learning algorithm, the more accurate the predictions will be.	Increases the data volume available for AI by capturing data in scalable, inexpensive big data storage in HDFS and by integrating data from multiple sources using PolyBase connectors.
Variety	The greater the variety of different sources of data, the more accurate the predictions will be.	Increases the variety of data available for AI by integrating multiple data sources through the PolyBase connectors.
Velocity	Real-time predictions depend on up-to-date data flowing quickly through the data processing pipelines.	Increases the velocity of data to enable AI by using elastic compute and caching to speed up queries.
Veracity	Accurate machine learning depends on the quality of the data going into the model training.	Increases the veracity of data available for AI by sharing data without copying or moving it, because copying and moving introduce data latency and data quality issues. SQL Server and Spark can both read and write into the same data files in HDFS.
