This paper provides an overview of SQL Server 2019 Big Data Cluster on VxRail hyperconverged infrastructure (HCI). It illustrates how developers and data scientists can benefit from this data management and analytics platform, which combines Docker containers, Kubernetes, vSphere, and SQL Server on Red Hat Enterprise Linux.
We used these reference configurations with big datasets of various sizes to characterize and tune the cluster nodes, pods, and HDFS parameters, giving you a head start in deploying this solution. VxRail HCI, built on Intel Xeon Scalable processors, allows customers to start small and grow by scaling up capacity and performance.
Our case study addresses big data storage and the tools for handling big data. We split the eight TPC-H tables across multiple data sources: the Big Data Cluster storage pool (HDFS), the Big Data Cluster data pool (SQL Server), and stand-alone MongoDB and Oracle database instances. Our SQL Server Solutions engineering team modified all 22 TPC-H queries so that their tables were selected from different data sources. Using PolyBase data virtualization, the queries ran without error and returned results that joined data from all four sources.
Because data virtualization queries data in place rather than physically copying or moving it, the data is available to business users in real time. Big Data Cluster simplifies and centralizes access to, and analysis of, the organization's data sphere. It also simplifies IT management by consolidating big data and data virtualization on one platform with a proven set of tools.
To help ensure an organization's success on its digital transformation journey, IT needs highly scalable infrastructure and service automation. We showed how the combination of Docker containers, Kubernetes, and the vSphere CSI plug-in enables fast and easy provisioning of the Big Data Cluster services. The initial installation of Big Data Cluster took approximately 3 hours. With automation, subsequent refreshes took less than 30 minutes.
In our use case, the key to this faster deployment was the ability to provision persistent storage seamlessly on the VxRail system. Without automated storage provisioning, an administrator would have had to provision storage manually for each Big Data Cluster service: the SQL Server master instance, data pool, compute pool, and storage pool.
Our testing shows that customers can virtualize Docker containers and gain important benefits, including the ability to securely isolate a Big Data Cluster instance on a VxRail system.