This section broadly describes how we installed, configured, and tested Big Data Clusters. Rather than including step-by-step instructions, we focus on the key elements of our successful Big Data Cluster implementation and provide links to documents that detail the installation requirements.
Based on the working assumption that virtualization is an existing part of the data center, we do not address VMware vSphere installation in this section. Many of the steps apply to bare metal and virtualization; thus, if the organization is considering bare metal, the steps still apply.
The high-level steps for installing, configuring, and testing this solution are as follows. The subsequent sections provide additional details.
- Install Docker, Kubernetes, and the CSI plug-in for PowerFlex systems:
- Install the Docker Enterprise Edition in a VM on the PowerFlex system. The operating system within the VM is Red Hat Linux Enterprise.
- Install and configure a local private Docker registry.
- Install Kubernetes.
- Install the CSI plug-in for PowerFlex systems.
- Deploy the Big Data Cluster:
- Pull the most recent SQL Server 2019 Big Data Cluster for Linux container image from the Microsoft Container Registry.
- Push the SQL Server 2019 Big Data Cluster to the image in the local private registry.
- Configure vSphere dynamic storage provisioning.
- Install the Big Data Cluster.
- Import the TPC-H data into the Big Data Cluster.
- Test data virtualization.
- Migrate data to the data pool by ingesting data from the stand-alone SQL Server database.
- Ingest 10 TB of TPC-H data into the Big Data Cluster.