SQL Server 2019 uses data virtualization to connect disparate data sources, enabling reporting and analytics without an ETL process to first assemble the data in a common data warehouse schema. Microsoft has integrated PolyBase with Big Data Cluster, enabling organizations to unify structured and unstructured data sources. With PolyBase, organizations can access data in Azure SQL Database, Azure SQL Data Warehouse, Oracle, Teradata, MongoDB, Azure Cosmos DB, and HDFS.
A key benefit of PolyBase for developers and data scientists is a single, consistent query interface across multiple data sources. External tables are queried with T-SQL in the same way as local tables, which simplifies the creation of applications, reports, and analytics. PolyBase with Big Data Cluster connects multiple datastores so that they can be queried together, enabling a more comprehensive approach to data analysis.
For our use case, we distributed tables from the TPC-H benchmark across Oracle, MongoDB, and the SQL Server 2019 Big Data Cluster. The Oracle 19c database and MongoDB reside on two separate VMs of the VxRail system. Accessing the Oracle database and MongoDB through PolyBase demonstrates data virtualization: data in disparate locations within the data center can be queried in place, without first copying it into SQL Server.
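
As an illustration of how such a setup might look in T-SQL, the following is a minimal sketch rather than the configuration used in this environment. All object names, host placeholders, credentials, the Oracle service and schema names, the MongoDB database name, the column lists, and the assignment of TPC-H tables to each source are assumptions made for the example. Column types in particular must be adjusted to match the remote definitions.

```sql
-- Sketch only: every name and placeholder below is hypothetical.
-- A database master key is required before creating scoped credentials.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong_password>';

-- Credentials PolyBase uses to log in to each remote source.
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
    WITH IDENTITY = '<oracle_user>', SECRET = '<oracle_password>';
CREATE DATABASE SCOPED CREDENTIAL MongoCredential
    WITH IDENTITY = '<mongo_user>', SECRET = '<mongo_password>';

-- External data sources pointing at the two VMs.
CREATE EXTERNAL DATA SOURCE OracleSource
    WITH (LOCATION = 'oracle://<oracle_host>:1521', CREDENTIAL = OracleCredential);
CREATE EXTERNAL DATA SOURCE MongoSource
    WITH (LOCATION = 'mongodb://<mongo_host>:27017', CREDENTIAL = MongoCredential);

-- External table over an Oracle table (assumed here to hold TPC-H CUSTOMER).
CREATE EXTERNAL TABLE dbo.ora_customer (
    C_CUSTKEY   DECIMAL(38)  NOT NULL,
    C_NAME      VARCHAR(25)  NOT NULL,
    C_NATIONKEY DECIMAL(38)  NOT NULL
)
WITH (LOCATION = '[<service_name>].[<schema>].[CUSTOMER]', DATA_SOURCE = OracleSource);

-- External table over a MongoDB collection (assumed here to hold TPC-H ORDERS).
CREATE EXTERNAL TABLE dbo.mongo_orders (
    [_id]        NVARCHAR(24) NOT NULL,
    O_ORDERKEY   BIGINT       NOT NULL,
    O_CUSTKEY    BIGINT       NOT NULL,
    O_TOTALPRICE FLOAT        NOT NULL
)
WITH (LOCATION = '<mongo_database>.orders', DATA_SOURCE = MongoSource);

-- One T-SQL query that joins data living in Oracle and MongoDB.
SELECT c.C_NAME, SUM(o.O_TOTALPRICE) AS total_spend
FROM dbo.ora_customer AS c
JOIN dbo.mongo_orders AS o ON o.O_CUSTKEY = c.C_CUSTKEY
GROUP BY c.C_NAME
ORDER BY total_spend DESC;
```

Once the external tables exist, the final query reads like any other T-SQL join. The remote data stays in Oracle and MongoDB and is fetched at query time.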