SQL Server 2019 uses data virtualization to connect disparate data sources, enabling reporting and analytics without first running ETL to load the data into a common data warehouse schema. Microsoft has integrated PolyBase with Big Data Clusters, enabling organizations to unify structured and unstructured data sources. With PolyBase, organizations can access data from Azure SQL Database, Azure SQL Data Warehouse, Oracle, Teradata, MongoDB, Azure Cosmos DB, and HDFS.
A key benefit of PolyBase for developers and data scientists is a single, consistent interface for accessing multiple data sources. External tables are queried with T-SQL, just like local tables, which simplifies the creation of applications, reports, and analytics. PolyBase with Big Data Clusters connects multiple datastores into a broad data sphere, enabling a more comprehensive approach to data analysis.
In our Big Data Cluster use case, tables from the TPC-H benchmark are distributed across Oracle, SQL Server 2019, and the Big Data Cluster. The Oracle 19c database resides on a server outside the PowerFlex system. Accessing the Oracle database with PolyBase demonstrates data virtualization and the ability to access data from disparate locations within the data center.
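As an illustration of how an Oracle-resident TPC-H table might be exposed through PolyBase, the following T-SQL is a minimal sketch and not the configuration used in this solution. The host name `oracle-host`, port `1521`, service name `ORCLPDB`, schema `TPCH`, credential and object names, and the local `dbo.orders` table are all hypothetical placeholders. The column list follows the standard TPC-H `CUSTOMER` definition, but the data types shown are assumptions and must match the actual Oracle table.

```sql
-- A database master key is required before creating a scoped credential.
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

-- Credential PolyBase uses to log in to Oracle (hypothetical user name).
CREATE DATABASE SCOPED CREDENTIAL OracleCredential
WITH IDENTITY = 'tpch_user', SECRET = '<oracle password>';

-- External data source pointing at the Oracle server (hypothetical host and port).
CREATE EXTERNAL DATA SOURCE OracleTpch
WITH (
    LOCATION   = 'oracle://oracle-host:1521',
    CREDENTIAL = OracleCredential
);

-- External table mapped to the Oracle CUSTOMER table.
-- LOCATION is assumed to be service.schema.table; adjust to the real environment.
CREATE EXTERNAL TABLE dbo.customer_oracle (
    C_CUSTKEY    DECIMAL(38, 0) NOT NULL,
    C_NAME       VARCHAR(25)    COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL,
    C_ADDRESS    VARCHAR(40)    COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL,
    C_NATIONKEY  DECIMAL(38, 0) NOT NULL,
    C_PHONE      VARCHAR(15)    COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL,
    C_ACCTBAL    DECIMAL(15, 2) NOT NULL,
    C_MKTSEGMENT VARCHAR(10)    COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL,
    C_COMMENT    VARCHAR(117)   COLLATE Latin1_General_100_BIN2_UTF8 NOT NULL
)
WITH (
    LOCATION    = '[ORCLPDB].[TPCH].[CUSTOMER]',
    DATA_SOURCE = OracleTpch
);

-- Join the Oracle-backed external table with a local table (hypothetical dbo.orders)
-- using ordinary T-SQL; no data is copied into SQL Server beforehand.
SELECT TOP (10)
       c.C_NAME,
       SUM(o.O_TOTALPRICE) AS total_spend
FROM dbo.customer_oracle AS c
JOIN dbo.orders          AS o ON o.O_CUSTKEY = c.C_CUSTKEY
GROUP BY c.C_NAME
ORDER BY total_spend DESC;
```

Once the external table exists, report and application queries can reference it by name like any other table, so the location of the underlying data stays transparent to the query author.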