The Oracle 19c database includes many SQL Analytics capabilities like data wrangling functions, advanced aggregations, pattern matching, and more. These analytic functions are implemented through SQL statement syntax that most Oracle developers can easily use from experience. Using Oracle’s in-database SQL Analytics reduces the effort of learning analytical functions so developers can achieve data insights in less time.
However, for enterprises that want to perform analysis across data sources like Apache Hadoop, Apache Kafka, object stores, and NoSQL databases, Oracle Big Data SQL is uniquely positioned to accelerate application development by taking advantage of virtualization. The proposition is even stronger for enterprises that have a wealth of SQL and PL/SQL experience, as Oracle Big Data SQL can leverage that knowledge for data access across all these data sources.
Big Data SQL uses access drivers to connect to data sources like:
Oracle external tables allow access to data outside the Oracle database by specifying attributes including access parameters and datatype parameters. After an external table has been created, developers can access the external data with the same patterns as if it were in a table stored in the Oracle database. The TYPE attributes specify the source of the external data like ORACLE_HDFS or ORACLE_HIVE. The ACCESS parameters provide the metadata to locate data and generate tables. Other features of Oracle external tables include:
In this solution, we used Oracle Big Data SQL to connect to Hadoop using the ORACLE_HDFS access driver and ONDB using the NoSQL client, and to Microsoft SQL Server using Oracle Gateways. To test the data virtualization capabilities, we used data from the TPC-H decision workload benchmark. The tables were distributed across all the source data sources by assigning a subset of tables to each source. In the final test, we used Oracle Big Data SQL to query the decision support tables using standard queries that were modified to use external tables.
As part of the data virtualization tests, we loaded two tables, ORDERS and PART, in the Oracle database. These medium-sized tables enabled us to test combining local data in the Oracle database with external data from data sources like Cloudera Hadoop, ONDB, and Microsoft SQL Server.