Oracle Big Data SQL version 4.1

Oracle Database 19c includes many SQL Analytics capabilities, such as data wrangling functions, advanced aggregations, and pattern matching. These analytic functions are exposed through standard SQL syntax that most Oracle developers already know from experience. Using Oracle's in-database SQL Analytics reduces the effort of learning analytical functions, so developers can reach data insights in less time.
However, for enterprises that want to perform analysis across data sources like Apache Hadoop, Apache Kafka, object stores, and NoSQL databases, Oracle Big Data SQL is uniquely positioned to accelerate application development by taking advantage of virtualization. The proposition is even stronger for enterprises that have a wealth of SQL and PL/SQL experience, as Oracle Big Data SQL can leverage that knowledge for data access across all these data sources.
Big Data SQL uses access drivers to connect to data sources like:
- Apache Hive using the ORACLE_HIVE access driver: Apache Hive is a distributed data warehouse system designed to enable analytics across petabytes of data using SQL. The ORACLE_HIVE driver enables the creation of Oracle external tables that reference Apache Hive data sources.
- Hadoop Distributed File System (HDFS) using the ORACLE_HDFS access driver: HDFS is a distributed file system designed for high-throughput access to large datasets, which makes it a common storage layer for analytics. The ORACLE_HDFS driver enables the creation of external tables in the Oracle database that can directly access files stored in HDFS.
- Object storage using the ORACLE_BIGDATA access driver: An object store saves and manages data as objects rather than in file systems or block storage. The ORACLE_BIGDATA driver enables the creation of external tables to query data in object stores. The same access driver also supports access to text, Parquet, and Avro file types.
Oracle external tables provide access to data stored outside the Oracle database. The table definition specifies attributes such as the access driver type, access parameters, and column data types. After an external table has been created, developers can query the external data in the same way as data in a table stored in the Oracle database. The TYPE attribute names the access driver for the external data, such as ORACLE_HDFS or ORACLE_HIVE. The ACCESS PARAMETERS clause provides the metadata needed to locate the data and generate the table. Other features of Oracle external tables include:
- Automatic mapping of HCatalog to Oracle tables: HCatalog is a table and storage management layer for Hadoop that exposes the metadata in the Hive metastore to other tools. Automatic mapping of Hive metadata to external Oracle tables accelerates development using data virtualization.
- Many-to-one mappings of Hadoop clusters to one Oracle database: With Oracle Big Data SQL, the enterprise can access multiple Hadoop clusters using a single Oracle database.
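As an illustration of how the TYPE attribute and the ACCESS PARAMETERS clause fit together, the following is a minimal sketch of an external table over pipe-delimited files in HDFS. The table name `customer_hdfs`, the cluster name `hadoop_cluster1`, the HDFS path, and the use of the `DEFAULT_DIR` directory object are assumptions made for the example, not values taken from the tested solution. The columns follow the TPC-H CUSTOMER table.

```sql
-- Hypothetical example: customer_hdfs, hadoop_cluster1, and the HDFS path
-- are placeholders. Adjust them to match the target environment.
CREATE TABLE customer_hdfs (
  c_custkey     NUMBER,
  c_name        VARCHAR2(25),
  c_address     VARCHAR2(40),
  c_nationkey   NUMBER,
  c_phone       VARCHAR2(15),
  c_acctbal     NUMBER(15,2),
  c_mktsegment  VARCHAR2(10),
  c_comment     VARCHAR2(117)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HDFS
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (
    com.oracle.bigdata.cluster: hadoop_cluster1
    com.oracle.bigdata.rowformat: DELIMITED FIELDS TERMINATED BY '|'
  )
  LOCATION ('hdfs:/data/tpch/customer/*')
)
REJECT LIMIT UNLIMITED;
```

In this sketch, the `com.oracle.bigdata.cluster` parameter is what would let a single Oracle database address more than one Hadoop cluster: each external table names the cluster that holds its data.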
In this solution, we used Oracle Big Data SQL to connect to Hadoop using the ORACLE_HDFS access driver, to Oracle NoSQL Database (ONDB) using the NoSQL client, and to Microsoft SQL Server using Oracle Gateways. To test the data virtualization capabilities, we used data from the TPC-H decision support benchmark. The tables were distributed across the data sources by assigning a subset of tables to each source. In the final test, we used Oracle Big Data SQL to query the decision support tables using standard queries that were modified to use external tables.
As part of the data virtualization tests, we loaded two tables, ORDERS and PART, in the Oracle database. These medium-sized tables enabled us to test combining local data in the Oracle database with external data from data sources like Cloudera Hadoop, ONDB, and Microsoft SQL Server.
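A query that combines local and external data might look like the following sketch. It assumes that the TPC-H CUSTOMER table is exposed through a hypothetical external table named `customer_hdfs`, while ORDERS is the local table described above. The source does not state which tables were assigned to which data source, so that placement is an assumption made for illustration only.

```sql
-- Hypothetical example: customer_hdfs is an assumed external table name.
-- ORDERS is stored locally in the Oracle database.
SELECT c.c_mktsegment,
       COUNT(*)            AS order_count,
       SUM(o.o_totalprice) AS total_price
FROM   orders o
JOIN   customer_hdfs c
       ON c.c_custkey = o.o_custkey
GROUP  BY c.c_mktsegment
ORDER  BY total_price DESC;
```

The query uses the same SQL as a join between two local tables. The external table is referenced by name, and the access driver handles retrieval of the remote data.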
