This section describes how we configured Oracle Big Data SQL to access ONDB, Cloudera Hadoop, and Microsoft SQL Server. For information about the installation steps for these data management systems and how data was loaded, see the Oracle Big Data SQL on PowerFlex Design Guide.
After the Oracle Big Data SQL platform was running, we populated the data sources with data from the TPC-H benchmark data generation tool. The TPC-H benchmark models a decision support workload characterized by long-running, ad hoc queries over data drawn from a mixture of large and medium-sized tables. We used the dbgen toolkit with a scale factor of 3,000 to generate approximately 3.5 TB of data across eight tables. The following table shows how the tables are placed across the data management applications, along with each table's name, approximate data size, and relative size category:
Table 12. Decision support tables across data sources
| Database source | Table name | Data size (GB) | Size of table |
|---|---|---|---|
| Oracle | ORDERS | 595 | Medium |
| Oracle | PART | 75 | Medium |
| ONDB | NATION | 2 | Small |
| ONDB | REGION | 3 | Small |
| Cloudera Hadoop | SUPPLIERS | 10 | Small |
| Cloudera Hadoop | LINEITEM | 2,400 | Large |
| SQL Server | CUSTOMER | 75 | Medium |
| SQL Server | PARTSUPP | 400 | Medium |
| Total | | 3,560 | |
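As a quick cross-check of the figures in Table 12, the following Python sketch tallies the approximate data size held by each data source and confirms the overall total. The sizes are copied from the table; the names `TABLES` and `size_by_source` are illustrative only and are not part of any Oracle Big Data SQL or TPC-H tooling.

```python
from collections import defaultdict

# (database source, table name, approximate size in GB), copied from Table 12
TABLES = [
    ("Oracle", "ORDERS", 595),
    ("Oracle", "PART", 75),
    ("ONDB", "NATION", 2),
    ("ONDB", "REGION", 3),
    ("Cloudera Hadoop", "SUPPLIERS", 10),
    ("Cloudera Hadoop", "LINEITEM", 2400),
    ("SQL Server", "CUSTOMER", 75),
    ("SQL Server", "PARTSUPP", 400),
]


def size_by_source(tables):
    """Sum the approximate table sizes (GB) for each data source."""
    totals = defaultdict(int)
    for source, _name, size_gb in tables:
        totals[source] += size_gb
    return dict(totals)


per_source = size_by_source(TABLES)
total_gb = sum(per_source.values())

# Print each source's size and its share of the overall data set
for source, size_gb in per_source.items():
    print(f"{source:16s} {size_gb:>6,d} GB  ({size_gb / total_gb:.1%})")
print(f"{'Total':16s} {total_gb:>6,d} GB")
```

With the values in Table 12, roughly two-thirds of the data resides in Cloudera Hadoop, almost all of it in the LINEITEM table, and the per-source sizes sum to the 3,560 GB total shown in the table.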