The infrastructure footprint for this Oracle Big Data SQL solution comprised nine servers: two PowerEdge R640 servers dedicated to Oracle and Microsoft SQL Server, four nodes that made up the PowerFlex storage, and three nodes dedicated to the PowerFlex controller services.
Table 4. Oracle Big Data SQL solution architecture
| Component | Infrastructure or software version | Notes |
|---|---|---|
| Oracle Big Data SQL 4.1 | PowerEdge R640 compute node | Dedicated PowerEdge server |
| Microsoft SQL Server 2019 Enterprise Evaluation Edition (64-bit) | PowerEdge R640 compute node | Dedicated PowerEdge server |
| Cloudera Hadoop 6.2.1 and Hive 2.1 | PowerFlex nodes | Used 3 of the 4 PowerFlex nodes |
| Oracle NoSQL 19.5 | PowerFlex node | Used 1 of the 4 PowerFlex nodes |
| VMware vSphere 7.0 | VMware ESXi 7.0 with vCenter 7.0 | Installed on the PowerEdge servers and PowerFlex nodes |
| Dell EMC PowerFlex storage | Flex OS 3.5.1 | 4 x R840 nodes |
| Dell EMC PowerFlex controllers | Flex OS 3.5.1 | 3 x R640 nodes |
Two operating systems were used. Oracle, Hadoop, and Oracle NoSQL Database (ONDB) ran on Oracle Linux 7.6 with kernel 4.14.35-1818.3.3.el7uek.x86_64; Microsoft SQL Server was deployed on Windows Server 2019 Datacenter Evaluation Edition.
The following table details the virtual machine configuration for each application in the Oracle Big Data SQL software stack:
Table 5. VM configuration: vCPU and vMem details
| Application | Number of vCPUs | vMEM total (GB) | vMEM limit (MB) |
|---|---|---|---|
| Oracle | 8 | 64 | Unlimited |
| ONDB | 5 | 48 | Unlimited |
| Hadoop Primary | 8 | 64 | Unlimited |
| Hadoop Data node 1 | 10 | 64 | Unlimited |
| Hadoop Data node 2 | 10 | 64 | Unlimited |
| SQL Server | 8 | 64 | Unlimited |
The primary goal of the setup was to demonstrate data virtualization functionality; therefore, the virtual machines were not sized for performance. The sizing in this table reflects a starting point for the initial deployment of the VMs for each application in the stack. Monitoring your Oracle Big Data SQL deployment provides more information that can be used to fine-tune vCPU and vMemory allocations, allowing you to optimize the resources for data virtualization across the disparate data silos.
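When you adjust these starting-point allocations, it can help to track the aggregate vCPU and memory footprint of the stack. The following Python sketch is one minimal way to do that. It copies the values from Table 5; the names `VmSize`, `VM_SIZES`, and `total_allocation` are hypothetical and are not part of any Oracle, VMware, or Dell tooling.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class VmSize:
    """Initial sizing for one VM (hypothetical helper type)."""
    name: str
    vcpus: int
    memory_gb: int


# Starting-point values copied from Table 5.
VM_SIZES = [
    VmSize("Oracle", 8, 64),
    VmSize("ONDB", 5, 48),
    VmSize("Hadoop Primary", 8, 64),
    VmSize("Hadoop Data node 1", 10, 64),
    VmSize("Hadoop Data node 2", 10, 64),
    VmSize("SQL Server", 8, 64),
]


def total_allocation(vms: Iterable[VmSize]) -> Tuple[int, int]:
    """Return (total vCPUs, total memory in GB) across the given VMs."""
    vms = list(vms)
    return sum(vm.vcpus for vm in vms), sum(vm.memory_gb for vm in vms)


if __name__ == "__main__":
    vcpus, memory_gb = total_allocation(VM_SIZES)
    print(f"Total vCPUs: {vcpus}, total vMEM: {memory_gb} GB")
```

With the Table 5 values, the stack starts at 49 vCPUs and 368 GB of memory in total. Rerunning the calculation after each tuning change gives a quick check of the new totals against the capacity of the underlying hosts.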