The world is entering a new era of innovation. Data from sources including sensors, physical equipment, and software applications populates a proliferation of information silos that must be managed properly. The growing number of data sources associated with the Internet of Things and multi-cloud systems increases the need for robust distributed data management solutions.
Enterprises make significant investments to develop data management strategies using data warehouses, data marts, data lakes, and so on. Their goal is to improve how they integrate and analyze data. It is increasingly clear, however, that rapid growth in the volume, velocity, and variety of available data is straining traditional strategies for data management and analysis.
As the volume, velocity, and variety of data sources increase, the agile enterprise must be able to mine value from its data within a short period of time. The compute and storage resources required to maintain the current generation of data warehouses and data lakes have grown to the point where these traditional approaches are no longer practical. To address these evolving data management challenges, a solution that enables accessing data in its native repository offers significant advantages.
Addressing these challenges allows developers and data scientists to spend more time mining data and developing insights for the enterprise.
The concept of data virtualization as a solution for data management scale and data movement latency issues has been discussed for years. Whereas server virtualization is the ability to run heterogeneous operating systems on shared physical servers, data virtualization is a logical data layer that integrates enterprise data siloed across disparate systems. The advantages of data virtualization include the ability to manage the unified data layer with centralized security and governance, and the ability to deliver real-time data access to business users across disparate data repositories.
Data virtualization solutions that can easily access disparate data repositories help organizations quickly integrate new data sources as they are created across edge, core, and cloud locations. These solutions broaden the value of data silos by enabling customers to gain insights without first consolidating the data. The capability to quickly integrate new data sources thus accelerates the enterprise's ability to move from information to actionable insights.
Just as server virtualization uses a hypervisor layer to hide the complexity of hosting disparate operating systems on a common platform, data virtualization uses a "data visor" to hide the complexity of working with many data sources. Data repositories are connected through this common virtualized access and management layer, which includes a library of access drivers for connecting to common data repositories. The core value provided to the enterprise is the ability to integrate data, present it, and enable analysis across different repositories with less end-user and developer complexity.
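As a concrete illustration of the access-driver concept, Oracle Big Data SQL exposes remote repositories through Oracle external tables. The sketch below uses illustrative table, column, and schema names and assumes a Hive table `default.web_clicks` already exists on a Hadoop cluster connected to the database; the `ORACLE_HIVE` access driver lets Oracle Database query it alongside native tables:

```sql
-- Sketch only: define an Oracle external table over an existing Hive table.
-- Names are illustrative; assumes Big Data SQL is installed and configured.
CREATE TABLE web_clicks_ext (
  user_id    NUMBER,
  click_time TIMESTAMP,
  url        VARCHAR2(4000)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_HIVE                  -- Big Data SQL access driver for Hive
  DEFAULT DIRECTORY DEFAULT_DIR
  ACCESS PARAMETERS (
    com.oracle.bigdata.tablename=default.web_clicks
  )
)
REJECT LIMIT UNLIMITED;

-- Join Hadoop-resident click data with a native Oracle table in one query.
SELECT c.cust_name, COUNT(*) AS clicks
FROM   customers c
JOIN   web_clicks_ext w ON w.user_id = c.cust_id
GROUP BY c.cust_name;
```

Because the external table is an ordinary database object, existing security, governance, and SQL tooling apply to the Hadoop data without moving it out of its native repository.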
This design guide and the associated white paper show how Oracle Big Data SQL on Dell Technologies infrastructure advances the use of enterprise data virtualization technology. As part of the Oracle data virtualization journey, you will learn how Oracle Big Data SQL was installed and configured in the Dell Technologies solutions evaluation lab, and how the infrastructure components were chosen to support this solution. In the final phase of our evaluation testing, we document how a TPC-H decision support data set can be imported into different management repositories, a process you can replicate to experience the value of Oracle's Big Data SQL data virtualization approach. Together, these documents will enable your enterprise to accelerate its data virtualization journey using Oracle Big Data SQL on Dell Technologies infrastructure.