A data silo is a repository of information that is isolated from the other data that an organization collects. Data siloing occurs for many reasons.
Data silos hinder data-based decision making: a decision that draws on only a siloed subset of the data cannot be fully informed.
Data warehouses are repositories that combine disparate data sources, such as data silos, to enable better reporting, data analysis, and business intelligence. The introduction of data warehouses improved data analysis so that organizations could make better decisions; however, data warehouses merely combine data silos and do not resolve data processing issues.
Data lakes were the next attempt to solve the data silo issue. In a data lake, streams of unmodified data flow into a shared repository. Data lakes support all data types and enable the entire organization to analyze and report on the data. However, the data in a data lake is not harmonized, indexed, searchable, or easily usable (see Data Lakes vs Data Hubs vs Federation: Which One is Best?). Thus, neither data warehouses nor data lakes have eliminated data silos.
Data hubs, which are collections of data from multiple sources, resolve the inefficiencies of isolated data by enabling access to disparate data repositories. Unlike data lakes, data hubs support discovery, indexing, and analytics, and provide a centralized, unified data source to serve diverse business needs. A data hub is organized and structured for distributing, sharing, and subsetting data. It is based on a hub-and-spoke architecture, which integrates and distributes data by physically moving it and reindexing it into a new system. Working with harmonized data, data hubs build and maintain indexes that provide easy and efficient access to data warehouses and data lakes from a centralized location.
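The harmonize-then-index idea described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation of any particular data hub product: the source names (`crm`, `erp`), the field mappings, the sample rows, and the `DataHub` class are all hypothetical and chosen purely for the example.

```python
from collections import defaultdict

# Hypothetical rows from two siloed sources; field names differ per silo.
crm_rows = [{"CustomerName": "Acme Corp", "Region": "EMEA"}]
erp_rows = [{"cust_name": "acme corp ", "revenue": 1200}]

# Hypothetical mappings from each source's fields to one shared schema.
FIELD_MAPS = {
    "crm": {"CustomerName": "customer", "Region": "region"},
    "erp": {"cust_name": "customer", "revenue": "revenue"},
}


def harmonize(source, row):
    """Rename source-specific fields and normalize the customer key."""
    mapping = FIELD_MAPS[source]
    record = {mapping[k]: v for k, v in row.items() if k in mapping}
    record["customer"] = record["customer"].strip().lower()
    record["source"] = source
    return record


class DataHub:
    """Toy hub: copies harmonized records in and indexes them by customer."""

    def __init__(self):
        self.records = []
        self.index = defaultdict(list)  # customer -> positions in self.records

    def ingest(self, source, rows):
        for row in rows:
            record = harmonize(source, row)
            self.index[record["customer"]].append(len(self.records))
            self.records.append(record)

    def lookup(self, customer):
        key = customer.strip().lower()
        return [self.records[i] for i in self.index.get(key, [])]


hub = DataHub()
hub.ingest("crm", crm_rows)
hub.ingest("erp", erp_rows)
matches = hub.lookup("Acme Corp")  # records from both sources for one customer
```

The point of the sketch is the contrast with a data lake: records are moved into the hub, mapped to a common schema on the way in, and indexed, so a single lookup returns related data that originated in separate silos.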
In this white paper, we look at structured data from Oracle and SQL Server databases on Dell EMC VxBlock System 1000 as part of a large data hub implementation. Oracle and SQL Server are relational database applications that are used for storing structured data. These databases support enterprise resource planning (ERP) systems that combine the data from multiple business functions, such as accounting and marketing, into a large enterprise data hub that customers and data scientists can access. An enterprise data hub is a big data management model that is designed to handle the large volume of data that is stored in distributed data lakes and in the central data repository. The goal of an enterprise data hub is to integrate data, orchestrate data processing, and manage metadata across multiple enterprise data sources. Data hubs feed powerful data pipelines that manage, share, and distribute data, providing diverse business users with the information that they need to do their jobs.
In testing this database platform, we created a mixed database environment using Oracle and SQL Server and a mixed workload environment that included both OLTP and OLAP workloads. This combination of different databases and workloads simulates what a customer might encounter during a database consolidation effort. This white paper demonstrates how converged infrastructure (CI) solutions such as VxBlock 1000 enable businesses to consolidate mixed applications and workloads on one unified platform that can support an enterprise data hub.
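To make the difference between the two workload types concrete, the following minimal sketch contrasts an OLTP-style operation (a short transaction that touches a single row) with an OLAP-style operation (a query that scans and aggregates many rows). It uses an in-memory SQLite database purely as a stand-in so that the example is self-contained; it is not the Oracle or SQL Server test configuration, and the `orders` table, its columns, and the sample values are assumptions made for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in; table and columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)


def place_order(conn, region, amount):
    """OLTP-style: one short transaction that inserts a single row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (region, amount) VALUES (?, ?)",
            (region, amount),
        )
    return cur.lastrowid


def revenue_by_region(conn):
    """OLAP-style: scan the whole table and aggregate by region."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)


for region, amount in [("east", 10.0), ("west", 25.0), ("east", 5.0)]:
    place_order(conn, region, amount)

totals = revenue_by_region(conn)
```

In a consolidated environment, many small transactions like `place_order` run alongside a smaller number of large scans like `revenue_by_region`, which is why mixed workloads place different demands on the same platform.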