Business challenges
The data landscape for most customers is complex. It involves collecting high volumes of data spread across different locations and business units, often on infrastructure that spans on-premises deployments, public clouds, and edge systems. In such environments, data silos emerge naturally and prevent the company from making effective use of the full range of data it holds.
Data integration is the general term for techniques that process and join data held across different database and storage systems. Data federation is one such technique: it allows multiple databases to function as one by implementing a query engine that processes data across a variety of sources. A data source can be a relational database, a data warehouse, or a storage system holding data in open formats. The sources can sit in a single location or be remote from each other, on-premises or in the public cloud.
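The idea of a single query engine over several sources can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not how any particular product works: `crm_database`, `object_store_orders`, and `federated_join` are hypothetical names, the rows are made-up sample data, and a real engine would push filters down to each source and run the join in parallel.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]
Source = Callable[[], Iterable[Row]]  # hypothetical connector: returns rows on demand


def crm_database() -> Iterable[Row]:
    # Stand-in for a relational database (made-up sample rows).
    return [
        {"customer_id": 1, "name": "Acme"},
        {"customer_id": 2, "name": "Globex"},
    ]


def object_store_orders() -> Iterable[Row]:
    # Stand-in for open-format files on a storage system (made-up sample rows).
    return [
        {"order_id": 10, "customer_id": 1, "amount": 250.0},
        {"order_id": 11, "customer_id": 2, "amount": 75.5},
        {"order_id": 12, "customer_id": 1, "amount": 40.0},
    ]


def federated_join(left: Source, right: Source, key: str) -> Iterator[Row]:
    """Hash-join rows from two sources at query time, without a staging copy."""
    index: Dict[Any, List[Row]] = {}
    for row in left():
        index.setdefault(row[key], []).append(row)
    for row in right():
        for match in index.get(row[key], []):
            yield {**match, **row}


# One "query" spanning both sources: total order amount per customer name.
totals: Dict[str, float] = {}
for joined in federated_join(crm_database, object_store_orders, "customer_id"):
    totals[joined["name"]] = totals.get(joined["name"], 0.0) + joined["amount"]

print(totals)
```

The caller sees one joined result and never needs to know which system each column came from, which is the property the paragraph above describes.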
Data federation provides an alternative to popular techniques such as Extract-Transform-Load (ETL) or Extract-Load-Transform (ELT). Because it processes data in place, it does not require data to be moved before a query runs. This reduces the number of data paths and the complexity of the overall data landscape, and it shortens time to insight. Users can explore data sets and join them with little effort, avoiding the natural silos that arise with multiple organizations and locations.
Data federation is usually accompanied by additional techniques that address the performance penalty of remote data sources. Materialized views and caching of both raw and intermediate data enable efficient execution of production queries, as well as exploration queries over popular data sets. The extension of data federation with these and other techniques is often referred to as data virtualization.
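The effect of caching on a remote source can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical `CachedSource` wrapper and a made-up `remote_orders` function; it keeps one local copy and refreshes it on request, loosely in the spirit of a materialized view, and it ignores invalidation, expiry, and storage limits that a real system has to handle.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


class CachedSource:
    """Wraps a slow remote source and keeps a local copy of its rows."""

    def __init__(self, fetch: Callable[[], List[Row]]) -> None:
        self._fetch = fetch
        self._rows: Optional[List[Row]] = None
        self.remote_calls = 0  # counts how often the remote source is contacted

    def refresh(self) -> None:
        # Re-read the remote source, similar to refreshing a materialized view.
        self.remote_calls += 1
        self._rows = list(self._fetch())

    def rows(self) -> List[Row]:
        # Serve from the local copy; only the first call reaches the remote source.
        if self._rows is None:
            self.refresh()
        return list(self._rows or [])


def remote_orders() -> List[Row]:
    # Stand-in for an expensive call to a remote system (made-up sample rows).
    return [
        {"order_id": 10, "amount": 250.0},
        {"order_id": 11, "amount": 75.5},
        {"order_id": 12, "amount": 40.0},
    ]


orders = CachedSource(remote_orders)
first_total = sum(row["amount"] for row in orders.rows())
second_total = sum(row["amount"] for row in orders.rows())
print(first_total, second_total, orders.remote_calls)
```

Both queries return the same total, but only the first one pays the cost of reaching the remote source; calling `orders.refresh()` picks up new data at the price of one more remote call.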