The world has moved away from selective, focused data collection, driven primarily by structured transactional data housed in data warehouses. In the new model, every aspect of the enterprise, its markets, and its communities can be captured and cataloged, regardless of immediate value. The cost of data storage has plummeted, and the ability to extract value from stored data continues to advance rapidly. As a result, data is often extracted and stored without any particular cognizance of its ultimate value.
Recent workloads such as machine learning and artificial intelligence, however, require large amounts of potentially disparate data to derive insight, and they are rapidly revealing that latent value. Meanwhile, new types and sources of data, not to mention new types of applications, will create pressure to build new solutions that act on that data.
The rapid development of public cloud systems and services has complicated efforts to build these new solutions. Cloud-based data repositories, query engines, and analysis tools have been highly attractive for their ease of adoption and utility cost model, and these capabilities continue to evolve. Yet organizations that need to take full advantage of new data models and data volumes have continued to build and operate on-premises systems.
As more organizations embrace digital and data transformation and increase their reliance on analytics to guide their actions, data lakes have grown in popularity and are increasingly deployed both on premises and in the cloud. The data warehouses of the past relied heavily on structured data and focused narrowly on business intelligence and decision support. As new data sources from logs, sensors, equipment, and industrial IoT became available, a different kind of repository was required. This new breed of data is largely unstructured or semistructured, making it difficult for applications to consume directly without some intermediate resolution layer. Data lakes are optimized for large-scale collection of both tabular and nontabular information, and they can store both raw and transformed data. This approach provides great application flexibility and promotes large-scale analysis of diverse data from multiple sources for greater insight.
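As a minimal sketch of what that intermediate resolution layer does, the example below flattens semistructured JSON log records, as they might land raw in a data lake, into a tabular form an analytics application can consume. The device names, field names, and record contents are illustrative assumptions, not taken from any particular system.

```python
import json
import pandas as pd

# Hypothetical semistructured sensor-log records, stored raw in a data
# lake (device and field names are assumptions for illustration).
raw_records = [
    '{"device": "pump-01", "ts": "2023-04-01T12:00:00Z", "metrics": {"temp_c": 71.2, "rpm": 1450}}',
    '{"device": "pump-02", "ts": "2023-04-01T12:00:05Z", "metrics": {"temp_c": 68.9}}',
]

# Parse each JSON line, then flatten the nested "metrics" object so the
# data becomes tabular; fields absent from a record simply become nulls.
parsed = [json.loads(line) for line in raw_records]
df = pd.json_normalize(parsed)

print(df)
#     device                    ts  metrics.temp_c  metrics.rpm
# 0  pump-01  2023-04-01T12:00:00Z            71.2       1450.0
# 1  pump-02  2023-04-01T12:00:05Z            68.9          NaN
```

In practice this resolution step might be a schema-on-read query engine or an ETL job rather than a script, but the principle is the same: structure is imposed at consumption time rather than at ingest, which is what lets the lake accept raw data of any shape.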
As more unstructured data is brought into analytics environments, a new organizational approach is needed to simplify the collection and use of this information. The modern data center must support a transformation in how data is collected, managed, and used. The potential complexity of this change and these new environments creates adoption and management challenges for IT organizations under pressure to implement the new paradigm reliably and rapidly. Meanwhile, the data scientists and data engineers tasked with converting information into more and deeper insights are under pressure to monetize the data.
Data collection, storage, and analysis capabilities in the cloud offer both on-demand usage and nearly infinite capacity, and these characteristics have driven substantial data center development. The modern data center must operate on a similarly self-service, scalable basis. Doing so closes the gaps in time and process between resources and the developers and expert users who need them to drive insight and value. It is now a business imperative to shorten development cycles and give greater control to knowledge workers and strategic implementers, but this imperative must not cause the organization to lose control of access, security, data integrity, or costs.