Data processing is no longer centered on structured transactional data stored in data warehouses as the primary method of collecting data. Many organizations are phasing out these legacy data warehouses as they move toward complete digital data management.
Organizations now collect a much wider variety of data: financial records, logs, sensor readings, audio, video, IoT telemetry, and more. Much of this data is semi-structured or unstructured, which makes it difficult for applications to ingest directly without pre-processing.
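As a concrete illustration of why pre-processing matters, the following minimal Python sketch normalizes semi-structured JSON log lines into uniform rows that an application could ingest directly. The log records and field names are hypothetical, not taken from this white paper.

```python
import json

# Hypothetical semi-structured log records, as they might arrive
# from an application before any pre-processing.
raw_logs = [
    '{"ts": "2024-05-01T10:00:00Z", "level": "ERROR", "msg": "disk full", "host": "node1"}',
    '{"ts": "2024-05-01T10:00:05Z", "level": "INFO", "msg": "retry ok"}',  # "host" is missing
]

def preprocess(lines):
    """Flatten semi-structured JSON lines into rows with a uniform schema."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Normalize the schema: fill in fields that may be absent.
        rows.append({
            "ts": record.get("ts"),
            "level": record.get("level", "UNKNOWN"),
            "msg": record.get("msg", ""),
            "host": record.get("host", "unknown"),
        })
    return rows

rows = preprocess(raw_logs)
print(rows[1]["host"])  # the missing field is normalized to "unknown"
```

Real pipelines face the same problem at far larger scale and with messier formats, which is why pre-processing is often the most expensive step before analytics can begin.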
Data virtualization is a technology that enables organizations to access and manipulate data from multiple disparate sources as if it were coming from a single unified source. This process works by creating a logical layer of abstraction between data consumers and the underlying data sources.
Data virtualization has become popular among large enterprises because unstructured and semi-structured data is now commonplace and processing it is challenging. Business leaders across industries expect data to be readily available for consumption to speed up their decision-making.
Data analytics is a rapidly evolving field that plays a crucial role in today’s data-driven era. It involves exploring, interpreting, and extracting valuable insights from raw, structured, and unstructured data to identify trends and answer business questions. This analysis gives businesses a deeper understanding of their data and helps them communicate meaningful data patterns across the organization to improve business outcomes.
There are several approaches to data analytics, including the following:
Because data analytics is business critical, running such workloads on the right infrastructure not only speeds up the analytics process but also provides a robust architecture that prevents unplanned outages. Data analytics infrastructure must be powerful, highly available (HA), flexible, and elastic.
In today’s multicloud era, deploying infrastructure for data analytics workloads no longer requires the traditional three-tier architecture. The data analytics platform needs to be flexible and elastic so that it can be deployed on-premises, in a public cloud, or in a hybrid cloud. However, successfully running data analytics workloads requires careful planning with all business stakeholders.
Running data analytics workloads against unstructured and semi-structured data residing on S3-compatible storage requires robust, highly durable object storage. This object storage must be flexible and elastic, with the ability to be deployed anywhere, in the cloud or on-premises.
Another critical component in running data analytics workloads efficiently is the CPU. These workloads can be CPU intensive, and selecting the optimal CPUs for analytics servers can be challenging, time consuming, and expensive. Three common factors to consider when choosing a CPU include: