Real-Time Streaming Solutions Beyond Data Ingestion
Wed, 16 Dec 2020 22:31:30 -0000
So far, it has all been about data—data at rest, data in flight, IoT data, and so forth. Let's review the traditional data processing approaches and their synergy with modern database technologies. In the traditional model, a user's query materializes as a data entity that is created when the request payload is received. Traditional databases and business applications have been the lone actors collaborating to implement such data models: they cooperate to process users' queries and persist the results in static data stores for further updates. Business continuity is measured by the degree of such activity among the business applications consuming data from these shared data stores. With a lower degree of activity, there is a high potential for the business to sit idle, waiting for more data to be acquired.
This paradigm is inherently prone to missing a great opportunity to maintain a higher degree of business continuity. Filling these gaps requires a shift away from the static data store paradigm. The requirements of massive ingested data mandate processing models that continuously generate insight from "data in-flight," mostly in real time. To overcome storage access performance bottlenecks, persistence of interim computed results in a permanent data store is kept to a minimum.
This blog addresses these modern data processing models from a real-time streaming ingestion and processing perspective. In addition, it discusses Dell Technologies’ offerings of such models in detail.
Customers can build their own solutions on open source projects to adopt real-time streaming analytics technologies. However, mixing and matching such components to implement real-time data ingestion and processing infrastructures is cumbersome, and stabilizing these infrastructures in production environments requires a variety of costly skills. Dell Technologies offers validated reference architectures that meet target KPIs on storage and compute capacity, simplifying these implementations. The following sections provide high-level information about real-time data streaming and popular platforms for implementing these solutions. This blog focuses on two Ready Architecture solutions from Dell—the Streaming Data Platform (formerly known as Nautilus) and a Real-Time Streaming reference architecture based on Confluent's Kafka ingestion platform—and provides a comparative analysis of the two.
Real-time data streaming
The topic of real-time data streaming goes far beyond ingesting data in real time. Many publications describe the compelling objectives behind a system that ingests millions of data events in real time. An article by Jay Kreps, one of the co-creators of open source Apache Kafka, provides a comprehensive and in-depth overview of ingesting real-time streaming data. This blog covers both the ingestion and processing sides of real-time streaming analytics platforms.
Real-time streaming analytics platforms
A comprehensive end-to-end big data analytics platform demands must-have features that:
- Simplify the data ingestion layer
- Integrate seamlessly with other components in the big data ecosystem
- Provide programming model APIs for developing insight-analytics applications
- Provide plug-and-play hooks to expose the processed data to visualization and business intelligence layers
Over the years, demand for real-time ingestion has motivated the implementation of several streaming analytics engines, each with a unique target architecture. These engines provide capabilities ranging from micro-batching the streamed data during processing, to near-real-time performance, to true real-time processing behavior. The ingested datatype may range from a byte-stream event to a complex event format. Examples of such ingestion engines are the Dell Technologies-supported Pravega and the open source (Apache 2.0 licensed) Kafka, both of which integrate seamlessly with open source big data analytics engines such as Samza, Spark, Flink, and Storm, to name a few. Proprietary implementations of similar technologies are offered by a variety of vendors; a short list includes Striim, the WSO2 Complex Event Processor, IBM Streams, SAP Event Stream Processor, and TIBCO Event Processing.
Real-time streaming analytics solutions: A Dell Technologies strategy
Dell Technologies offers customers two solutions for implementing their real-time streaming infrastructure. One is built on Apache Kafka as the ingestion layer with Kafka Stream Processing as the default streaming data processing engine. The second is built on open source Pravega as the ingestion layer with Flink as the default real-time streaming data processing engine. But how are these solutions used in response to customers' requirements? Let's review possible integration patterns in which Dell Technologies real-time streaming offerings provide the data ingestion and partial preprocessing layers.
Real-time streaming and big data processing patterns
Customers implement real-time streaming in different ways to meet their specific requirements. This implies that there may be many ways of integrating a real-time streaming solution with the remaining components of a customer's infrastructure ecosystem. Figure 1 depicts a minimal big data integration pattern that customers may implement by mixing and matching a variety of existing streaming, storage, compute, and business analytics technologies.
Figure 1: A modern big data integration pattern for processing real-time ingested data
There are several options to implement the Stream Processors layer, including the following two offerings from Dell Technologies.
Dell EMC–Confluent Ready Architecture for Real-Time Data Streaming
The core component of this solution is Apache Kafka, which also delivers Kafka Stream Processing in the same package. Confluent provides and supports the Apache Kafka distribution along with Confluent Enterprise-Ready Platform with advanced capabilities to improve Kafka. Additional community and commercial platform features enable:
- Accelerated application development and connectivity
- Event transformations through stream processing
- Simplified enterprise operations at scale and adherence to stringent architectural requirements
Dell Technologies provides infrastructure for implementing stream processing deployments of Confluent's Kafka distribution in one of two forms: the Standard Cluster Architecture or the Large Cluster Architecture. Both may be implemented as either the streaming branch of a Lambda Architecture or as the single process flow engine of a Kappa Architecture. For a description of the difference between the two architectures, see this blog. For more details about the product, see the Dell Real-Time Big Data Streaming Ready Architecture documentation.
- Standard Cluster Architecture: This architecture consists of two Dell EMC PowerEdge R640 servers to provide resources for Confluent’s Control Center, three R640 servers to host Kafka Brokers, and two R640 servers to provide compute and storage resources for Confluent’s higher-level KSQL APIs leveraging the Apache Kafka Stream Processing engine. The Kafka Broker nodes also host the Kafka Zookeeper and the Kafka Rebalancer applications. Figure 2 depicts the Standard Cluster Architecture.
Figure 2: Standard Dell Real-Time Streaming Big Data Cluster Architecture
- Large Cluster Architecture: This architecture consists of two PowerEdge R640 servers to provide resources for Confluent's Control Center, a configurable number of R640 servers to host Kafka Brokers for scalability, and a configurable number of R640 servers to provide compute and storage resources for Confluent's KSQL APIs on top of the Apache Kafka Stream Processing engine. The Kafka Broker nodes also host the Kafka ZooKeeper and Kafka Rebalancer applications. Figure 3 depicts the Large Cluster Architecture.
Figure 3: Large Scalable Dell Real-Time Streaming Big Data Cluster Architecture
Dell EMC Streaming Data Platform (SDP)
SDP is an elastically scalable platform for ingesting, storing, and analyzing continuously streaming data in real time. The platform can concurrently process both real-time and collected historical data in the same application. The core components of SDP are open source Pravega for ingestion, Long Term Storage, Apache Flink for compute, open source Kubernetes, and a Dell Technologies proprietary software known as Management Platform. Figure 4 shows the SDP architecture and its software stack components.
Figure 4: Streaming Data Platform Architecture Overview
- Open source Pravega provides the ingestion and storage artifacts by implementing streams built from heterogeneous datatypes and storing them as appended "segments." The unstructured, structured, and semi-structured data may range from a few bytes emitted by IoT devices, to clickstreams generated by users surfing websites, to business applications' intermediate transaction results, to complex events of virtually any size. SDP offers two options for Pravega's persistent Long Term Storage: Dell EMC Isilon and Dell EMC ECS S3. These storage options are mutually exclusive—both cannot be used in the same SDP instance—and migrating from one to the other is not yet supported. For details on Pravega and its role in providing storage for SDP streams using Isilon or ECS S3, refer to this Pravega webinar.
- Apache Flink is SDP's default event processing engine. It consumes ingested streamed data from Pravega's storage layer and processes it in an instance of a previously implemented data pipeline application. The pipeline application invokes Flink DataStream APIs and processes continuous, unbounded streams of data in real time. Alternative analytics engines, such as Apache Spark, are also available. To unify multiple analytics engines' APIs and avoid writing multiple versions of the same data pipeline application, work is underway to add Apache Beam APIs to SDP, allowing a single data pipeline application to run on multiple underlying engines on demand.
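The append-only, segment-based stream idea behind Pravega can be sketched in a few lines of plain Python. This is a toy illustration only, not the Pravega client API; the segment-rollover size is an invented parameter:

```python
class Stream:
    """Toy append-only stream: events accumulate in ordered 'segments',
    so recent and historical data live in the same store. This sketches
    the idea behind Pravega streams, not the Pravega client API."""

    def __init__(self, segment_size=3):
        self.segments = [[]]          # list of segments, newest last
        self.segment_size = segment_size

    def append(self, event):
        if len(self.segments[-1]) >= self.segment_size:
            self.segments.append([])  # seal the current segment, start a new one
        self.segments[-1].append(event)

    def read_all(self):
        # Historical replay: read every segment in order, oldest first.
        return [event for segment in self.segments for event in segment]


stream = Stream(segment_size=3)
for event in range(7):
    stream.append(event)
```

Because events are only ever appended, the same `read_all` path serves both a real-time reader (tailing the newest segment) and a historical reader (replaying from the first segment), which is the property the blog attributes to SDP's unified storage.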
Comparative analysis: Dell EMC real-time streaming solutions
Both Dell EMC real-time streaming solutions address the same problem and ultimately provide the same kind of solution. However, in addition to using different technology implementations, each tends to be a better fit for certain streaming workloads. The best starting point for selecting one over the other is an understanding of the demands of the target use case and workload.
In most situations, users know what they want in a real-time ingestion solution—typically an open source offering that is popular in the industry—and in most of these situations customers ask for Kafka. Additional characteristics, such as the mechanisms for receiving, storing, and processing events, are secondary. Most of our customer conversations center on a reliable ingestion layer that can guarantee delivery of the customer's business events to the consuming applications. Further expectations focus on no loss of events; simple yet long-term storage capacity; and, in most cases, a well-defined integration method for implementing initial preprocessing tasks such as filtering, cleansing, and transformations like extract-transform-load (ETL). The purpose of preprocessing is to offload work unrelated to business logic from the target analytics engine (Spark, Flink, or Kafka Stream Processing), resulting in better overall end-to-end real-time performance.
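A minimal sketch of such a preprocessing stage, covering the filter, cleanse, and transform steps named above. The event field names (`user`, `amount`) are invented for illustration:

```python
def preprocess(events):
    """Toy ETL-style preprocessing ahead of the analytics engine.

    Filters malformed events, cleanses a text field, and applies a simple
    transformation. The 'user' and 'amount' fields are hypothetical.
    """
    for event in events:
        if "user" not in event:            # filter: drop malformed events
            continue
        cleaned = dict(event)              # copy so the raw event is untouched
        cleaned["user"] = cleaned["user"].strip().lower()                 # cleanse
        cleaned["amount_cents"] = int(round(cleaned.get("amount", 0) * 100))  # transform
        yield cleaned


raw = [{"user": " Alice ", "amount": 1.5}, {"sensor": 7}]
clean = list(preprocess(raw))
```

Running a stage like this in the ingestion tier keeps normalization and validation out of the Flink or Spark job, so the analytics engine only ever sees well-formed events.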
Kafka and Pravega in a nutshell
Kafka is essentially a messaging vehicle that decouples the sender of an event from the applications that process it for business insight. By default, Kafka uses local disk to temporarily persist incoming data, while longer-term storage of the ingested data is handled by the Kafka Broker servers. When an event is received, it is broadcast to the interested applications, known as subscribers. An application may subscribe to more than one event-type group, also known as a topic. By default, Kafka stores and replicates the events of a topic in partitions configured across Kafka Brokers. The replicas of an event may be distributed among several Brokers to prevent data loss and guarantee recovery in case of failover. A Broker cluster may be constructed and configured on several Dell EMC PowerEdge R640 servers, and to avoid storage and compute capacity limitations, the cluster may be extended by adding more Brokers to the topology—a horizontally scalable characteristic of the Kafka architecture. The de facto analytics engine provided in the open source Kafka stack is Kafka Stream Processing. It is customary to use Kafka Stream Processing as a preprocessing engine and then route the results, as real-time streaming artifacts, to the analytics engine that implements the actual business logic, such as Flink or Spark Streaming. Confluent wraps the Kafka Stream Processing implementation in an abstraction layer known as the KSQL APIs, which makes it extremely simple to process events in the core Kafka Stream Processing engine with SQL-like statements instead of complex third-generation languages such as Java or C++, or scripting languages such as Python.
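The partition-and-replica scheme described above can be sketched in plain Python. This is not the Kafka client API: the real client hashes keys with murmur2, whereas this sketch uses crc32, and the replica-placement rule here is a simplification of Kafka's actual assignment:

```python
import zlib


def assign_partition(key: bytes, num_partitions: int) -> int:
    # Kafka's default partitioner hashes the event key and maps it onto one
    # of the topic's partitions, so events with the same key stay ordered.
    return zlib.crc32(key) % num_partitions


def replica_brokers(partition: int, brokers: list, replication_factor: int) -> list:
    # Replicas of a partition are placed on distinct brokers so that losing
    # one broker does not lose the partition's events.
    start = partition % len(brokers)
    return [brokers[(start + i) % len(brokers)] for i in range(replication_factor)]


partition = assign_partition(b"order-42", num_partitions=6)
replicas = replica_brokers(partition,
                           brokers=["broker1", "broker2", "broker3"],
                           replication_factor=3)
```

Adding a broker to the list is all the sketch needs to spread partitions over more machines, which mirrors the horizontal-scaling property described above.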
Unlike Kafka's messaging protocol and event-persisting partitions, Pravega implements a storage protocol and temporarily persists events as appended streams. As events age, they become long-term data entities; therefore, unlike Kafka, the Pravega architecture does not require separate long-term storage, and the historical data eventually remains available in the same store. In Dell's current SDP architecture, Pravega routes previously appended streams to Flink, which provides the data pipeline that implements the actual business logic. For scalability, Pravega uses Isilon or ECS S3 as extended and/or archive storage.
Although both SDP and Kafka act as a vehicle between the event sender and the event processor, they implement this transport differently. By design, Kafka implements the publish/subscribe pattern: it broadcasts each event to all interested applications at the same time. Pravega makes specific events available directly to a specific application by implementing a point-to-point pattern. Both Kafka and Pravega claim guaranteed delivery; however, the point-to-point approach supports stricter underlying transport semantics.
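The difference between the two delivery patterns can be shown with two toy classes. These are illustrative only, not the Kafka or Pravega APIs; the round-robin dispatch in the point-to-point sketch is an invented policy standing in for a real reader-group assignment:

```python
class PubSubTopic:
    """Kafka-style publish/subscribe: every subscriber receives every event."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        inbox = []
        self.subscribers.append(inbox)
        return inbox

    def publish(self, event):
        for inbox in self.subscribers:   # broadcast to all interested applications
            inbox.append(event)


class PointToPointStream:
    """Pravega-style point-to-point sketch: each event goes to exactly one reader."""

    def __init__(self):
        self.readers = []
        self._next = 0

    def add_reader(self):
        inbox = []
        self.readers.append(inbox)
        return inbox

    def append(self, event):
        self.readers[self._next].append(event)   # exactly one reader gets the event
        self._next = (self._next + 1) % len(self.readers)


topic = PubSubTopic()
sub_a, sub_b = topic.subscribe(), topic.subscribe()
for event in ("e1", "e2", "e3"):
    topic.publish(event)

stream = PointToPointStream()
reader_1, reader_2 = stream.add_reader(), stream.add_reader()
for event in ("e1", "e2", "e3", "e4"):
    stream.append(event)
```

In the publish/subscribe case both inboxes end up with all three events; in the point-to-point case the four events are split between the two readers, with no event delivered twice.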
Dell Technologies offers two real-time streaming solutions, and it is not a simple task to promote one over the other. Ideally, every customer problem requires an initial analysis of the data source, data format, data size, expected ingestion frequency, guaranteed delivery requirements, integration requirements, transactional rollback requirements (if applicable), storage requirements, transformation requirements, and data structural complexity. The aggregated results of such an analysis help us suggest a specific solution.
Dell works with customers to collect as much detail as possible about their streaming use cases. Kafka Stream Processing has an impressive ability to offload the transformation portion of the analytics from a pipeline engine such as Flink or Spark, which can be a great advantage; SDP, meanwhile, requires extra scripting effort outside of the Flink configuration space to provide a logically equivalent capability. On the other hand, SDP simplifies storage complexities through Pravega's native streams-per-segment architecture, whereas Kafka's core storage logic belongs to a messaging layer that requires a dedicated file system. Customers with IoT device use cases are concerned with high ingestion rates (number of events per second). We plan to use this parameter to provide benchmarking results from a comparative analysis of ingestion rates on our SDP and Confluent Real-Time Streaming solutions.
I owe an enormous debt of gratitude to my colleagues Mike Pittaro and Mike King of Dell Technologies. They shared their valuable time to discuss the nuances of the text, guided me to clarify concepts, and made specific recommendations to deliver cohesive content.
Author: Amir Bahmanyari, Advisory Engineer, Dell Technologies Data-Centric Workload & Solutions. Amir joined the Dell Technologies Big Data Analytics team in late 2017 and works with Dell Technologies customers to build their Big Data solutions. He has a special interest in the field of Artificial Intelligence and has been active in artificial and evolutionary intelligence work since the late 1980s, when he was a Ph.D. candidate at Wayne State University in Detroit, MI. Amir has implemented multiple AI and computer vision solutions for motion detection and analysis. His interest in biological and evolutionary intelligence algorithms led him to invent a neuron model that mimics the data processing behavior of protein structures in cytoskeletal fibers. Prior to Dell, Amir worked for several startups in Silicon Valley and as a Big Data Analytics Platform Architect at Walmart Stores, Inc.
Related Blog Posts
TPCx-HS Results on Dell EMC Hardware with AMD Milan Processors
Mon, 19 Apr 2021 17:56:26 -0000
As part of the Dell Technologies AMD Milan launch in March 2021, Dell Technologies published eight results with the Transaction Processing Performance Council (TPC) (www.tpc.org). Six of those results are categorized as Big Data: five as TPCx-HS and one as TPCx-BB (Big Bench). Milan is the code name for third-generation AMD EPYC processors, which represent a broad lineup for cloud, enterprise, and high-performance computing workloads. TPC Big Data results are categorized by data size, known as Scale Factor (SF). The results showcase Big Data performance at the low end with SF 1 TB, while at the high end TPCx-HS crosses the "size frontier" with a 100 TB result.
Big Data technologies like Hadoop and Spark have become an important part of the enterprise IT ecosystem. The TPC Express Benchmark™ HS (TPCx-HS) was developed to provide an objective measure of hardware, operating systems, and commercial Apache Hadoop File System API-compatible software distributions, and to give the industry verifiable performance, price-performance, and availability metrics. The benchmark models continuous system availability 24 hours a day, 7 days a week.
This blog dives deeply into the results, what they mean, and why they are important for Big Data enthusiasts and professionals in Engineering, Marketing, and Sales.
System under test (SUT)
Figure 1: SUT for the SF100 TB result
All five TPCx-HS results used one PowerEdge R7515 for the Master node and either 9 or 16 PowerEdge R6515 servers for the Worker nodes. Apart from the number of Worker nodes, the only major hardware configuration difference was storage size: the SF 100 TB result was run on a SUT whose Worker nodes each had 8 x 3.2 TB NVMe drives, while all other results used 5 x 3.2 TB NVMe drives.
Per-node SUT configuration (common to all Scale Factors except where noted):
- Processor: 1 x AMD EPYC 75F3 32-core, 2.95 GHz, 256 MB L3
- Memory: 512 GB (8 x 64 GB RDIMM, 3200 MT/s, Dual Rank)
- Network (cluster connectivity): 1 x Mellanox Dual Port ConnectX-5 100GbE QSFP28 NIC
- Network (remote connectivity): 1 x Broadcom Gigabit Ethernet BCM5720 NIC
- Data storage: 8 x 3.2 TB NVMe (SF 100 TB); 5 x 3.2 TB NVMe (SF 3 TB and SF 1 TB)
- Operating system: SUSE Linux Enterprise Server 12 SP5
- Software: Cloudera Private Cloud Base 7.1.4; OpenJDK 64-Bit Server build 1.8.0_232-cloudera
Table 1: SUT configuration
TPCx-HS was the first Big Data industry-standard benchmark that was designed to stress both hardware and software that is based on Apache HDFS API compatible distributions. TPCx-HS extends the workload that is defined in TeraSuite (TeraGen, TeraSort, TeraValidate) with formal rules for implementation, execution, metric, result verification, publication, and pricing. It can be used to assess a broad range of Big Data Hadoop system topologies, implementation methodologies and systems in a technically rigorous, directly comparable, and vendor-neutral manner. The current TPCx-HS specification can be found on the TPC Documentation Webpage.
The TPC requires that Express benchmarks like TPCx-HS must run the TPC-provided kit in order to publish a compliant TPC Express result. The latest TPCx-HS kit can be downloaded from www.tpc.org/tpcx-hs. The benchmark workload consists of the following modules:
- HSGen generates data at a particular Scale Factor. It is based on TeraGen.
- HSDataCheck checks the validity of the dataset and replication.
- HSSort sorts and orders the data. It is based on TeraSort.
- HSValidate validates the sorted output. It is based on TeraValidate.
The benchmark test is performed in five phases that are run from a TPCx-HS-master script. The phases must run sequentially without any overlaps.
Figure 2: TPCx-HS execution phases
The benchmark test consists of two runs, Run 1 and Run 2, which must follow the run phases shown above. Except for file system cleanup, no activities are allowed between Run 1 and Run 2. The total elapsed runtime T, in seconds is used for the TPCx-HS Performance Metric calculation. The performance run is defined as the run (Run 1 or Run 2) with the lower TPCx-HS Performance Metric. The repeatability run is defined as the run (Run 1 or Run 2) with the higher TPCx-HS Performance Metric. The reported Performance Metric is the TPCx-HS Performance Metric for the performance run.
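The run structure above can be sketched as driver logic in Python. The five-phase order and the two-run rule follow the text; the timing callback is an assumption standing in for the TPC-provided kit:

```python
# TPCx-HS phase order: HSDataCheck runs after both HSGen and HSSort.
PHASES = ["HSGen", "HSDataCheck", "HSSort", "HSDataCheck", "HSValidate"]


def execute_run(run_phase):
    # Phases run strictly sequentially, with no overlap. `run_phase` is an
    # assumed callback that executes a phase and returns its elapsed seconds.
    return sum(run_phase(phase) for phase in PHASES)


def benchmark(run_phase):
    # The benchmark consists of exactly two runs; apart from file system
    # cleanup, no activity is allowed between them.
    t_run1 = execute_run(run_phase)
    t_run2 = execute_run(run_phase)
    return t_run1, t_run2


# Invented timing: pretend every phase takes exactly one minute.
t1, t2 = benchmark(lambda phase: 60.0)
```

The total elapsed time of each run is the T that feeds the Performance Metric calculation described in the next section.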
Scale Factor (SF) is the dataset size in relation to the minimum required size of a test dataset. For TPCx-HS, the test dataset size must be selected from a set of fixed SFs defined in the specification; 1 TB, 3 TB, and 100 TB are the sizes used in these results.
The SF 100 TB result was the first TPCx-HS result to be published at that SF, which was a major milestone for the Industry.
What is measured?
All TPC published results disclose a Primary Metric that consists of a Performance metric, Price/Performance metric, and an availability date. For TPCx-HS:
- Performance Metric (HSph@SF) reflects the throughput of a run (Run 1 or Run 2) at Scale Factor SF. It is derived from the elapsed time T taken by the performance run to complete all five phases shown in Figure 2.
- Price/Performance Metric ($/HSph@SF) relates the Total Cost of Ownership P of owning and sustaining the SUT to the Performance Metric it scored.
- System availability date is the day all components used in the Performance test will be available to customers as defined in the TPC Pricing specification.
What the metric numbers mean
Generally, the faster the performance run completes, the higher the performance score; the score is obtained by normalizing the run time per the formulas in the TPCx-HS specification. For Price/Performance, a lower score is better: a higher Performance score achieved on a SUT with a lower Total Cost of Ownership P yields a better price/performance metric.
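The relationship between elapsed time and the two metrics can be written out directly, per the TPCx-HS specification (HSph@SF is the Scale Factor divided by the run time in hours). The example numbers below are invented:

```python
def hsph(scale_factor_tb: float, elapsed_seconds: float) -> float:
    # TPCx-HS Performance Metric HSph@SF: Scale Factor divided by the total
    # elapsed run time T expressed in hours.
    return scale_factor_tb / (elapsed_seconds / 3600.0)


def price_performance(total_cost_p: float, hsph_at_sf: float) -> float:
    # $/HSph@SF: Total Cost of Ownership P divided by the performance metric,
    # so a cheaper or faster SUT scores lower (better).
    return total_cost_p / hsph_at_sf


# Invented example: an SF 1 TB run finishing in 30 minutes scores 2.0 HSph.
run1 = hsph(1.0, 1800.0)
run2 = hsph(1.0, 2000.0)
reported = min(run1, run2)   # the performance (reported) run has the lower metric
```

This also makes the normalization visible: doubling the Scale Factor at the same elapsed time doubles HSph@SF, which is why results at different SFs remain comparable on throughput.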
As part of the Dell Technologies AMD Milan launch, Dell Technologies published five TPCx-HS results on March 04, 2021. These results, summarized in Table 2 below, show several performance scenarios:
- Data sizes scaled from SF 1 TB to SF 100 TB
- Two different frameworks: MapReduce and Spark
- Two different cluster sizes: 10 nodes and 17 nodes
The common denominator for these results is that all SUTs used the AMD Milan EPYC 75F3 processors.
Table 2: TPCx-HS results published on March 04, 2021. For each result (SF 100 TB, SF 3 TB, and SF 1 TB), the table reported the number of nodes, elapsed run time (s), Performance Metric (HSph), Total Cost of Ownership (TCO), and Total Rack Units (TRU).
The table also shows that data sizes were scaled from 1 TB to 100 TB using SUTs occupying the same space as measured in Total Rack Units (TRU). Detailed results can be found at http://tpc.org/tpcx-hs/results/tpcxhs_perf_results5.asp?version=2.
Figure 3 below is a scatter chart that shows the relative performance, relative price/performance, and performance/TRU of all the TPCx-HS SF 1 TB results on March 04, 2021. These results are based on three AMD processor generations. The results in red markers are based on the first-generation AMD Naples processor. Orange markers show results that are based on the second-generation processor AMD Rome. Blue markers show results that are based on the most recent third-generation AMD Milan processors. Green markers show results that were from a competitor, CompA.
Figure 3: TPCx-HS SF 1 TB results by processor
The relative performance and price/performance scores use the results of the AMD Naples-based SUT as a reference. All performance results (circle marker) above the dashed line performed better than the reference SUT and those results below performed worse. Conversely, price/performance results (diamond marker) above the dashed line scored worse than those results below the line.
Figure 4 is a similar chart but shows results at SF 3 TB.
Figure 4: TPCx-HS SF 3 TB results by processor
Figure 5 is a bar chart that shows relative performance using the results of the R6415-Naples-17node-MR SUT as the reference. The bars show results that are based on AMD processors: red for AMD Naples; orange for AMD Rome; and blue for AMD Milan. The green bars are for results from competitor CompA.
Figure 5: TPCx-HS SF 1 TB relative performance results
Figure 6 is a chart that shows performance per TRU and color-coded similarly to Figure 5.
Figure 6: Performance/TRU
Key takeaways from the results
- AMD Milan-based SUTs give the best bang for the money based on TPCx-HS results. Figure 5 shows that they performed up to 2.72x better than SUTs based on earlier generation AMD processors and the competition. Figures 3 and 4 show that the price/performance is comparable to that of the reference SUT.
- Data sizes can be scaled without fear of reduction in price/performance. Table 2 shows that price/performance based on Total Cost of Ownership improves remarkably as the data sizes are scaled from SF 1 TB to SF 100 TB.
- It is worth investing in NVMe-based storage. As shown in Table 2, all the results used NVMe-based storage. This configuration enabled them to use the 1U R6515 servers which occupied less rack space without a reduction in computing resources.
- AMD Milan-based SUTs enable reduced data center footprint. Table 2 and Figure 6 show that within the same space, the workload size can be increased by a factor of 100 without loss of efficiency.
- AMD Milan-based SUTs show improved performance efficiency at scale. At SF 100 TB, more data is processed in proportionately less time.
With the publication of TPCx-HS results based on AMD Milan processors, Dell Technologies has become the most dominant publisher of TPC Big Data benchmarks. The results show that Dell EMC hardware platforms with third-generation AMD EPYC processors can do more work efficiently in less space, and with more value for the dollar.
Delivering Innovation with Object-Based Analytics
Thu, 25 Mar 2021 18:17:06 -0000
As analytic workloads continue to grow, more pressure is placed on data teams to build an effective data strategy.
One large financial customer of ours is challenging their data teams to help fight false-positive card declines, which, according to Javelin Research, cost issuers and merchants over $118B. However, ensuring fraud prevention while cutting down on false positives can be a fine line to walk. The key to solving this problem with an analytics model is to start with the data.
At Dell Technologies, we have helped our customers work through these challenges for many years, from building fraud detection to enabling life-saving healthcare. We understand that getting the data strategy right helps teams build models that solve their real-world problems. Dell Technologies has engaged in joint engineering and validation efforts to integrate our leading distributed object storage product, Dell EMC ECS, with industry leaders in the analytics space.
Today we are happy to announce a collaboration with unified analytics warehousing leader Vertica, which will allow our customers to deliver cloud innovation on-premises, gain greater operational flexibility and efficiency, and scale infrastructure resources independently. Working with Vertica allows our joint analytics customers to deliver a flexible and efficient architecture by separating compute and storage.
Collaborating on Object-Based Analytics with Vertica
Vertica is a unified analytics warehouse built to deliver blazing-fast query performance regardless of scale or concurrency requirements. It is a highly scalable analytical database that works well in many deployment situations – on-premises, on top of a Hadoop or S3 data lake, and in any public clouds. Vertica features powerful SQL-based analytics, time-series, geospatial, and in-database machine learning capabilities. Vertica removes typical barriers to analytics for some of the world’s most prominent data-centric organizations.
Vertica in Eon mode decouples compute from storage to give customers the benefits of cloud architecture for analytic workloads. Previously only available in the public clouds, Vertica now offers Eon Mode with superior on-premises object storage solutions.
“Our customers trust us to provide the greatest freedom in how they consume the highest performance analytics – flexibility for the broadest deployment options, whether it’s deploying Vertica on any major public cloud or on-premises with more leading object storage options,” said Colin Mahony, senior vice president and general manager of Vertica.
Helping Data Teams Solve Real-World Problems
Data teams can begin taking advantage of our joint collaboration today. Today's announcement allows customers to:
- Deliver cloud innovation on-premises with Vertica in Eon Mode for Dell EMC ECS.
- Separate compute and storage with ECS and Vertica in Eon Mode for operational flexibility and efficiency.
- Scale infrastructure resources independently: storage can grow without adding expensive compute, and compute can be scaled up or down for variable or intermittent workloads.
Vertica in Eon Mode for Dell EMC ECS gives companies a consistent platform for analytics across all of their environments, whether their data resides in the cloud, on-premises, or in a hybrid architecture. See this white paper to learn about the technologies and environment used to confirm compatibility between Vertica in Eon Mode and the Dell EMC ECS platform.