
Journey into the analytics space with Dell & Starburst
Thu, 02 Feb 2023 14:50:00 -0000
Data silos are a growing concern for enterprises today. They make it harder to discover, access, and activate data. At Dell Technologies, we have helped our customers work through these challenges for many years, from building fraud detection to enabling life-saving healthcare. We understand that getting the data strategy right can help teams solve their real-world problems. Dell Technologies has engaged in joint engineering and validation efforts to integrate our leading server product, Dell PowerEdge, and our leading object storage, Dell ECS, with industry leaders in the data analytics space.
Today, we are happy to announce a collaboration with analytics leader Starburst Data, which will allow our analytics customers to deliver flexible and efficient architectures by combining the fastest and most secure query engine and leading hardware platforms for compute and storage.
Data virtualization and federated query analytics
Starburst is built on top of Trino, the open-source, high-performance distributed SQL engine known for running fast analytic queries against data sources ranging in size from GBs to PBs. Trino was formerly called PrestoSQL. In fact, in 2020, we released a white paper describing how Presto’s capabilities translate remarkably well to Dell ECS object storage, and how Trino’s rich feature set positions it well to win the price/performance battle against Hadoop and other technologies in most cases!
The Starburst Enterprise Platform distribution of Trino was created to help enterprises extract more value from their Trino deployments through global security with fine-grained access controls, stable and reliable releases, additional connectors, data caching, and enterprise support including guidance from the most qualified group of Trino experts anywhere.
For these reasons, we chose to partner with Starburst and deploy their software in our labs to evaluate its performance on Dell hardware. We used the industry-standard TPC-DS test suite to benchmark Starburst by measuring the total execution time as well as the per-query execution time. We also varied the hardware resources to model how Starburst’s performance changes with them. We detailed our setup and experiments for reproducibility in this paper. Our goal was to provide our customers with a validated design reference for deploying Starburst and scaling it appropriately as the query volume, concurrency, or data volume grows.
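To make the two measurements concrete, here is a minimal sketch of how per-query and total execution time could be collected. It is not the harness used in the paper: `time_queries`, `run_query`, and the placeholder statements are hypothetical names introduced for illustration, and `run_query` stands in for whatever client call actually submits SQL to the cluster.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


def time_queries(
    queries: Iterable[Tuple[str, str]],
    run_query: Callable[[str], object],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Dict[str, float], float]:
    """Run each (name, sql) pair once; return (per-query seconds, total seconds)."""
    per_query: Dict[str, float] = {}
    for name, sql in queries:
        start = clock()
        run_query(sql)  # hypothetical executor; swap in a real client call
        per_query[name] = clock() - start
    # Total is the sum of the individual query times (queries run sequentially).
    total = sum(per_query.values())
    return per_query, total


if __name__ == "__main__":
    # Placeholder statements, not actual TPC-DS query text.
    demo_queries = [("q1", "SELECT 1"), ("q2", "SELECT 2")]
    timings, total = time_queries(demo_queries, run_query=lambda sql: None)
    for name, seconds in timings.items():
        print(f"{name}: {seconds:.6f}s")
    print(f"total: {total:.6f}s")
```

Running the same loop against clusters with different worker counts or memory sizes would give the kind of comparison described above. The sketch runs queries one at a time, so it says nothing about concurrency.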
Deploy and scale on Dell infrastructure
Starburst is based on a distributed Coordinator-Worker architecture. In our setup, we run coordinator and worker nodes of Starburst Enterprise on Dell PowerEdge servers and use unstructured storage such as Dell Elastic Cloud Storage (ECS) for materialized views, data products, caching, and more.
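As a rough illustration of the coordinator-worker split, the sketch below renders a minimal `config.properties` for each role using property names from the open-source Trino deployment documentation. It is an assumption-laden example, not the configuration from our lab: `render_config` and the hostname `coordinator.example.internal` are hypothetical, and a real Starburst Enterprise deployment will need additional settings.

```python
def render_config(is_coordinator: bool, discovery_uri: str, http_port: int = 8080) -> str:
    """Render a minimal Trino-style config.properties for one node."""
    lines = [f"coordinator={'true' if is_coordinator else 'false'}"]
    if is_coordinator:
        # Keep query processing off the coordinator so workers do the heavy lifting.
        lines.append("node-scheduler.include-coordinator=false")
    lines.append(f"http-server.http.port={http_port}")
    # Every node points at the coordinator's address for discovery.
    lines.append(f"discovery.uri={discovery_uri}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    uri = "http://coordinator.example.internal:8080"  # hypothetical hostname
    print("# coordinator")
    print(render_config(True, uri))
    print("# worker")
    print(render_config(False, uri))
```

Scaling out in this model means adding more nodes rendered with `is_coordinator=False`, all pointing at the same discovery URI.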
We tested the reference architecture on PowerEdge R740XD (14G), but we think the latest PowerEdge server portfolio (15G) can take performance to a new level with generational improvements such as:
- High-performance computing - delivering up to 43% greater performance by leveraging Intel's 3rd Gen Xeon Scalable processors.
- PCIe Gen 4 - doubling the throughput over prior server generations, with eight lanes of data.
- Comprehensive security - with data encryption, root-of-trust protection, and supply chain verification.
- Improved energy efficiency - with the latest cooling technology, offering up to a 60% reduction in power consumption.
- Flexible, autonomous management - delivering up to 85% time savings by freeing up the skilled hands of IT professionals for other vital projects.
We used the ECS EX500 as a data lake source. ECS is the world’s most cyber-secure object storage, delivering scalable public cloud services with the reliability and control of a private-cloud infrastructure. With comprehensive protocol support for unstructured data (object and file) and a variety of deployment options (turnkey appliance or software-defined), ECS can support a wide range of workloads, especially big data analytics. And best of all, Starburst works seamlessly with ECS!
Harness data to solve real-world problems
Data teams can start taking advantage of our collaboration now. Today’s announcement allows customers to:
- Quickly deploy a thoroughly tested architecture comprising Dell hardware, Starburst Enterprise Platform, and other software on-premises
- Effectively partner with IT to move data intelligently into a data lake / data warehouse based on usage patterns
- Prevent vendor lock-in with support for the most popular open table and file formats
- Separate compute and storage and scale flexibly and efficiently
- Harness the innovations in our latest generation of ECS appliances as data lake storage
We’re very excited about the collaboration and can’t wait for you to check out the reference architecture to learn more about the announcement and the solution.