Elastic 7.12 Frozen Data and Dell Technologies ECS Enterprise Object Storage
Tue, 22 Jun 2021 12:28:53 -0000
|Read Time: 0 minutes
Many of us who work with Elastic are excited by the announcement of Elasticsearch 7.12 and the introduction of leveraging S3 for searchable frozen data in Elasticsearch Index Lifecycle Management (ILM). Dell Technologies’ customers were already able to take advantage of ECS Enterprise Object Storage for Elasticsearch snapshots. Now with the introduction of frozen data to S3 for Elasticsearch, customers can reduce the total cost of ownership of historic data in Elasticsearch while maintaining data value and accessibility on Dell ECS.
Elasticsearch is part of the Elastic Stack, also known as the “ELK Stack”, a widely used collection of software products based on open source that is used for search, analysis, and visualization of data by Elastic.co. The Elastic Stack is useful for a wide range of applications, including observability, security, and general-purpose enterprise search. Dell Technologies is an Elastic Technology Partner, OEM Partner, and Elastic customer. Dell Technologies uses the Elastic Stack internally for several use cases, including observability of Kubernetes and text document search.
Dell Technologies ECS Enterprise Object Storage is the leading object storage platform from Dell EMC and boasts unmatched scalability, performance, resilience, and economics. Dell ECS delivers rich S3-compatibility on a globally distributed architecture, empowering organizations to support enterprise workloads such as cloud-native, archive, IoT, AI, and big data analytics applications at scale. Dell ECS is being used by many customers as a globally distributed, object storage platform for machine data and analytics.
In July 2019 Dell Technologies published “Dell EMC ECS: Backing Up Elasticsearch Snapshot Data”. That document illustrates how to configure Elasticsearch to use the backup and restore API to store data in ECS. Snapshots in Elasticsearch are the only reliable and supported method to back up an Elasticsearch cluster. There are no Elastic-supported methods to restore data from a file system backup. You can take snapshots of an entire cluster, or only specific indices in the cluster. In addition to object storage on Dell ECS, Elasticsearch can be backed up to other shared file system such as Isilon or PowerScale. Backing up the data to Dell EMC storage allows customers to have peace of mind that their Elasticsearch data is protected. With Elasticsearch 7.12 and Cold and Frozen data, those snapshots take on even greater significance.
Elasticsearch 7.12 and Frozen Data
The frozen tier in Elasticsearch was introduced recently in Elastic 7.12. Index Lifecycle Management with hot, warm, and cold tiers in addition to the capability to search snapshots was already available in previous versions of Elasticsearch. The addition of the frozen tier allows object stores like Dell ECS to be fully searchable. The addition of the frozen tier in Elasticsearch decouples compute from long-term storage. This feature will help customers reduce costs and resources for historic data while maintaining or expanding the accessibility and value of historic data. Dell Technologies has numerous customers, especially in regulated industries such as healthcare or financial services, who keep or want to keep machine data anywhere from one to seven years to facilitate security investigations, trend analysis, predictive analytics, or audit and regulatory compliance. For many this can be cost-prohibitive, leading customers to choose to delete valuable data or store it in a format that is not easily accessible. Elastic has released the repository test kit to test and validate any S3-compatible object store to work with searchable snapshots and the frozen tier.
Dell Technologies Elasticsearch 7.12 Architecture with ILM and Frozen Data
So how might deployments of Elasticsearch with full data life cycle management look with the Dell Technologies portfolio? Elastic data life cycle management should leverage higher performance block storage for hot and warm data, hot on high speed and warm on lower cost and performance. This could be NVMe or lower density SSD on Dell PowerEdge, VxRail, PowerFlex or PowerStore for hot and higher density SSD or HDD for warm. In 2020, Dell Technologies validated the Elastic Stack running on our VxFlex family of HCI with both VMware and ECK. Because Elasticsearch tiers data on independent data nodes compared to multiple mount points on a single data node or indexer, the multiple types and classes of software defined storage that is presented to independent HCI clusters can be easily leveraged between Elasticsearch clusters to address data temperatures.
Once data is moved to the cold tier Elastic will single-instance your data if you have enabled replica shards. This allows the storing of up to twice the amount of data on the same amount of hardware over the warm tier by eliminating the need to store redundant copies locally. However, this also increases the value of snapshots as the indices in the cold tier are backed up to your object store for redundancy. As mentioned previously, snapshots would be to Dell Technologies ECS.
With the introduction of the frozen tier, Elasticsearch removes the need to store data on locally accessible block storage and uses searchable snapshots to directly search data stored in the object store without the need to rehydrate. As data migrates from warm or cold to frozen based on your ILM policy, indices on local nodes are migrated to your object store. A local cache, typically sized to 10 percent of your frozen data, stores recently queried data for optimal performance on repeat searches. This greatly reduces storage costs for large volumes of data.
Elasticsearch data nodes tend to have average allocations of 8 to 16 cores and 32 to 64 GB of RAM. With the current ability to support up to 112 cores and 6 TB of RAM in a single 2RU Dell server, Elasticsearch is an attractive application for virtualization or containerization. Per guidance from Elastic, if your typical warm tier node with 64 GB of RAM manages 10 TB of data, a cold tier node can handle about twice as much data, and a frozen tier node will jump up to approximately ten times as much data. We would recommend sizing for one physical CPU to one virtual CPU (vCPU) for Elasticsearch Hot Tier along with the management and control plane resources. While this is admittedly like the VMware guidance for some similar analytics platforms, these virtual machines tend to consume a significantly smaller CPU footprint per data node.
Figure 1: Logical Elastic Stack Architecture on HCI example
Conclusion
Dell Technologies ECS Enterprise Object Storage is the leading object storage platform from Dell EMC and boasts unmatched scalability, performance, resilience, and economics. Dell Technologies’ customers can take advantage of ECS Enterprise Object Storage for Elasticsearch snapshots, and now with the introduction of frozen data for S3 for Elasticsearch, customers can reduce the total cost of ownership of historic data in Elasticsearch while maintaining data value and accessibility on Dell ECS. Snapshots are the only reliable and supported method to back up an Elasticsearch cluster, and with the introduction of the cold and frozen tier, Elasticsearch snapshots become a critical component of Elasticsearch ILM. ILM with frozen data greatly reduces storage costs for large volumes of data, and Dell Technologies provides a portfolio capable of addressing the entire Elastic data lifecycle and compute requirements with multiple deployment options.
About the Authors
Keith Quebodeaux, Greg Galvan, Steve Meilinger, and Mark Thomas are Systems Engineers and Sales Specialists with Dell Technologies Data Centric Workloads and Solutions (DCWS), working with customers and prospective customers on their data analytics, artificial intelligence, and machine learning initiatives.
Reference Links:
- https://www.delltechnologies.com/resources/en-us/asset/white-papers/products/storage/h17847-dell-emc-ecs-backing-up-elasticsearch-data-wp.pdf
- https://www.delltechnologies.com/resources/en-us/asset/white-papers/products/storage/h18274-backup-elasticsearch-data-to-dell-emc-isilon.pdf
- https://www.elastic.co/about/partners/technology
- https://www.dellemc.com/en-in/collaterals/unauth/white-papers/products/converged-infrastructure/elastic-on-vxflex.pdf
- https://www.elastic.co/webinars/elasticsearch-sizing-and-capacity-planning