Intelligent Data Pipelining for Splunk with Cribl LogStream
Wed, 18 Aug 2021 09:21:04 -0000
Dell Technologies has been working with customers for more than five years to help reduce Splunk infrastructure total cost of ownership (TCO) and complexity. Dell Technologies, at the time EMC, pioneered the strategy of separating compute from long-term storage, using NFS and Isilon for cold data to reduce the cost and complexity of managing historic data. This concept of separating compute from storage has now been adopted within the Splunk application with the introduction of SmartStore.
Dell Technologies was an early supporter of SmartStore with ECS being one of the first S3 platforms announced in the 2018 Splunk SmartStore launch blog. More recently Dell Technologies has worked with Intel to illustrate the value of NVMe in indexers to increase indexer density and performance. Dell Technologies will continue to drive these infrastructure innovations as containerization of Splunk Enterprise Security and IT Service Intelligence becomes generally available.
So, Dell Technologies can help customers reduce the cost and complexity of Splunk infrastructure. But how do we improve efficiencies with Splunk and the data it consumes? One way is intelligent data pipelining so customers can ensure they get data from their various sources to Splunk in the most efficient way possible and still be free to use that data elsewhere while maintaining control and flexibility. There are several tools that can provide data pipelining, but one very interesting solution for machine-generated data is Cribl LogStream. Cribl was started by former Splunkers and LogStream provides data pipelining for logs, metrics, and traces for any log analytics platform, not just for Splunk.
How are data pipelining and Cribl LogStream potentially beneficial to my Splunk environment? We feel there are a number of benefits of data pipelining that our customers should consider:
- Reducing ingested log volumes - We’re not suggesting that you should be ingesting fewer sources (in most cases you can actually do more!). Rather you can aggregate the data, remove duplicate fields and null values or simply format the data more efficiently. This reduction will directly impact the cost and downstream system performance.
- Keep a full fidelity copy of the source data – Cribl has the ability to send the original source data to a low-cost destination such as the Dell PowerScale or ECS platforms. This leaves you in control of your source data should you need to reinvestigate the data at a later date or if you are preparing to send your data to an external service provider like Splunk Cloud or an alternative Analytics platform like Elasticsearch. This source data facilitates any re-platforming or repatriation that could potentially happen in the future.
- Multiple platform support – Should you be using several different Analytics platforms in your environment, such as Elasticsearch for observability, Splunk for security and Cloudera for fraud-detection, Cribl has the ability to route the necessary data to the appropriate tool or even all the tools!
- Data Masking and Hashing – In environments where there’s highly sensitive data that Splunk is ingesting, such as customer, patient, or account data, Cribl gives you the ability to hash, obfuscate, or even eliminate the sensitive data in flight before ingesting into Splunk or another long-term platform.
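As a rough illustration of the first and last benefits in the list above, the sketch below drops null or empty fields and replaces sensitive values with a salted hash before an event is forwarded. It is plain Python for illustration only, not Cribl LogStream's own configuration or function syntax; the field names, the salt, and the `reduce_and_mask` helper are all hypothetical.

```python
import hashlib

# Hypothetical field names; real events will use different keys.
SENSITIVE_FIELDS = {"account_id", "patient_name"}


def reduce_and_mask(event, salt="example-salt"):
    """Drop null/empty fields and hash sensitive values (illustrative only)."""
    cleaned = {}
    for key, value in event.items():
        # Removing null and empty values shrinks the event before ingestion.
        if value is None or value == "":
            continue
        if key in SENSITIVE_FIELDS:
            # Replace the raw value with a salted SHA-256 hex digest.
            digest = hashlib.sha256((salt + str(value)).encode("utf-8"))
            cleaned[key] = digest.hexdigest()
        else:
            cleaned[key] = value
    return cleaned


if __name__ == "__main__":
    raw = {"host": "web01", "account_id": "12345", "session": None, "note": ""}
    print(reduce_and_mask(raw))
```

The same salt and input always yield the same digest, so masked values can still be correlated across events without exposing the original data.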
What are the deployment options for Cribl LogStream on Dell Technologies? Cribl can be deployed on any architecture that supports Linux or containers. This could be Dell PowerEdge, PowerFlex, VxRail, or a combination of Dell PowerEdge and shared storage, like Dell PowerStore. The compute requirements for LogStream are relatively minimal when compared to Splunk Indexers. Cribl estimates that one physical core with 2 GB of RAM can pipeline 400 GB of data per day. The storage requirements are likely to be disproportionately higher if the intent is to keep full-fidelity copies of data intact. As with compute, the options for storage are open; at scale, however, Dell Technologies ECS or PowerScale are likely the most effective choices. Depending on your retention requirement, PowerStore may be a great choice given its amazing deduplication capabilities.
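To make the compute ratio concrete, here is a minimal sizing sketch based only on the Cribl estimate quoted above (one physical core and 2 GB of RAM per 400 GB of data per day). The function name and the raw-copy calculation are assumptions for illustration; the storage figure ignores compression, deduplication, and protection overhead, so treat it as a starting point rather than a sizing recommendation.

```python
import math

GB_PER_CORE_PER_DAY = 400  # Cribl estimate quoted in the text
RAM_GB_PER_CORE = 2        # Cribl estimate quoted in the text


def logstream_sizing(daily_ingest_gb, retention_days=0):
    """Estimate cores, RAM, and raw full-fidelity copy size (illustrative)."""
    cores = max(1, math.ceil(daily_ingest_gb / GB_PER_CORE_PER_DAY))
    return {
        "cores": cores,
        "ram_gb": cores * RAM_GB_PER_CORE,
        # Assumption: the full-fidelity copy is stored uncompressed.
        "raw_copy_gb": daily_ingest_gb * retention_days,
    }


if __name__ == "__main__":
    print(logstream_sizing(2000, retention_days=90))
```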
If you are a Splunk customer and interested in exploring how Cribl could be a benefit to you, please feel free to reach out to your Dell Technologies account team. Dell Technologies is both a partner and a customer at scale on Splunk and multiple other data platforms. We’ve had the honor and privilege to speak and be recognized at Splunk .conf and, as I hope the blog suggests, we have a complete portfolio of compute and storage, in addition to our partner ecosystem, to help customers on their Splunk journey.
Related Blog Posts
Increase Operational Efficiency with the Dell EMC PowerFlex App for Splunk
Tue, 04 Jul 2023 09:59:24 -0000
In modern IT, admins struggle to manage and analyze enormous amounts of machine-generated data in order to understand its patterns and make important decisions. The Splunk Platform enables apps that can analyze and derive insights from data generated by disparate infrastructure layers, such as compute, storage, and network. The platform helps admins manage, visualize, analyze, and understand the various patterns efficiently and effectively to make the right decisions.
The Dell EMC PowerFlex software-defined platform is often used as an infrastructure foundation supporting multiple heterogeneous and SLA-sensitive workloads due to its scale-out nature and its ability to host workloads on a variety of hypervisors, containers, and bare metal platforms.
To support Splunk workloads, Dell Technologies offers the Dell EMC PowerFlex App for Splunk, integrating the PowerFlex environment with Splunk Enterprise. Because PowerFlex is a source of a vast amount of telemetry, the PowerFlex App for Splunk is a great tool for visualizing, monitoring, and capturing various PowerFlex storage metrics. It empowers IT organizations to harness the power of data to improve IT outcomes by simplifying storage management and operations.
Benefits to organizations
This app provides various benefits to organizations:
- Greater operational and storage efficiency
- Deeper storage environment insights
- Future capacity predictions
- Monitoring multiple storage environments from a single window
- Enhancing decision making capabilities based on historical trends
Key capabilities of the PowerFlex App for Splunk
Real-time visibility
The app provides 24 out-of-the-box, intuitive dashboards that visualize PowerFlex metrics in real time. These metrics are logically grouped and presented in different dashboards.
Historical data
Historical data plays an important role in decision making. Taking decisive action before any event becomes a reality requires understanding the pattern over time.
Health of the system
The app captures real time alerts at different levels, and they are categorized by severity.
Storage projection
This is one of the coolest features: using the native Splunk environment capabilities to forecast the future storage requirements based on the current usage.
Some sample dashboards
Overview Dashboard: Provides a summary of clusters and associated high level metrics, with navigation capabilities.
Replication Overview Dashboard: Provides a summary of Replication clusters and associated high level metrics.
Storage Forecasting Dashboard: Provides the details of future storage requirements depending upon the current storage utilization.
Historical Data Dashboard: Provides the historical performance data for the specified time intervals.
Related resources
PowerFlex App for Splunk infographic
Video: Dell EMC PowerFlex App for Splunk
Where to find Dell EMC PowerFlex App for Splunk
For those who are new to Splunk, you can get this app from http://splunkbase.splunk.com. This app comes in two parts:
- The Dell EMC PowerFlex App is available here https://splunkbase.splunk.com/app/5528/ and provides all the beautiful and awesome visualizations.
- The Add-on is available here https://splunkbase.splunk.com/app/5529/. This helps to configure the Gateways and to define the Instance End Points to make REST API calls.
Thanks for reading!
Author: Nataraj Naikar
Elastic 7.12 Frozen Data and Dell Technologies ECS Enterprise Object Storage
Tue, 22 Jun 2021 12:28:53 -0000
Many of us who work with Elastic are excited by the announcement of Elasticsearch 7.12 and the introduction of leveraging S3 for searchable frozen data in Elasticsearch Index Lifecycle Management (ILM). Dell Technologies’ customers were already able to take advantage of ECS Enterprise Object Storage for Elasticsearch snapshots. Now with the introduction of frozen data to S3 for Elasticsearch, customers can reduce the total cost of ownership of historic data in Elasticsearch while maintaining data value and accessibility on Dell ECS.
Elasticsearch is part of the Elastic Stack, also known as the “ELK Stack”, a widely used collection of software products based on open source that is used for search, analysis, and visualization of data by Elastic.co. The Elastic Stack is useful for a wide range of applications, including observability, security, and general-purpose enterprise search. Dell Technologies is an Elastic Technology Partner, OEM Partner, and Elastic customer. Dell Technologies uses the Elastic Stack internally for several use cases, including observability of Kubernetes and text document search.
Dell Technologies ECS Enterprise Object Storage is the leading object storage platform from Dell EMC and boasts unmatched scalability, performance, resilience, and economics. Dell ECS delivers rich S3-compatibility on a globally distributed architecture, empowering organizations to support enterprise workloads such as cloud-native, archive, IoT, AI, and big data analytics applications at scale. Dell ECS is being used by many customers as a globally distributed, object storage platform for machine data and analytics.
In July 2019, Dell Technologies published “Dell EMC ECS: Backing Up Elasticsearch Snapshot Data”. That document illustrates how to configure Elasticsearch to use the backup and restore API to store data in ECS. Snapshots in Elasticsearch are the only reliable and supported method to back up an Elasticsearch cluster. There are no Elastic-supported methods to restore data from a file system backup. You can take snapshots of an entire cluster, or only specific indices in the cluster. In addition to object storage on Dell ECS, Elasticsearch can be backed up to shared file systems such as Isilon or PowerScale. Backing up the data to Dell EMC storage allows customers to have peace of mind that their Elasticsearch data is protected. With Elasticsearch 7.12 and Cold and Frozen data, those snapshots take on even greater significance.
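For orientation, the sketch below builds the two REST requests behind that workflow: registering an S3-type snapshot repository and taking a snapshot of either the whole cluster or selected indices. It only constructs the method, path, and JSON body and does not contact a cluster. The repository, bucket, snapshot, and index names are hypothetical, and the sketch assumes the S3 repository support is installed and that the ECS endpoint and credentials are already configured on the Elasticsearch nodes (for example through the `s3.client.default.endpoint` setting and the keystore). The white paper referenced above remains the place to look for the validated procedure.

```python
import json


def s3_repository_request(repo_name, bucket, client="default"):
    """Return (method, path, body) to register an S3-type snapshot repository."""
    body = {"type": "s3", "settings": {"bucket": bucket, "client": client}}
    return "PUT", f"/_snapshot/{repo_name}", json.dumps(body)


def snapshot_request(repo_name, snapshot_name, indices=None):
    """Return (method, path, body) to snapshot all indices or only the listed ones."""
    body = {}
    if indices:
        # Elasticsearch accepts a comma-separated list of index names here.
        body["indices"] = ",".join(indices)
    return "PUT", f"/_snapshot/{repo_name}/{snapshot_name}", json.dumps(body)


if __name__ == "__main__":
    # Hypothetical names for illustration.
    print(s3_repository_request("ecs_repo", "es-snapshots"))
    print(snapshot_request("ecs_repo", "snap-1", ["logs-a", "logs-b"]))
```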
Elasticsearch 7.12 and Frozen Data
The frozen tier was introduced in Elasticsearch 7.12. Index Lifecycle Management with hot, warm, and cold tiers, along with the ability to search snapshots, was already available in previous versions of Elasticsearch. The addition of the frozen tier makes data held in object stores like Dell ECS fully searchable and decouples compute from long-term storage. This feature will help customers reduce costs and resources for historic data while maintaining or expanding the accessibility and value of historic data. Dell Technologies has numerous customers, especially in regulated industries such as healthcare or financial services, who keep or want to keep machine data anywhere from one to seven years to facilitate security investigations, trend analysis, predictive analytics, or audit and regulatory compliance. For many this can be cost-prohibitive, leading customers to choose to delete valuable data or store it in a format that is not easily accessible. Elastic has released the repository test kit to test and validate any S3-compatible object store to work with searchable snapshots and the frozen tier.
Dell Technologies Elasticsearch 7.12 Architecture with ILM and Frozen Data
So how might deployments of Elasticsearch with full data life cycle management look with the Dell Technologies portfolio? Elastic data life cycle management should leverage higher performance block storage for hot and warm data: high-speed storage for hot, and lower-cost, lower-performance storage for warm. This could be NVMe or lower density SSD on Dell PowerEdge, VxRail, PowerFlex or PowerStore for hot and higher density SSD or HDD for warm. In 2020, Dell Technologies validated the Elastic Stack running on our VxFlex family of HCI with both VMware and ECK. Because Elasticsearch tiers data on independent data nodes rather than on multiple mount points on a single data node or indexer, the multiple types and classes of software defined storage that are presented to independent HCI clusters can be easily leveraged between Elasticsearch clusters to address data temperatures.
Once data is moved to the cold tier, Elastic will single-instance your data if you have enabled replica shards. This allows you to store up to twice the amount of data on the same amount of hardware compared with the warm tier by eliminating the need to store redundant copies locally. However, this also increases the value of snapshots, as the indices in the cold tier are backed up to your object store for redundancy. As mentioned previously, in this architecture those snapshots would be stored on Dell Technologies ECS.
With the introduction of the frozen tier, Elasticsearch removes the need to store data on locally accessible block storage and uses searchable snapshots to directly search data stored in the object store without the need to rehydrate. As data migrates from warm or cold to frozen based on your ILM policy, indices on local nodes are migrated to your object store. A local cache, typically sized to 10 percent of your frozen data, stores recently queried data for optimal performance on repeat searches. This greatly reduces storage costs for large volumes of data.
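A minimal sketch of what such an ILM policy body might look like, together with the 10 percent cache rule of thumb from the paragraph above. The policy uses the `searchable_snapshot` action in the frozen phase. The repository name, rollover thresholds, and phase ages are placeholder assumptions, not recommendations, and the helper names are hypothetical.

```python
import json


def frozen_ilm_policy(repository, frozen_after="90d", delete_after="365d"):
    """Build an ILM policy body that moves indices to the frozen tier."""
    return {
        "policy": {
            "phases": {
                # Placeholder rollover thresholds for the hot phase.
                "hot": {"actions": {"rollover": {"max_size": "50gb", "max_age": "7d"}}},
                "frozen": {
                    "min_age": frozen_after,
                    "actions": {
                        # Searches then run against the snapshot in the object store.
                        "searchable_snapshot": {"snapshot_repository": repository}
                    },
                },
                "delete": {"min_age": delete_after, "actions": {"delete": {}}},
            }
        }
    }


def frozen_cache_gb(frozen_data_gb, cache_percent=10):
    """Local cache size using the 'typically 10 percent' guidance from the text."""
    return frozen_data_gb * cache_percent / 100


if __name__ == "__main__":
    # "ecs_repo" is a hypothetical repository name.
    print(json.dumps(frozen_ilm_policy("ecs_repo"), indent=2))
    print(frozen_cache_gb(50_000))
```

A body like this would be sent with `PUT _ilm/policy/<policy-name>`; the ages and thresholds should come from your own retention and search requirements.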
Elasticsearch data nodes tend to have average allocations of 8 to 16 cores and 32 to 64 GB of RAM. With the current ability to support up to 112 cores and 6 TB of RAM in a single 2RU Dell server, Elasticsearch is an attractive application for virtualization or containerization. Per guidance from Elastic, if your typical warm tier node with 64 GB of RAM manages 10 TB of data, a cold tier node can handle about twice as much data, and a frozen tier node will jump up to approximately ten times as much data. We would recommend sizing for one physical CPU to one virtual CPU (vCPU) for Elasticsearch Hot Tier along with the management and control plane resources. While this is admittedly like the VMware guidance for some similar analytics platforms, these virtual machines tend to consume a significantly smaller CPU footprint per data node.
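The per-tier ratios quoted above can be turned into a rough node-count estimate. This sketch assumes the 64 GB warm node managing 10 TB as the baseline, with cold at about twice and frozen at about ten times that figure; the function name is hypothetical, and real sizing depends on query load, shard counts, and retention.

```python
import math

WARM_TB_PER_NODE = 10  # baseline from the text: 64 GB RAM node managing 10 TB
TIER_MULTIPLIER = {"warm": 1, "cold": 2, "frozen": 10}  # approximate ratios from the text


def nodes_needed(data_tb, tier):
    """Estimate the data nodes required to hold data_tb in the given tier."""
    capacity_tb = WARM_TB_PER_NODE * TIER_MULTIPLIER[tier]
    return math.ceil(data_tb / capacity_tb)


if __name__ == "__main__":
    for tier in ("warm", "cold", "frozen"):
        print(tier, nodes_needed(200, tier))
```

For the same 200 TB, the estimate falls from 20 warm nodes to 10 cold nodes to 2 frozen nodes, which is where the cost reduction for historic data comes from.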
Figure 1: Logical Elastic Stack Architecture on HCI example
Conclusion
Dell Technologies ECS Enterprise Object Storage is the leading object storage platform from Dell EMC and boasts unmatched scalability, performance, resilience, and economics. Dell Technologies’ customers can take advantage of ECS Enterprise Object Storage for Elasticsearch snapshots, and now with the introduction of frozen data for S3 for Elasticsearch, customers can reduce the total cost of ownership of historic data in Elasticsearch while maintaining data value and accessibility on Dell ECS. Snapshots are the only reliable and supported method to back up an Elasticsearch cluster, and with the introduction of the cold and frozen tier, Elasticsearch snapshots become a critical component of Elasticsearch ILM. ILM with frozen data greatly reduces storage costs for large volumes of data, and Dell Technologies provides a portfolio capable of addressing the entire Elastic data lifecycle and compute requirements with multiple deployment options.
About the Authors
Keith Quebodeaux, Greg Galvan, Steve Meilinger, and Mark Thomas are Systems Engineers and Sales Specialists with Dell Technologies Data Centric Workloads and Solutions (DCWS), working with customers and prospective customers on their data analytics, artificial intelligence, and machine learning initiatives.
Reference Links:
- https://www.delltechnologies.com/resources/en-us/asset/white-papers/products/storage/h17847-dell-emc-ecs-backing-up-elasticsearch-data-wp.pdf
- https://www.delltechnologies.com/resources/en-us/asset/white-papers/products/storage/h18274-backup-elasticsearch-data-to-dell-emc-isilon.pdf
- https://www.elastic.co/about/partners/technology
- https://www.dellemc.com/en-in/collaterals/unauth/white-papers/products/converged-infrastructure/elastic-on-vxflex.pdf
- https://www.elastic.co/webinars/elasticsearch-sizing-and-capacity-planning