Intelligent Data Pipelining for Splunk with Cribl LogStream
Wed, 18 Aug 2021 09:21:04 -0000
Dell Technologies has been working with customers for more than five years to help reduce the total cost of ownership (TCO) and complexity of Splunk infrastructure. Dell Technologies, then EMC, pioneered the strategy of separating compute from long-term storage, using NFS and Isilon for cold data to reduce the cost and complexity of managing historic data. Splunk has since adopted this concept of separating compute from storage within its own application with the introduction of SmartStore.
Dell Technologies was an early supporter of SmartStore, with ECS being one of the first S3 platforms announced in the 2018 Splunk SmartStore launch blog. More recently, Dell Technologies has worked with Intel to illustrate the value of NVMe in indexers for increasing indexer density and performance. Dell Technologies will continue to drive these infrastructure innovations as containerization of Splunk Enterprise Security and IT Service Intelligence becomes generally available.
So, Dell Technologies can help customers reduce the cost and complexity of Splunk infrastructure. But how do we improve the efficiency of Splunk and the data it consumes? One way is intelligent data pipelining, which lets customers get data from their various sources to Splunk as efficiently as possible while remaining free to use that data elsewhere and maintaining control and flexibility. Several tools provide data pipelining, but one very interesting solution for machine-generated data is Cribl LogStream. Cribl was founded by former Splunkers, and LogStream provides data pipelining for logs, metrics, and traces for any log analytics platform, not just Splunk.
How are data pipelining and Cribl LogStream potentially beneficial to my Splunk environment? We feel there are a number of benefits of data pipelining that our customers should consider:
- Reducing ingested log volumes – We’re not suggesting that you ingest fewer sources (in most cases you can actually do more!). Rather, you can aggregate the data, remove duplicate fields and null values, or simply format the data more efficiently. This reduction directly lowers ingest cost and improves downstream system performance.
- Keep a full-fidelity copy of the source data – Cribl can send the original source data to a low-cost destination such as the Dell PowerScale or ECS platforms. This leaves you in control of your source data should you need to reinvestigate it at a later date, or if you are preparing to send your data to an external service provider like Splunk Cloud or to an alternative analytics platform like Elasticsearch. Retaining the source data facilitates any re-platforming or repatriation that may happen in the future.
- Multiple platform support – If you use several different analytics platforms in your environment, such as Elasticsearch for observability, Splunk for security, and Cloudera for fraud detection, Cribl can route the necessary data to the appropriate tool, or even to all of them!
- Data masking and hashing – In environments where Splunk ingests highly sensitive data, such as customer, patient, or account data, Cribl lets you hash, obfuscate, or even eliminate the sensitive data in flight, before it is ingested into Splunk or another long-term platform.
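The benefits above can be illustrated with a small plain-Python sketch. It is not Cribl code (LogStream pipelines are built from Cribl's own functions and configured within LogStream); it only mimics the ideas in the list: dropping null and empty fields, hashing sensitive fields with a secret key, keeping an untouched full-fidelity copy, and routing processed events to one or more platforms. The field names (`account_id`, `patient_id`, `category`), the key, the routing table, and the destination names are all hypothetical.

```python
import hashlib
import hmac
import json
from typing import Any

# Hypothetical names of sensitive fields; adjust to your own data.
SENSITIVE_FIELDS = ("account_id", "patient_id")
# Hypothetical hashing key; in practice load it from a secrets store.
HASH_KEY = b"example-secret-key"
# Hypothetical routing table: event category -> analytics platforms.
ROUTES = {
    "security": ["splunk"],
    "observability": ["elasticsearch"],
    "fraud": ["cloudera", "splunk"],
}
# Assumption: events with an unknown or missing category go to Splunk only.
DEFAULT_ROUTE = ["splunk"]


def reduce_event(event: dict[str, Any]) -> dict[str, Any]:
    """Drop fields whose value is None or an empty string."""
    return {k: v for k, v in event.items() if v is not None and v != ""}


def mask_event(event: dict[str, Any]) -> dict[str, Any]:
    """Replace sensitive values with a keyed SHA-256 digest (HMAC)."""
    masked = dict(event)
    for field in SENSITIVE_FIELDS:
        if field in masked:
            value = str(masked[field]).encode()
            masked[field] = hmac.new(HASH_KEY, value, hashlib.sha256).hexdigest()
    return masked


def process(event: dict[str, Any]) -> dict[str, str]:
    """Return a mapping of destination name -> serialized payload."""
    # Full-fidelity copy: the original event, unmodified, for low-cost storage.
    outputs = {"archive": json.dumps(event, sort_keys=True)}
    processed = mask_event(reduce_event(event))
    # Compact JSON (no spaces after separators) for the analytics platforms.
    payload = json.dumps(processed, sort_keys=True, separators=(",", ":"))
    for destination in ROUTES.get(processed.get("category"), DEFAULT_ROUTE):
        outputs[destination] = payload
    return outputs


if __name__ == "__main__":
    sample = {"host": "web01", "category": "fraud", "account_id": "12345",
              "session": None, "note": "", "msg": "login"}
    for destination, payload in process(sample).items():
        print(destination, payload)
```

A keyed hash, rather than a plain one, matters for low-entropy values such as account numbers, which could otherwise be recovered by hashing every possible value. Note that a digest is usually longer than the value it replaces, so in this sketch the volume savings come from the dropped fields and compact formatting, not from masking.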
What are the deployment options for Cribl LogStream on Dell Technologies? Cribl can be deployed on any architecture that supports Linux or containers. This could be Dell PowerEdge, PowerFlex, VxRail, or a combination of Dell PowerEdge and shared storage such as Dell PowerStore. The compute requirements for LogStream are relatively modest compared with those of Splunk indexers: Cribl estimates that one physical core with 2 GB of RAM can pipeline 400 GB of data per day. Storage requirements are likely to be disproportionately higher if the intent is to keep full-fidelity copies of the data. As with compute, the storage options are open; however, at scale, Dell Technologies ECS or PowerScale are likely the most effective choices. Depending on your retention requirements, PowerStore may be a great choice given its strong deduplication capabilities.
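The sizing ratio quoted above lends itself to a quick back-of-the-envelope calculation. The sketch below applies Cribl's estimate (one physical core and 2 GB of RAM per 400 GB/day) and rounds up to whole cores. The function name is illustrative, and the optional `headroom` margin is a hypothetical addition, not part of Cribl's guidance. It covers compute only, not the storage needed for full-fidelity copies.

```python
import math

# Cribl's rule of thumb, as quoted above: one physical core and
# 2 GB of RAM per ~400 GB/day of pipelined data.
GB_PER_DAY_PER_CORE = 400
RAM_GB_PER_CORE = 2


def estimate_logstream_compute(daily_gb: float, headroom: float = 0.0) -> tuple[int, int]:
    """Return (physical_cores, ram_gb) for a given daily data volume.

    headroom is a hypothetical safety margin, e.g. 0.25 for 25% extra capacity.
    """
    if daily_gb < 0 or headroom < 0:
        raise ValueError("daily_gb and headroom must be non-negative")
    # Round up: a fractional core still needs a whole physical core.
    cores = math.ceil(daily_gb * (1 + headroom) / GB_PER_DAY_PER_CORE)
    return cores, cores * RAM_GB_PER_CORE


if __name__ == "__main__":
    print(estimate_logstream_compute(10_000))                 # 10 TB/day
    print(estimate_logstream_compute(10_000, headroom=0.25))  # with 25% margin
```

For example, 10 TB/day (taking 1 TB as 1,000 GB) works out to 25 cores and 50 GB of RAM, or 32 cores and 64 GB of RAM with 25% headroom.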
If you are a Splunk customer interested in exploring how Cribl could benefit you, please reach out to your Dell Technologies account team. Dell Technologies is both a partner and a customer at scale on Splunk and multiple other data platforms. We’ve had the honor and privilege to speak and be recognized at Splunk .conf, and, as I hope this blog suggests, we have a complete portfolio of compute and storage, in addition to our partner ecosystem, to help customers on their Splunk journey.