Splunk Enterprise is a software platform that enables you to collect, index, and visualize machine-generated data that is gathered from different sources in your IT infrastructure. These sources include applications, networking devices, host and server logs, mobile devices, and more.
Splunk turns silos of data into operational insights and provides end-to-end visibility across your IT infrastructure to enable faster problem solving and informed, data-driven decisions.
Figure 4 provides a graphic overview of Splunk system architecture. A Splunk Enterprise instance can perform the role of a search head, an indexer, or both for small deployments. When daily ingest rates or search loads exceed sizing recommendations for a combined instance environment, Splunk Enterprise scales horizontally by adding more indexers and search heads. For more information, see the Splunk Capacity Planning Manual.
Figure 4. Splunk architecture overview
When a Splunk Enterprise indexer receives data, it parses the raw data into distinct events based on each event's timestamp, and then writes the events to the appropriate index. Splunk implements a storage tiering model of hot/warm and cold data buckets, which optimizes performance for newly indexed data while keeping older data for longer periods on higher-capacity storage.
Newly indexed data lands in a hot bucket, where Splunk actively reads and writes it. A hot bucket is rolled to a warm bucket when it reaches its maximum size, when its maximum time span elapses, when the maximum number of hot buckets is exceeded, or when the indexer restarts.
Warm buckets reside on the same storage tier as hot buckets; the only difference is that warm buckets are read-only. The storage that you identify for hot/warm data must be your fastest tier, because it has the greatest impact on the performance of your Splunk Enterprise deployment.
When the maximum number of warm buckets or the configured volume size is exceeded, data is rolled into a cold bucket, which can reside on another storage tier. Cold data may reside on a Network File System (NFS) mount if the latency is less than 5 milliseconds ideally, and no more than 200 milliseconds. NAS technologies offer an acceptable blend of performance and lower cost per TB, making them a good choice for longer-term retention of cold data.
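The tiering behavior described above is controlled through Splunk's `indexes.conf`. The stanza below is an illustrative sketch only; the index name, volume paths, and size limits are assumptions for this example, not recommendations from this paper:

```ini
# Hypothetical volume definitions: fast local storage for hot/warm,
# NAS-mounted storage for cold (all paths are placeholders).
[volume:fast]
path = /opt/splunk/hotwarm
maxVolumeDataSizeMB = 500000

[volume:cold]
path = /mnt/isilon/cold

# Example index using the volumes above.
[main_example]
homePath   = volume:fast/main_example/db
coldPath   = volume:cold/main_example/colddb
thawedPath = $SPLUNK_DB/main_example/thaweddb
maxDataSize = auto_high_volume   # hot buckets roll to warm at ~10 GB
maxWarmDBCount = 300             # oldest warm bucket rolls to cold beyond this
```

Separating `homePath` and `coldPath` onto different volumes is what lets hot/warm data stay on the fastest tier while cold data ages onto cheaper storage.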
Data can also be archived, or frozen, but frozen data is no longer searchable by Splunk search heads; manual user action is required to bring it back into Splunk Enterprise buckets to be searchable. You might choose to use frozen buckets to meet compliance retention requirements. However, this paper shows how Isilon's massive scalability and competitive cost of ownership can enable you to retain more searchable data in the cold bucket. Figure 5 provides more information about Splunk bucket concepts.
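Freezing is likewise driven by `indexes.conf` settings. A sketch, with an assumed retention period and archive path (both illustrative only):

```ini
# Hypothetical retention settings for the example index above.
[main_example]
frozenTimePeriodInSecs = 31536000            # freeze buckets older than ~365 days
coldToFrozenDir = /mnt/archive/main_example  # archive frozen buckets instead of deleting
# Frozen data is not searchable. To search it again, copy the bucket
# into thawedPath and rebuild it with the "splunk rebuild" command.
```

Without `coldToFrozenDir` (or a `coldToFrozenScript`), Splunk deletes buckets when they freeze, which is why the archive destination matters for compliance retention.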
Figure 5. Splunk index buckets
The Reference Architecture for Splunk Enterprise on Dell EMC Infrastructure is designed based on extensive customer experience with real-world Splunk production installations. It includes all the hardware, software, resources, and services that are required to deploy and manage Splunk Enterprise in a production environment. This reference architecture describes Splunk Enterprise on Dell EMC Infrastructure for three configurations covering a range of customer needs.
Splunk can be deployed in single instance mode, distributed mode, or indexer cluster mode in this reference architecture. See Table 1 for a summary of the configurations and minimum server counts.
Table 1. Configuration sizing
| Configuration | Reference | Mid-Range | High-Performance |
|---|---|---|---|
| Daily ingest rate | 200 GB | 250 GB | 300 GB |
| Retention (days), hot/warm | 30 | 30 | 30 |
| Retention (days), cold | 120 or 365 | 120 or 365 | 120 or 365 |
| Search head | 1 standard | 1 standard | 1 standard |
| Admin server | 1 standard | 1 standard | 1 standard |
| Indexers | 1 (200 GB) | 1 (250 GB) | 1 (300 GB) |
| Isilon cold storage | Optional | Optional | Optional |
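The retention figures in Table 1 translate directly into raw storage estimates. The sketch below assumes the common Splunk rule of thumb that indexed data occupies roughly half of the raw ingest volume (about 15% for the compressed raw journal plus about 35% for index files); the actual factor depends on your data and should be measured:

```python
def storage_estimate_gb(daily_ingest_gb, retention_days, disk_factor=0.5):
    """Rough on-disk storage for one retention tier.

    disk_factor is an assumed ratio of disk usage to raw ingest
    (~0.5 is a common Splunk rule of thumb; measure your own data).
    """
    return daily_ingest_gb * retention_days * disk_factor

# Reference configuration from Table 1: 200 GB/day ingest.
hot_warm_gb = storage_estimate_gb(200, 30)   # 30-day hot/warm tier
cold_gb = storage_estimate_gb(200, 120)      # 120-day cold tier
print(hot_warm_gb, cold_gb)  # 3000.0 12000.0
```

Running the same estimate with 365-day cold retention shows why a scalable, lower-cost cold tier matters as retention requirements grow.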
Splunk indexing works like a compression algorithm: common strings are found in the incoming data, and pointers to that information are generated. The resulting data is stored on disk as compressed raw data (the journal) plus the associated index files.
The resulting compression ratio (compressed size divided by original size) is usually less than one (1) and varies depending on the incoming data. Random binary data yields no compression savings, while highly repetitive data such as log files can compress significantly.
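To illustrate how data type drives the ratio, the sketch below compresses repetitive log-like text and random bytes with zlib, used here only as a stand-in for Splunk's journal compression:

```python
import os
import zlib

def compression_ratio(data: bytes) -> float:
    """Compressed size / original size; values below 1.0 mean savings."""
    return len(zlib.compress(data)) / len(data)

# Highly repetitive, log-like data compresses very well.
log_like = b"2024-01-01 12:00:00 INFO request ok status=200\n" * 1000
# Random bytes contain no repeated strings to exploit.
random_bytes = os.urandom(len(log_like))

print(compression_ratio(log_like))      # far below 1.0
print(compression_ratio(random_bytes))  # near (or slightly above) 1.0
```

The same contrast is why sizing estimates based on one data source rarely transfer to another: the ratio is a property of the data, not of the indexer.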
This variance in compression ratio significantly affects solution performance and storage requirements, as does the ingest rate. Requirements also change if you use premium applications such as ITSI, Enterprise Security, or User Behavior Analytics, which package additional functionality, such as security monitoring and user behavior analytics, in a single offering.