As noted in the Introduction, a modern industrial facility can generate many terabytes to petabytes of data per day. The largest contributors to this volume are machine-generated log files and numeric telemetry. Log files typically parse cleanly along event boundaries that include a timestamp, source, and message, along with other metadata. These events may be forwarded to an upstream logging function one at a time or written to a local file and processed in batches. For systems that generate events only sporadically, the logging system design must be reliable but can use a relatively simple scale-up approach. When many devices each produce a prolific volume of log messages, a more complex scale-out computing and storage design is required. Such systems are commonly called event streaming platforms.
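To make the notion of an event boundary concrete, the sketch below splits a log line into the timestamp, source, and message fields described above. The line format, field names, and `parse_event` helper are assumptions for illustration, not a standard; real facilities use many incompatible log formats.

```python
import re
from datetime import datetime

# A minimal sketch: parse a timestamped log line into a structured event.
# The exact layout (ISO timestamp, source token, free-text message) is assumed.
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<source>\S+)\s+"
    r"(?P<message>.*)$"
)

def parse_event(line: str):
    """Split one log line along its event boundary into metadata fields."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # malformed line; a real pipeline might route this to a dead-letter queue
    event = match.groupdict()
    event["timestamp"] = datetime.fromisoformat(event["timestamp"])
    return event

line = "2024-05-01T12:30:00 pump-17 bearing temperature exceeded threshold"
event = parse_event(line)
```

Once parsed, each event can be forwarded individually or appended to a local file for batch processing, as the text describes.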
Event streaming platforms can also handle continuous time series of numeric telemetry. This scenario typically involves some preprocessing before the data are sent upstream. Most telemetry data are produced as byte streams without any clear event boundaries. Every stream typically has a recording frequency. Streams whose interval between data points is one second or more are often described by that interval: 1 sec, 5 sec, 30 sec, and so on. Many IIoT devices produce recordings with thousands of points per second. For this class of data, the frequency is designated in data points per second, expressed in kHz. For example, 1,000 recorded points per second would be called a 1 kHz data stream.
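The two naming conventions above can be captured in a small helper: slow streams are labeled by their recording interval, fast streams by their frequency in kHz. This is a sketch of the convention as described, with the `stream_label` function name chosen for illustration.

```python
def stream_label(points_per_second: float) -> str:
    """Label a telemetry stream by interval (slow) or frequency (fast),
    following the convention: intervals of one second or more are named
    by the interval, thousands of points per second by kHz."""
    if points_per_second >= 1000:
        return f"{points_per_second / 1000:g} kHz"
    if points_per_second <= 1:
        interval = 1 / points_per_second
        return f"{interval:g} sec"
    return f"{points_per_second:g} Hz"  # in-between rates, labeled in plain Hz

# One point every five seconds -> "5 sec"; 1,000 points per second -> "1 kHz".
slow = stream_label(1 / 5)
fast = stream_label(1000)
```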
Many types of IIoT equipment require high-frequency data for analysis because significant level changes happen so quickly that recording intervals of one second or longer would hide the details necessary to assess the health of the equipment or process. When processing high-frequency streams with an event streaming platform, the data must be preprocessed into time windows that look like events to the receiving function. Developers must understand the performance and reliability characteristics of the event streaming platform, as well as the network link between the data generation process and the destination receiving function, in order to make good choices about window size parameters. High-velocity data may be temporarily stored in a local queue and then sent for processing in mini-batches to cut down on communication overhead. This process would still generally be considered a data streaming analysis so long as the latency between generation and processing remains low.
Some event streaming platforms have integrated analytics technology that can reduce or eliminate the need for integration with other technologies. Integrating analytics with the event processing platform is one of the biggest changes that has differentiated streaming platforms from traditional messaging systems. The most flexible event streaming platforms support both in-process data analytics and developer APIs capable of calling out to remote systems for other analytics needs.