The customer has recently deployed Confluent Platform at the edge and in the data center core, with stream replication from the edge to the core. The plan is to collect data from the potentially faulty furnaces at the edge and replicate it reliably to the data center core.
Once the data has been reliably captured in the data center, a Kafka-to-HDFS connector populates a data store for the data analytics team. The final data-management step is to delete the furnace data from the data center Kafka topics.
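As a minimal sketch of that final cleanup step, the Kafka Java AdminClient can delete records up to a given offset once the HDFS sink has consumed them. The bootstrap address, topic name, and offset below are illustrative assumptions, not values from this deployment; in practice the offset would come from the sink connector's committed position.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.RecordsToDelete;
import org.apache.kafka.common.TopicPartition;

public class FurnaceTopicCleanup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Hypothetical bootstrap address for the data center cluster.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "dc-kafka:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "furnace-telemetry" and the offset are illustrative; the offset
            // would normally be read from the HDFS sink's committed position.
            TopicPartition partition = new TopicPartition("furnace-telemetry", 0);
            long consumedUpTo = 1_000_000L;

            // Delete all records in the partition before the given offset.
            admin.deleteRecords(Map.of(partition, RecordsToDelete.beforeOffset(consumedUpTo)))
                 .all()
                 .get();
        }
    }
}
```

A simpler alternative is to set a short retention.ms on the furnace topics and let the broker expire the data on its own schedule.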
After the training process is complete, the data science team and the OT staff at the plant must agree on how the model will be used to scan new data for anomalies. Many data science models never reach production because the application developers and the data science team skip this planning step. The two teams must also agree on the structure of the data in the stream and on how to account for missing or zero-value readings.
One of the reasons the customer chose the Confluent Platform data streaming solution is the availability of stream processing with customized functions close to the data source. In scenarios where OT staff must be alerted quickly to any sign of rapidly declining component health, the preference is to scan and filter the data for warning signs at the edge.
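As one illustration of that edge filtering pattern, the sketch below uses the Kafka Streams Java API (an alternative to the KSQL approach described next) to flag readings above a warning threshold before replication to the core. The topic names, value layout, and threshold are assumptions for illustration only.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EdgeWarningFilter {
    // Illustrative threshold; real limits would come from the OT staff.
    private static final double WARN_TEMP_C = 1450.0;

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-warning-filter");
        // Hypothetical bootstrap address for the edge cluster.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "edge-kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Assumed topic: raw readings keyed by furnace ID, value = temperature.
        KStream<String, Double> readings = builder.stream("furnace-temperature");
        readings.filter((furnaceId, temp) -> temp != null && temp > WARN_TEMP_C)
                .to("furnace-warnings"); // consumed by the edge alerting pipeline
        new KafkaStreams(builder.build(), props).start();
    }
}
```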
Confluent Platform includes KSQL to enable simplified stream processing in near real time. KSQL implements a data access language that is familiar to developers with Structured Query Language (SQL) experience. The KSQL execution engine translates a query into a job that matches the streaming platform topology, then performs the action without requiring the developer to know the platform internals. Confluent Platform also supports the development of user-defined functions (UDFs). This use case follows a design pattern example, developed by Confluent, for implementing a deep learning UDF for anomaly detection (How to Build a UDF and/or UDAF in KSQL 5.0, March 13, 2020; KSQL UDF Java Implementation: Deep Learning for Anomaly Detection).
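The cited Confluent example embeds a trained deep learning model inside a KSQL UDF. The sketch below shows only the UDF scaffolding from the KSQL 5.0 annotation API, with a placeholder scoring function standing in for the real model; the class name, function name, and constants are illustrative assumptions.

```java
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;

@UdfDescription(name = "anomaly_score",
                description = "Scores a sensor reading against a pre-trained model")
public class AnomalyScoreUdf {

    @Udf(description = "Returns an anomaly score for a single reading")
    public double anomalyScore(final Double reading) {
        // Per the planning step above, missing or zero-value data needs an
        // agreed-upon treatment; here a null reading simply scores as 0.0.
        if (reading == null) {
            return 0.0;
        }
        // Placeholder scoring logic. The Confluent example instead invokes a
        // trained deep learning model loaded alongside the UDF.
        return Math.abs(reading - 1400.0) / 100.0;
    }
}
```

Once the packaged JAR is placed on the KSQL server's extension path, the function can be invoked from an ordinary query, for example SELECT SENSOR_ID, ANOMALY_SCORE(TEMPERATURE) FROM FURNACE_STREAM;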