IIoT Analytics Design – How important is MOM (message-oriented middleware)?
Wed, 29 Apr 2020 22:20:16 -0000
Anyone who has been experimenting with Industrial Internet of Things (IIoT) analytics outside of the core plant systems has already realized that the diverse mix of data sources they are wrestling with creates a lot of execution complexity. One of the leading causes of failed and over-budget IIoT analytics proofs of concept (POCs) is underestimating the effort needed to assemble and clean an appropriate data set before any data modeling can begin. The current plant systems landscape in most industrial facilities is too complex to support cross-functional data assembly, yet at the same time it is critical to maintaining productivity. A rip and replace of core systems like the Manufacturing Execution System (MES), Manufacturing Control System (MCS), Supervisory Control and Data Acquisition (SCADA) system, or other key software systems just to make IoT analytics easier is unlikely. To make immediate progress toward exploring the value of new data-driven models, data scientists working in industrial settings are going to need to get data out of several existing applications and into a centralized platform designed for data model development. Modeling can't happen inside each core system.
If you agree there is a need in your organization for more data extraction, augmentation, consolidation, and synthesis, the big question is where those results should land: at the plant (edge model), in a corporate data center (core/cloud model), and/or in a multi-tenant contract data center (public cloud model)? I’ve been talking to customers that are mixing and matching all of these models and seeing benefits based on their particular use case needs. The one universal design challenge they all share is how to connect and manage data movement between systems and locations without having every pipeline turn into a one-of-a-kind snowflake.
Getting the right subset of operational data routed to the best analytics platform at the edge, core, or cloud can quickly get out of hand if you try to support every existing API and communication protocol as a one-off effort for each project. The only way to deal with this complexity is to adopt a common communication architecture that every critical system can talk to – in both directions. This type of design pattern goes by many names, including enterprise service bus, integration platform-as-a-service, and message-oriented middleware (MOM). I went with the latter in the title of this article because it has the most name recognition among both operational technology (OT) and information technology (IT) professionals and makes an attention-grabbing headline!
Choosing the right technology and implementation details for a MOM to broker data exchange should not be a high-risk, bet-the-farm endeavor. You just need to minimize the number of POCs and possible direction changes, especially those that impact in-plant personnel. One place to look for guidance is organizations that are relatively new (greenfield) in the larger IoT space and have similar scale and needs. A good prospect for examination is the automotive industry organizations working on autonomous vehicles (ATVs): there are numerous examples of bidirectional Kafka architectures in the developing ATV market. According to the Kafka online documentation: “We designed Kafka to be able to act as a unified platform for handling all the real-time data feeds a large company might have.”
The Kafka platform has three key capabilities:
- Publish and subscribe to streams of records, similar to a message queue or enterprise messaging system.
- Store streams of records in a fault-tolerant durable way.
- Process streams of records as they occur.
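To make these three capabilities concrete, the pattern can be sketched in a few lines of plain Python. This is an in-memory stand-in for illustration only, not the Kafka API; the `Broker` class and its method names are assumptions made up for this example.

```python
from collections import defaultdict

class Broker:
    """Toy stand-in for a Kafka-style broker (illustration only)."""

    def __init__(self):
        # topic -> append-only log of records (real Kafka persists this
        # to disk in a fault-tolerant, replicated way)
        self.logs = defaultdict(list)

    def publish(self, topic, record):
        self.logs[topic].append(record)  # records are retained, never popped

    def consume(self, topic, offset):
        """Return (records_after_offset, next_offset) for one consumer."""
        log = self.logs[topic]
        return log[offset:], len(log)

broker = Broker()
broker.publish("sensor.temp", {"plant": "A", "value": 71.0})
broker.publish("sensor.temp", {"plant": "A", "value": 72.0})

# Each consumer tracks its own offset, so many subscribers can read the
# same retained stream independently; processing happens as records arrive.
records, next_offset = broker.consume("sensor.temp", 0)
avg = sum(r["value"] for r in records) / len(records)
print(len(records), next_offset, avg)  # 2 2 71.5
```

The key design choice mirrored here is that the broker stores an ordered, durable log rather than deleting messages on delivery, which is what lets a historian, a data lake loader, and a stream processor all read the same feed at their own pace.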
For ATV applications, each vehicle is configured as both a Kafka producer and a consumer to facilitate two-way communication. The sensor data produced on the ATV is processed internally by an embedded system and shipped upstream as Kafka messages to a historian system and/or an analytics data lake. A single ATV can produce as much as 4 TB of data per day, all of which is potentially useful to data scientists working to improve the reliability and safety of the vehicles. If Kafka can handle data at that scale for hundreds or thousands of vehicles in a test environment, it warrants a closer look for integrating the vast number of high-frequency data streams that are of interest to data scientists working on Industry 4.0 development initiatives.
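That 4 TB per day figure is easier to reason about as a sustained rate. A quick back-of-the-envelope calculation, assuming decimal units (1 TB = 10^12 bytes), data spread evenly over 24 hours, and a hypothetical 1,000-vehicle fleet:

```python
# Sustained ingest rate implied by 4 TB/day per vehicle
bytes_per_day = 4e12
seconds_per_day = 24 * 60 * 60              # 86,400 s
per_vehicle_mb_s = bytes_per_day / seconds_per_day / 1e6

fleet_mb_s = per_vehicle_mb_s * 1000        # hypothetical 1,000-vehicle fleet

print(round(per_vehicle_mb_s, 1))           # 46.3  -> ~46 MB/s per vehicle
print(round(fleet_mb_s / 1000, 1))          # 46.3  -> ~46 GB/s for the fleet
```

Tens of gigabytes per second of aggregate ingest is well beyond what ad hoc point-to-point integrations handle gracefully, which is the scale argument for a log-based platform like Kafka.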
Software developers working on IoT-scale data applications must deal with many design challenges. Whether you are dealing with a conglomeration of legacy IoT applications or designing a brand-new microservice-based distributed application framework, the need for message-oriented middleware quickly becomes apparent. If you work with a lot of existing operational technology systems close to where the “things” are, you are most likely familiar with MQTT for messaging. If you work in a data center closer to where the big data applications are, you are probably already familiar with Kafka for messaging and streaming. The potential to bridge many in-place systems and protocols with a robust common platform for messaging and streaming applications is what is driving so much interest in Kafka.
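One small but recurring detail in bridging those two worlds is topic naming: MQTT topics are slash-delimited trees, while Kafka topic names allow only alphanumerics, '.', '_', and '-'. Below is a minimal sketch of one possible mapping convention; the function name and the slash-to-dot rule are assumptions for illustration, not the behavior of any particular bridge product.

```python
import re

def mqtt_to_kafka_topic(mqtt_topic: str) -> str:
    """Map a slash-delimited MQTT topic to a legal Kafka topic name
    (hypothetical convention: slashes become dots, any other illegal
    character becomes an underscore)."""
    name = mqtt_topic.strip("/").replace("/", ".")
    return re.sub(r"[^A-Za-z0-9._-]", "_", name)

print(mqtt_to_kafka_topic("plant1/line3/temp"))   # plant1.line3.temp
print(mqtt_to_kafka_topic("/plant 2/press/psi"))  # plant_2.press.psi
```

Settling on one deterministic mapping like this up front is part of what keeps each new OT-to-IT pipeline from becoming the one-of-a-kind snowflake described earlier.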
Dell EMC recently teamed up with Confluent, the company founded by the team that built Apache Kafka®, to develop a solution based on the Confluent event streaming platform that enables companies to easily access data as real-time streams. An Architecture Guide for Real-Time Data Streaming is available for download and online viewing.
The reference architecture (RA) shows how Confluent Enterprise improves Apache Kafka by expanding its integration capabilities, adding tools to optimize and manage Kafka clusters, and providing methods to ensure that streams are secure. Confluent Platform makes Kafka easier to build with and easier to operate. Confluent Open Source is freely downloadable, while Confluent Enterprise is available through subscription.
The infrastructure design used in the RA follows the general recommendations in the Apache Kafka and Confluent Enterprise Reference Architecture and can be used to implement small or large clusters, with a clear path to scale between them. The Dell EMC paper also addresses the performance, latency, and reliability requirements of mission-critical production deployments.
Thanks for reading,
Phil Hummel, @GotDisk