Ingesting, moving, processing, and storing data all incur incremental costs, and those costs are sizeable even for projects with the potential for significant return on investment. If data were streamed from every piece of equipment and sensor in a modern industrial facility, disk storage appliances could fill up faster than they could be ordered and installed. But to what end? It is unlikely that creating a fully populated data lake in the context of Industry 4.0 would produce a positive return on investment (ROI).
This understanding points to what is perhaps the biggest challenge of Industry 4.0: data curation. How can an organization filter the sea of data already available to identify the streams of information that will produce a better ROI? The areas of opportunity include reducing cost, reducing safety risk, and improving productivity, to name a few. However, most organizations have not yet identified an efficient method for getting data from the vast population of industrial Internet of Things (IIoT) devices to a destination where the right people and tools can be applied to extract useful insights.
Then, after data curation is solved, two additional steps in the value chain must be put in place. What tools and systems are required to support data scientists and software application developers so that they can create intelligent cyber-physical systems based on the curated data? And lastly, how can developers deploy their software and models near the sources of the data required to feed inference and decision support? This reference architecture starts with the assumption that data is cheap and plentiful, and from there offers insight into these significant questions.
There are no silver bullets in an environment as complex as Industry 4.0; no single technology or solution solves the challenges of building and deploying cyber-physical systems for every class of problem. The best solutions are built with technology that is general enough to be used across multiple applications and use cases. This reference architecture focuses on the use of Kafka and the Confluent Platform.
Kafka is a proven technology platform that has found success in many large-scale enterprise and Internet-scale applications. Although the early Kafka successes were mainly in the realm of enterprise IT and application integration, the use of Kafka in both Industry 4.0 and consumer IoT environments has been getting more attention in the last few years. Dell EMC believes that the Confluent Platform can play a significant role in building a complete end-to-end data analysis value chain for Industry 4.0 applications.
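To make the data curation challenge concrete, the sketch below shows one way a Kafka Streams application could filter a high-volume stream of raw sensor readings down to the small subset worth storing and analyzing. This is a minimal illustration, not a prescription from this architecture: the topic names, the numeric value type, and the out-of-range threshold are all assumptions chosen for the example.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class SensorCurationApp {
    public static void main(String[] args) {
        // Basic Streams configuration; broker address is a placeholder.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sensor-curation");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Raw readings keyed by sensor ID (topic names are hypothetical).
        // Only out-of-range values are forwarded to the curated topic that
        // downstream storage and analytics consume; the rest are dropped.
        KStream<String, Double> raw = builder.stream("raw-sensor-readings");
        raw.filter((sensorId, value) -> value < 10.0 || value > 90.0)
           .to("curated-sensor-readings");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Because the filter runs continuously inside the stream, only curated data ever reaches long-term storage, which is the economic point made at the start of this section.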