Users of data analytics systems are familiar with ETL patterns and with concepts such as pipelines and schedulers, but implementing them reliably and efficiently is a challenge. Apache Airflow®, an open-source workflow orchestrator, addresses these issues, but it is not Kubernetes-native. In addition, its complexity may demand resources that compete with those needed by the machine learning platform itself.
The Kubeflow project addresses this problem without adding significant overhead to an existing Kubernetes cluster. Rather than “reinventing the wheel” for machine learning operations, it extends the Kubernetes ecosystem with custom resources and operators. It also lets users build and run pipelines through a web UI, the Python SDK, or Argo Workflows.