Data science is becoming a critical practice of many organizations. IT leaders face increased demand to support the research, development, and operations of this rapidly developing business function. The Domino Data Science Platform enables fast, reproducible, and collaborative work for data products like models, dashboards, and data pipelines. Users can run jobs, launch interactive notebook sessions, view vital metrics, share work with collaborators, and communicate with their colleagues in the Domino web application, as shown in the following figure:
.
Figure 1. Domino approach for addressing the Data Science Platform
Domino Data Science Platform is a containerized application that is built on a microservice architecture. Kubernetes is used for the control plane and orchestration engine. This architecture provides the flexibility to run Domino Data Science Platform with your existing containerized applications or in a dedicated cluster with access to hardware resources such as GPU accelerators. System administrators can enforce IT policies in the Domino environment through the interfaces that Kubernetes provides. Users can also choose enterprise-focused distributions such as Pivotal PKS Enterprise, Red Hat OpenShift, and the upstream open-source release. For more information about each of these distributions, see Choosing a Kubernetes distribution.
All workloads in the Domino application run as containerized processes, orchestrated by Kubernetes. The platform nodes are responsible for the client layer and service layer. The compute grid handles the execution layer.
The following figure shows the three layers of the Domino architecture:
Figure 2. Example of the configuration architecture
The client layer contains the front-end pods that are the targets of a network load balancer. Users can access Domino’s core features to develop and publish AI publications by connecting to the front-ends using:
The service layer contains the Domino API server, Dispatcher, Keycloak authentication service, and the metadata services that Domino uses to provide reproducibility and collaboration features. MongoDB stores application object metadata, Git manages code and file versioning, Elasticsearch powers an in-application search, and Domino Environments uses the Docker registry. Project data, logs, and backups are written to durable blob storage.
Domino Data Science Platform launches and manages ephemeral pods that run user workloads in the execution layer. These pods may host Jobs, Model APIs, Apps, Workspaces, and Docker image builds.