Projects
A cnvrg.io project is the space that holds everything related to a specific ML problem or domain, and it is where you work and collaborate. A project can include files (code), workspaces, experiments, flows, applications and dashboards, serving endpoints, artifacts, models, research papers, and collaborators whose access can be controlled by role.
Datasets
cnvrg.io datasets let you upload files of any type and version them automatically. Datasets use an object store as their backend, so there is no limit on file size or the number of files, and you can label and tag your data as well as version it.
Datasets are managed at the organization level rather than per project. After you upload a dataset to cnvrg.io, you can reuse it in any project, experiment, or notebook.
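The idea of versioning files on top of an object store can be illustrated with a small content-addressed model. The sketch below is a hypothetical, in-memory stand-in: the `ToyDataset` class, its methods, and the use of SHA-256 hashes are assumptions for illustration, not cnvrg.io's implementation or SDK. Each version is a manifest mapping file paths to content hashes, unchanged content is stored only once, and tags are names that point at versions.

```python
import hashlib
from typing import Dict, List


class ToyDataset:
    """Hypothetical in-memory model of a versioned dataset; not the cnvrg.io API."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}      # stand-in object store: SHA-256 -> bytes
        self.commits: List[Dict[str, str]] = []  # each version maps file path -> hash
        self.tags: Dict[str, int] = {}           # tag name -> version number

    def commit(self, files: Dict[str, bytes]) -> int:
        """Record a new version that adds or updates `files`; returns its number."""
        manifest = dict(self.commits[-1]) if self.commits else {}
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            self.objects.setdefault(digest, data)  # identical content is stored once
            manifest[path] = digest
        self.commits.append(manifest)
        return len(self.commits) - 1

    def tag(self, name: str, version: int) -> None:
        self.tags[name] = version

    def read(self, path: str, version: int) -> bytes:
        return self.objects[self.commits[version][path]]


ds = ToyDataset()
v0 = ds.commit({"train.csv": b"a,b\n1,2\n"})
v1 = ds.commit({"train.csv": b"a,b\n1,2\n3,4\n", "labels.txt": b"cat\n"})
ds.tag("baseline", v0)
print(ds.read("train.csv", ds.tags["baseline"]))  # b'a,b\n1,2\n'
print(len(ds.objects))  # 3
```

Older versions stay readable because a new commit only adds a manifest and any new content; nothing already stored is overwritten.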
AI Library
Building ML pipelines and practicing continual learning are key to an effective ML workflow, and reproducible, modular code components are at the core of any such workflow. The cnvrg.io AI Library supports these goals: it is a package manager built specifically for ML components, and it helps data scientists and developers build components and reuse them across projects.
cnvrg.io comes preconfigured with many AI Library components that you can start using immediately. You can also create your own components to share your work with your team and build a repository of your code and algorithms.
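To make the package-manager idea concrete, here is a minimal sketch of a versioned component registry. `ComponentRegistry`, `publish`, `get`, and the dotted version scheme are hypothetical names chosen for illustration; they do not reflect the AI Library's actual API or component format.

```python
from typing import Callable, Dict, Optional, Tuple

Version = Tuple[int, ...]


def parse_version(text: str) -> Version:
    """Parse a dotted version string such as '1.2.0' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


class ComponentRegistry:
    """Hypothetical registry of reusable ML components (illustration only)."""

    def __init__(self) -> None:
        self._components: Dict[str, Dict[Version, Callable]] = {}

    def publish(self, name: str, version: str, fn: Callable) -> None:
        versions = self._components.setdefault(name, {})
        key = parse_version(version)
        if key in versions:
            raise ValueError(f"{name} {version} is already published")
        versions[key] = fn  # published versions are never overwritten

    def get(self, name: str, version: Optional[str] = None) -> Callable:
        """Return a pinned version, or the latest one when no version is given."""
        versions = self._components[name]
        key = parse_version(version) if version else max(versions)
        return versions[key]


registry = ComponentRegistry()
registry.publish("normalize", "1.0.0", lambda xs: [x / max(xs) for x in xs])
registry.publish("normalize", "1.1.0",
                 lambda xs: [(x - min(xs)) / (max(xs) - min(xs)) for x in xs])
print(registry.get("normalize")([1, 2, 3]))           # [0.0, 0.5, 1.0]
print(registry.get("normalize", "1.0.0")([1, 2, 4]))  # [0.25, 0.5, 1.0]
```

Parsing versions into integer tuples makes 1.10.0 sort after 1.9.0, which plain string comparison would get wrong; pinning an explicit version keeps a pipeline reproducible even after newer versions are published.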
Compute
cnvrg.io is an AI operating system designed to help you manage all your compute resources and use them effectively. You do this from your organization’s Dashboard tab, which gives a complete overview of how your CPU, memory, and GPU resources are allocated and how they are being used.
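The allocated-versus-used view a resource dashboard presents can be sketched as simple per-resource totals. The `JobUsage` record, its fields, and the sample numbers below are hypothetical; they only illustrate computing utilization (used divided by allocated) per resource type, not how cnvrg.io collects or reports metrics.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class JobUsage:
    """Hypothetical allocation and measured usage for one running job."""
    name: str
    cpu_alloc: float      # CPU cores requested
    cpu_used: float       # CPU cores actually in use
    mem_alloc_gib: float
    mem_used_gib: float
    gpu_alloc: float      # GPUs requested
    gpu_used: float       # measured GPU utilization, in GPUs


# Resource name -> (allocation field, usage field) on JobUsage.
RESOURCES = {
    "cpu": ("cpu_alloc", "cpu_used"),
    "memory_gib": ("mem_alloc_gib", "mem_used_gib"),
    "gpu": ("gpu_alloc", "gpu_used"),
}


def summarize(jobs: List[JobUsage]) -> Dict[str, Dict[str, float]]:
    """Total allocation and usage per resource, plus used/allocated utilization."""
    summary: Dict[str, Dict[str, float]] = {}
    for resource, (alloc_field, used_field) in RESOURCES.items():
        allocated = sum(getattr(job, alloc_field) for job in jobs)
        used = sum(getattr(job, used_field) for job in jobs)
        summary[resource] = {
            "allocated": allocated,
            "used": used,
            "utilization": used / allocated if allocated else 0.0,
        }
    return summary


jobs = [
    JobUsage("train", 8, 6, 32, 24, 2, 1.5),
    JobUsage("notebook", 4, 2, 16, 12, 0, 0.0),
]
for resource, stats in summarize(jobs).items():
    print(resource, stats["allocated"], stats["used"], f"{stats['utilization']:.0%}")
```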
Workspaces
A cnvrg.io workspace is an interactive environment for developing and running code.
You can run popular notebooks, interactive development environments (IDEs), Python scripts, and more. The environment is preconfigured, meaning all your dependencies are preinstalled, and all the files and data in your workspace are preserved when the workspace restarts. Workspaces also provide automatic version control and scalable compute, so your data science research can use as much compute as it needs.
Experiments
An experiment can be any executable written in any language, such as Python (for example, python neuralnet.py), R, Java, or Scala, or an existing Jupyter notebook, and it can run on any compute resource. When you run an experiment, cnvrg.io automatically takes a snapshot of your code, launches a worker, installs the Docker-based dependencies, and runs the experiment. When the experiment finishes, cnvrg.io frees its resources for other jobs.
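The snapshot-then-run step can be sketched with the Python standard library. This is an illustration under stated assumptions, not cnvrg.io's mechanism: `snapshot_code` and `run_experiment` are hypothetical helpers that hash and copy the project directory, then run the command from the copy so that later edits to the working code cannot change what ran. Launching a remote worker and installing Docker dependencies are not modeled.

```python
import hashlib
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import List, Tuple


def snapshot_code(src: Path, dest_root: Path) -> Tuple[Path, str]:
    """Hypothetical helper: copy the project tree and return (copy path, content hash)."""
    digest = hashlib.sha256()
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        digest.update(path.relative_to(src).as_posix().encode())  # file name
        digest.update(path.read_bytes())                          # file contents
    code_hash = digest.hexdigest()[:12]
    snapshot = dest_root / code_hash
    if not snapshot.exists():  # identical code reuses the existing snapshot
        shutil.copytree(src, snapshot)
    return snapshot, code_hash


def run_experiment(src: Path, command: List[str], dest_root: Path) -> dict:
    """Hypothetical helper: snapshot the code, then run the command inside the snapshot."""
    snapshot, code_hash = snapshot_code(src, dest_root)
    result = subprocess.run(command, cwd=snapshot, capture_output=True, text=True)
    return {"code_hash": code_hash, "exit_code": result.returncode, "stdout": result.stdout}


with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as snapshots:
    Path(project, "neuralnet.py").write_text("print('accuracy=0.9')\n")
    record = run_experiment(Path(project), [sys.executable, "neuralnet.py"], Path(snapshots))
    print(record["exit_code"], record["stdout"].strip())  # 0 accuracy=0.9
```

Keying the snapshot by a hash of the code means each recorded run can be traced back to the exact files it executed.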
Flows
Flows are production-ready ML pipelines that let you build complex directed acyclic graph (DAG) pipelines and run your ML components (tasks) using drag and drop. Each task in a flow is a fully customizable ML component that can run on its own compute resource with its own Docker image. For example, a feature engineering task can run on a Spark cluster, followed by a training task running on a GPU instance in a Kubernetes cluster. Each run of the DAG produces an experiment, so every run is fully tracked and reproducible.
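Running a DAG comes down to executing each task only after all of its upstream tasks have finished. A minimal sketch using Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) follows; `run_flow` and the example tasks are hypothetical, and everything runs in one process rather than on separate compute resources.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable, Dict, List

Task = Callable[[Dict[str, Any]], Any]


def run_flow(tasks: Dict[str, Task], depends_on: Dict[str, List[str]]) -> Dict[str, Any]:
    """Hypothetical runner: run every task after its upstream tasks, passing their outputs."""
    graph = {name: depends_on.get(name, []) for name in tasks}
    outputs: Dict[str, Any] = {}
    # static_order() yields a valid execution order and raises CycleError if the
    # graph is not a DAG. All tasks run in this process; a real flow could send
    # each task to different compute with its own Docker image.
    for name in TopologicalSorter(graph).static_order():
        upstream = {parent: outputs[parent] for parent in graph[name]}
        outputs[name] = tasks[name](upstream)
    return outputs


tasks: Dict[str, Task] = {
    "load": lambda _: [1.0, 2.0, 3.0],
    "features": lambda up: [x * 2 for x in up["load"]],
    "train": lambda up: sum(up["features"]) / len(up["features"]),
}
results = run_flow(tasks, {"features": ["load"], "train": ["features"]})
print(results["train"])  # 4.0

try:
    run_flow({"a": lambda _: 1, "b": lambda _: 2}, {"a": ["b"], "b": ["a"]})
except CycleError:
    print("not a DAG")
```

Rejecting cycles up front is what makes the "acyclic" in DAG enforceable: a pipeline whose tasks wait on each other could never start.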
Endpoints
Endpoints in cnvrg.io are production-ready inference solutions for a range of use cases. They support real-time stream inference on Kafka topics, web service endpoints for interactive inference from pipelines and applications, batch endpoints for bulk inference, OpenVINO for Intel-optimized inference, and more (TensorFlow Serving, Triton, RabbitMQ). Endpoints log all inputs and outputs and support continual learning with automated tracking of model drift. You can deploy robust, auto-scaling endpoints in minutes.
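Logging inputs and outputs is what makes drift tracking possible, since recent traffic can be compared against the training data. The sketch below uses a deliberately simple technique, a mean-shift test measured in standard errors, and is not cnvrg.io's drift method; `LoggedEndpoint`, `drift_score`, the window size of 100, and the threshold of 3.0 are all hypothetical.

```python
import statistics
from typing import Callable, List, Tuple


class LoggedEndpoint:
    """Hypothetical endpoint wrapper: logs every call and flags input drift."""

    def __init__(self, predict: Callable[[float], float], train_inputs: List[float],
                 threshold: float = 3.0) -> None:
        self.predict = predict
        self.train_mean = statistics.mean(train_inputs)
        self.train_std = statistics.stdev(train_inputs)  # needs at least 2 values
        self.threshold = threshold
        self.log: List[Tuple[float, float]] = []  # (input, output) for every call

    def __call__(self, x: float) -> float:
        y = self.predict(x)
        self.log.append((x, y))
        return y

    def drift_score(self, window: int = 100) -> float:
        """Distance of the recent mean input from the training mean, in standard errors."""
        recent = [x for x, _ in self.log[-window:]]
        if not recent:
            return 0.0
        standard_error = self.train_std / len(recent) ** 0.5
        return abs(statistics.mean(recent) - self.train_mean) / standard_error

    def drifted(self, window: int = 100) -> bool:
        return self.drift_score(window) > self.threshold


endpoint = LoggedEndpoint(lambda x: 2 * x + 1, train_inputs=[0.0, 1.0, 2.0, 3.0, 4.0])
for x in [1.0, 2.0, 3.0]:
    endpoint(x)
before = endpoint.drifted()  # recent inputs match the training data
for _ in range(20):
    endpoint(9.0)
after = endpoint.drifted()   # inputs have shifted well outside the training range
print(before, after)  # False True
```

A production drift monitor would typically compare whole distributions and every input feature, but the structure is the same: log each request, then periodically compare a recent window against a reference.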