NVIDIA Triton Inference Server serves models from one or more model repositories specified at server startup. A repository lets users manage model versions in a manner similar to a version control system, supporting the life cycle of AI models in production. Each version of a model is stored as a distinct entry in the repository, so the desired version can be retrieved when needed. Some users also employ external configuration-management tools such as Ansible to automate version management, making it easier to switch between model versions in Triton.
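As an illustrative sketch (the model name and file names below are hypothetical), a repository holds one directory per model, with numbered subdirectories for versions and a `config.pbtxt` whose `version_policy` field tells Triton which versions to load:

```
# Hypothetical repository layout (names and paths are illustrative):
#
#   model_repository/
#   └── densenet_onnx/          <- one directory per model
#       ├── config.pbtxt        <- model configuration
#       ├── 1/model.onnx        <- version 1
#       └── 2/model.onnx        <- version 2
#
# config.pbtxt for the model above; version_policy selects the
# versions Triton serves (here: the two most recent).
name: "densenet_onnx"
platform: "onnxruntime_onnx"
version_policy: { latest: { num_versions: 2 } }
```

The server is then pointed at the repository with `tritonserver --model-repository=/path/to/model_repository`. The `version_policy` can instead be `all` (serve every version present) or `specific` (pin an explicit list of version numbers).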