Learn the concept behind serving engine models


Models in an ML Serving project are HTTP API endpoints that serve machine learning models.

There are two kinds of models:

  • Preset models
  • Serialized models

Preset models

A preset model has already been built and added to the ML Serving platform by OVHcloud administrators, and is available for immediate deployment.

Serialized models

A serialized model is a model that is loaded from a file in a supported format.

Currently supported formats are:

  • ONNX
  • TensorFlow SavedModel
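
Before deploying a serialized model, it can help to check locally that the export looks like one of the two supported formats. The sketch below is only a local sanity check and is not how the platform validates models. It assumes the usual conventions: an ONNX model is a single file with a `.onnx` suffix, and a TensorFlow SavedModel is a directory containing `saved_model.pb` (or `saved_model.pbtxt`). The function name `detect_model_format` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def detect_model_format(path: Path) -> Optional[str]:
    """Return 'onnx', 'savedmodel', or None for an exported model path.

    Hypothetical helper: it only inspects file names, not file contents.
    """
    # Assumption: ONNX models are stored as a single file ending in .onnx.
    if path.is_file() and path.suffix.lower() == ".onnx":
        return "onnx"
    # A TensorFlow SavedModel is a directory holding saved_model.pb
    # (or saved_model.pbtxt), usually next to a variables/ subdirectory.
    if path.is_dir() and any(
        (path / name).is_file() for name in ("saved_model.pb", "saved_model.pbtxt")
    ):
        return "savedmodel"
    # Anything else is not recognized by this check.
    return None
```

A result of `None` means the path matches neither convention, so the export step is worth revisiting before deployment.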

Instructions about how to export models can be found here:


  • Users choose to deploy a model inside one of their namespaces.
  • Once deployed, each model is reachable from anywhere on the Internet through a generated URL.
  • The namespace owner can control access to model management and querying by creating access tokens.
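
As an illustration of the points above, querying a deployed model amounts to an HTTP request to its generated URL, carrying an access token. The following is a minimal sketch under stated assumptions, not the platform's documented API: the URL, the token value, the JSON payload shape, and the use of a `Bearer` authorization header are all placeholders chosen for illustration. The request is built but not sent.

```python
import json
import urllib.request


def build_model_request(model_url: str, token: str, inputs: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request to a model URL."""
    body = json.dumps(inputs).encode("utf-8")
    return urllib.request.Request(
        model_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the access token is passed as a Bearer token.
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical values, for illustration only.
request = build_model_request(
    "https://example.invalid/generated-model-url",
    "MY_ACCESS_TOKEN",
    {"feature": 5.1},
)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from the network call makes it easy to inspect the headers and body before contacting the endpoint.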

Under the hood

Each model deployed inside an ML Serving namespace is actually a Docker container that is built, pushed to the linked Docker registry, and then started inside the Kubernetes namespace.
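
The build, push, and start sequence can be pictured with the equivalent manual commands. This is a conceptual sketch only: the platform performs these steps for you, and the commands it actually runs are not documented here. The function name, the image naming scheme, and the `latest` tag are assumptions, and the commands are assembled as lists without being executed.

```python
from typing import List


def deployment_commands(
    registry: str, namespace: str, model: str, context_dir: str
) -> List[List[str]]:
    """Return the three conceptual steps as command lists (nothing is run)."""
    # Assumption: images are named <registry>/<namespace>/<model>:latest.
    image = f"{registry}/{namespace}/{model}:latest"
    return [
        # 1. Build the container image that wraps the model.
        ["docker", "build", "-t", image, context_dir],
        # 2. Push the image to the linked registry.
        ["docker", "push", image],
        # 3. Start the container inside the Kubernetes namespace.
        ["kubectl", "create", "deployment", model,
         f"--image={image}", f"--namespace={namespace}"],
    ]
```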

Going further

