Last updated 29th October, 2020.
A job in AI Training is the workload unit submitted to the cluster. A job runs as a Docker container within OVHcloud infrastructure.
Each job is linked to a Public Cloud project and specifies the amount of resources to use for the training task, along with a Docker image. The image can be publicly available, stored in the AI Training shared registry scoped to your project, or pulled from a private registry that you have attached. For the latter, see the OVHcloud documentation on how to attach a private registry.
- A job runs for as long as it needs: it stops only on completion or manual interruption.
- Data can be attached to a job to serve as input for your training workload, as output (e.g. model weights), or both.
- The minimum resource requirement for a job is 1 GPU. If you do not customise your GPU resource request, the default is 1 GPU. CPU and memory resources are not customisable.
- Billing for jobs is minute-based: it starts at job initialisation and ends at completion. Each commenced minute is billed in full.
- You can read further on job limitations here.
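The minute-based billing rule above amounts to rounding a job's duration up to the next whole minute. The following is a minimal sketch of that arithmetic only; the function name `billed_minutes` is hypothetical and is not part of any OVHcloud tool or API, and the sketch does not model pricing.

```python
def billed_minutes(duration_seconds: int) -> int:
    """Return the number of whole minutes billed for a job.

    Hypothetical helper: it assumes the duration is measured in whole
    seconds from job initialisation to completion, and that each
    commenced minute is billed in full.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Integer ceiling division: any started minute counts as a full minute.
    return (duration_seconds + 59) // 60


if __name__ == "__main__":
    # A job that ran for 10 minutes and 1 second is billed 11 minutes.
    print(billed_minutes(601))
```

Under these assumptions, a job that runs for exactly 600 seconds is billed 10 minutes, while one that runs for 601 seconds is billed 11.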
Under the hood
Jobs in AI Training are Docker containers within OVHcloud infrastructure.
- You can check the OVHcloud documentation on how to create data.
- You can check the OVHcloud documentation on how to submit a job.
Please send us your questions, feedback and suggestions to improve the service:
- On the OVHcloud AI community forum