Data

Learn the concept behind AI Training data

Last updated 18th May, 2021.

Definition

  • Data refers to any type of file (binary, text, etc.) that you want to use inside AI Training jobs.
  • Object Storage is a scalable, resilient and secure storage space accessible from anywhere through HTTPS APIs. It is well suited to storing static files for the long term.
  • Volumes are filesystem storage units mounted and used inside AI Training jobs.

Best practices

OVHcloud Object Storage should be used to persist any data needed by AI Training jobs.

There are two ways to manage your data:

How it works

AI Training jobs can read data from and write data to the OVHcloud Object Storage. Here is what happens under the hood:

  1. Before the job starts, data is synchronized from the object storage into an underlying filesystem volume. This synchronization is done during the INITIALIZING phase.
  2. When the job starts, the volume is mounted at the requested directory. Data is then available inside the job for as long as the RUNNING phase lasts.
  3. After the job stops, data is synchronized back from the underlying filesystem volume into the OVHcloud Object Storage. This synchronization is done during the FINALIZING phase.
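
The three phases above can be sketched in a few lines of Python. This is only an illustration, not the service's implementation: local directories stand in for the object storage container and the volume, and the names run_job_with_volume and example_job are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def run_job_with_volume(bucket: Path, job) -> None:
    """Simulate the job lifecycle; `bucket` is a local stand-in for a container."""
    with tempfile.TemporaryDirectory() as tmp:
        volume = Path(tmp) / "volume"

        # INITIALIZING: copy the container content into the volume.
        shutil.copytree(bucket, volume)

        # RUNNING: the job sees the volume as an ordinary directory.
        job(volume)

        # FINALIZING: copy the volume content back into the container.
        shutil.copytree(volume, bucket, dirs_exist_ok=True)


def example_job(mount: Path) -> None:
    """Hypothetical job: read one file and write an upper-cased copy."""
    text = (mount / "input.txt").read_text()
    (mount / "output.txt").write_text(text.upper())
```

In this sketch, anything the job writes into the mounted directory only reaches the container once the final copy has run, which mirrors why results appear in the object storage after the FINALIZING phase rather than during the job.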

Capabilities

Access rights

Users can grant one of two access rights on filesystem volumes attached to jobs:

  • Read Only (abbreviated as ro): the job can only read data (writing is forbidden).
  • Read Write (abbreviated as rw): the job has full access.

Volumes in read-only mode are not synchronized with the object storage during the FINALIZING phase, because there is no point in synchronizing data that cannot have changed during the life of the job.
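
As a rough sketch of that rule, the selection of volumes to synchronize back could look like the following. The VolumeSpec type and the volumes_to_sync_back function are hypothetical names used for illustration; only the ro and rw abbreviations come from this guide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VolumeSpec:
    container: str   # object storage container name (illustrative)
    mount_path: str  # directory where the volume is attached in the job
    permission: str  # "ro" or "rw"

    def __post_init__(self) -> None:
        if self.permission not in ("ro", "rw"):
            raise ValueError(f"unknown permission: {self.permission!r}")


def volumes_to_sync_back(volumes: List[VolumeSpec]) -> List[VolumeSpec]:
    """Return the volumes that need synchronizing during FINALIZING."""
    # Read-only volumes cannot have been modified, so they are skipped.
    return [v for v in volumes if v.permission == "rw"]
```

With one read-only dataset volume and one read-write output volume, only the output volume would be returned.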

Volume caching and sharing

By default, filesystem volumes are created and deleted on the fly for each job that needs them.

Users can enable a cache feature that allows jobs to reuse available volumes instead of creating a new one each time. This feature serves several purposes:

  • Reducing synchronization time for data that has already been downloaded.
  • Sharing the same volumes between several jobs.

Unused volume data is regularly deleted. Users should not rely on this cache feature for long-term persistence.
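
The caching behaviour can be pictured with a small in-memory model. This is a sketch under stated assumptions, not the actual mechanism: the VolumeCache class, keying volumes by container name, and the idle-time eviction rule are all hypothetical, since this guide only states that unused volume data is regularly deleted.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CachedVolume:
    container: str
    last_used: float


class VolumeCache:
    """Toy cache: one reusable volume per container, evicted when idle."""

    def __init__(self, max_idle_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_idle = max_idle_seconds  # assumed retention policy
        self._clock = clock
        self._volumes: Dict[str, CachedVolume] = {}

    def acquire(self, container: str) -> Tuple[CachedVolume, bool]:
        """Return (volume, cache_hit); a miss creates a new volume."""
        now = self._clock()
        volume = self._volumes.get(container)
        hit = volume is not None
        if volume is None:
            volume = CachedVolume(container, now)
            self._volumes[container] = volume
        volume.last_used = now
        return volume, hit

    def purge_unused(self) -> List[str]:
        """Delete volumes idle for longer than the limit; return their keys."""
        now = self._clock()
        stale = [key for key, vol in self._volumes.items()
                 if now - vol.last_used > self._max_idle]
        for key in stale:
            del self._volumes[key]
        return stale
```

Two jobs that acquire the same container get the same volume object, which models both the shorter synchronization and the sharing. After a purge, the next job starts again from an empty volume, which is why the object storage remains the place to persist data.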
