How to submit a Python job on the Data Processing platform using the OVHcloud manager

Find out how to create a cluster and run your Apache Spark Python job with the Data Processing platform using the OVHcloud Manager

Last updated 4th May 2020


This guide shows you how to upload an Apache Spark job written in Python to your OVHcloud Object Storage and run it from the Data Processing page in the Manager.

If you want to submit an Apache Spark job in Java or Scala, you can read this document: How to submit a Java/Scala job using Data Processing Manager

This guide assumes that you use the OVHcloud Manager to work with the Data Processing platform.

For an introduction to the Data Processing service, you can visit Data Processing Overview.


  • Access to OVHcloud Manager
  • An OVHcloud account
  • A cloud project in your OVHcloud account (see How to create a cloud project for details)
  • An OpenStack user in your cloud project and access to the OpenStack Horizon dashboard (see How to create an Openstack user and access to Horizon for details)
  • Your application code as Python files
  • An environment.yml file in the standard Conda format. (Conda is the package and environment manager included in Anaconda, a Python distribution that makes it easy to manage and share your Python environment and requirements. Anaconda includes many Python packages by default, especially data science packages. Please visit this website to learn more about how to create your environment.yml file in Conda format.)


Step 1: Upload your Python application code and requirements file

Before running your job on the Data Processing platform, you need to create a container in your Object Storage. Then upload your Python application files and your environment.yml file to the root of this container. You can work with your Object Storage using either the OVHcloud Manager or the OpenStack Horizon dashboard.

Please see Creating Storage Containers in Customer Panel or Create an object container in Horizon for more details.

If you don’t currently have any application code but would still like to try OVHcloud Data Processing, you can download and use the Pi sample from the Apache Spark repository.
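For orientation, the following is a minimal sketch of what a Pi estimation job of this kind can look like. It is loosely modelled on the idea of the Spark sample rather than copied from it: the function names `inside_unit_circle`, `pi_from_counts` and `main`, the application name and the sample size are all illustrative assumptions.

```python
from operator import add
from random import random


def inside_unit_circle(_):
    """Draw one random point in the square [-1, 1] x [-1, 1].

    Returns 1 if the point falls inside the unit circle, otherwise 0.
    """
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x * x + y * y <= 1 else 0


def pi_from_counts(inside, total):
    """Estimate Pi from the share of points that fell inside the circle."""
    return 4.0 * inside / total


def main(partitions=2):
    # Imported here so the helpers above can be read and tested without Spark.
    from pyspark.sql import SparkSession

    # Application name is an assumption chosen for this sketch.
    spark = SparkSession.builder.appName("PiSketch").getOrCreate()
    total = 100000 * partitions
    inside = (
        spark.sparkContext
        .parallelize(range(total), partitions)
        .map(inside_unit_circle)
        .reduce(add)
    )
    print("Pi is roughly %f" % pi_from_counts(inside, total))
    # Stop the session so the job can finish.
    spark.stop()
```

To use a file like this as a job, it would typically end with an `if __name__ == "__main__":` guard that calls `main()`.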

If your application requires specific packages or a specific version of Python, make sure that you list them in your Conda environment.yml file. You can learn how to generate an environment file here.
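As an illustration of the file's general shape, the sketch below renders a minimal environment.yml as text. The helper `render_environment_yml`, the environment name, the Python version and the package pins are all hypothetical values chosen for the example; replace them with what your own job needs.

```python
def render_environment_yml(name, python_version, packages):
    """Return the text of a minimal Conda environment.yml file."""
    lines = [
        f"name: {name}",
        "channels:",
        "  - defaults",
        "dependencies:",
        f"  - python={python_version}",
    ]
    # Each entry is a Conda package spec such as "numpy" or "numpy=1.18".
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


# Hypothetical environment: the name, versions and packages are assumptions.
print(render_environment_yml("my-spark-job", "3.7", ["numpy=1.18", "pandas=1.0"]))
```

The printed text can be saved as environment.yml and uploaded to the root of your container alongside your Python files.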

Step 2: Submit your Apache Spark job

To submit your job with the required parameters, follow these steps:

  • Log in to the OVHcloud Manager and select Public Cloud
  • Select the relevant project if you have multiple projects in your OVHcloud account
  • Select Data Processing from the left panel
  • Select Submit a new job


  • Fill in the "Submit a job" form that is now displayed, then click the Submit job button to submit your Apache Spark job.

Please see How to fill job submit form in Data Processing Manager for more details.

Step 3: Check information, status and logs of a job

In the Data Processing section of the OVHcloud Manager, you can see the list of all the jobs that you have submitted so far. If you click on a job's name, you can see detailed information about it, including its status. You can then click Logs to see the live logs while the job is running.

If your jobs are stuck in the "Running" state, you probably forgot to stop the Spark context in your code. To learn how to stop it, please refer to the pyspark package documentation.
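One way to make sure the session is always stopped, even when the job raises an exception, is a `try`/`finally` block. The sketch below assumes a job built around a `SparkSession`; the helper `run_and_stop`, the workload `count_numbers` and the application name are hypothetical names used only for illustration.

```python
def run_and_stop(spark, job):
    """Run job(spark) and always stop the Spark session afterwards."""
    try:
        return job(spark)
    finally:
        # Runs on success and on failure, so the session never stays open.
        spark.stop()


def count_numbers(spark):
    # Hypothetical workload: count the integers 0..99 on the cluster.
    return spark.sparkContext.parallelize(range(100)).count()


def main():
    # Imported here so the helpers above can be read and tested without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stop-context-sketch").getOrCreate()
    print(run_and_stop(spark, count_numbers))
```

Calling `spark.stop()` on a `SparkSession` also stops its underlying Spark context. In a real job file, `main()` would be called from an `if __name__ == "__main__":` guard.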

Once the job has finished, the complete logs are saved to your Object Storage container. You can download them from your account whenever you like.

Please see How to check your job's logs in the Data Processing manager page for more details.

Step 4: Check your job's results

After your Spark job has finished, you can check the results in your logs as well as in any connected storage your job was designed to update.

Go further

To learn more about using Data Processing and how to submit a job and process your data, we invite you to look at the Data Processing documentation page.

You can send your questions, suggestions or feedback to our community of users or on our Discord, in the #dataprocessing-spark channel.
