Last updated 06th March, 2020
This guide explains the parameters you can set when submitting a new job to the Data Processing platform.
In this guide, we assume that you are using the OVHcloud Manager to access the Data Processing platform.
For an introduction to the Data Processing service, see Data Processing Overview.
- Access to the OVHcloud Manager
- An OVHcloud account
- A cloud project in your OVHcloud account (see How to create a cloud project for details)
Your application must be uploaded to a container in your Object Storage. During the Beta, and for the Spark engine, your application can be either:
- a JAR file, if you are using Java or Scala
- Python files and a YAML requirements file (in Conda format), if your job is in Python
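As a rough illustration of the two upload layouts described above, the following sketch checks a local folder before you upload it to your container. The function name, the `java-scala` and `python` labels, and the rule that a `.yml` or `.yaml` file counts as the requirements file are all assumptions made for this example; the platform itself does not provide this check.

```python
from pathlib import Path


def missing_upload_items(folder: str, job_type: str) -> list:
    """Return a list of problems found in a local folder before upload.

    job_type is a hypothetical label: "java-scala" or "python".
    """
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    suffixes = {p.suffix.lower() for p in files}
    problems = []
    if job_type == "java-scala":
        # Java and Scala jobs are shipped as a JAR file.
        if ".jar" not in suffixes:
            problems.append("no JAR file found")
    elif job_type == "python":
        # Python jobs need the source files plus a Conda-format YAML file.
        if ".py" not in suffixes:
            problems.append("no Python file found")
        if not suffixes & {".yml", ".yaml"}:
            problems.append("no YAML requirements file found")
    else:
        raise ValueError(f"unknown job type: {job_type}")
    return problems
```

An empty list means the folder contains the kinds of files this guide asks for; it says nothing about whether the files themselves are valid.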
Step 1: Select your processing engine
To submit a job with the parameters you need, follow these steps:
- Log in to the OVHcloud Manager.
- Select the relevant project if you have multiple projects in your OVHcloud account.
- Select Data Processing from the left panel.
- Click Submit a new job.
- Select the processing engine you want to use (Spark is the only option during the Beta).
- Proceed to the next step.
Step 2: Select a region
In this step, select the region in which you would like your Data Processing job to run. The list only contains the regions that are currently supported. Select a region, then proceed to the next step.
Step 3: Define your computation power and memory needs
Because an OVHcloud Data Processing job runs in a distributed environment, you must specify the amount of resources your job should use. The resources to specify depend on the engine you selected previously.
During the Beta, the only supported engine is Apache Spark, so the resources to specify are:
- Resources for each executor node
- Resources for the master (driver) node
- Number of executors
If you want to know more about how to size your resources or how Apache Spark works, see the Apache Spark documentation.
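To get a feel for how these three settings combine, the sketch below adds up the cores and memory a job would request in total. The `NodeSize` type, the function name, and the example sizes are hypothetical and are not taken from the Manager's templates; memory overhead is left out to keep the arithmetic simple.

```python
from dataclasses import dataclass


@dataclass
class NodeSize:
    cores: int
    memory_gib: float


def total_resources(driver: NodeSize, executor: NodeSize, executor_count: int) -> NodeSize:
    """Sum the resources of one driver and executor_count identical executors."""
    if executor_count < 1:
        # Assumed lower bound: a job without executors cannot process data.
        raise ValueError("a job needs at least one executor")
    return NodeSize(
        cores=driver.cores + executor.cores * executor_count,
        memory_gib=driver.memory_gib + executor.memory_gib * executor_count,
    )


# Example: a 1-core, 4 GiB driver with three 4-core, 15 GiB executors.
total = total_resources(NodeSize(1, 4), NodeSize(4, 15), executor_count=3)
print(total)
```

With these example values the job asks for 13 cores and 49 GiB of memory in total, which is the figure to compare against your project's quota and budget.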
You can size your resources either by selecting one of the templates in the default view or by clicking on Advanced configuration and setting everything by hand.
If you click on Advanced configuration, you get more options to configure your Driver and Executors and you are no longer limited to the pre-defined templates. In advanced mode, you can also change the memory overhead for the Driver and Executors. Memory overhead is the amount of memory that each node of the cluster needs to run the Apache Spark processes themselves.
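For orientation, open-source Apache Spark documents a default memory overhead of 10% of the node's memory, with a minimum of 384 MiB. The sketch below applies that rule; whether the Data Processing platform pre-fills the same values in advanced mode is an assumption, and the function names are made up for this example.

```python
# Defaults documented by open-source Apache Spark for
# spark.driver.memoryOverhead and spark.executor.memoryOverhead.
MIN_OVERHEAD_MIB = 384
OVERHEAD_FACTOR = 0.10


def default_memory_overhead_mib(memory_mib: int, factor: float = OVERHEAD_FACTOR) -> int:
    """Return the overhead Spark would apply by default, in MiB."""
    return max(MIN_OVERHEAD_MIB, int(memory_mib * factor))


def node_memory_with_overhead_mib(memory_mib: int) -> int:
    """Return the memory a node needs once the default overhead is added."""
    return memory_mib + default_memory_overhead_mib(memory_mib)
```

For small nodes the 384 MiB floor dominates (a 2048 MiB executor gets 384 MiB of overhead, not 204 MiB), so the relative cost of overhead shrinks as nodes get larger.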
Once you have configured the compute and memory of your cluster, click on the Next button to go to the next step.
Step 4: Configure your job
Follow these steps to configure your job before submitting it to the Data Processing service:
- A random name is selected for your job by default. You can replace it with a more meaningful name or leave it as it is.
- Select the container in your Object Storage where your application is uploaded. The Data Processing service will later download the entire contents of this container, so it is better to create one dedicated container per job, or to keep the container clean by deleting extra files.
For Apache Spark, for example, you will also have to:
- Select your job type: Java/Scala or Python.
- Select your main application file.
- If necessary, specify your application's required input arguments.
- Finally, click on the Submit Job button. Your application will be sent to the Data Processing platform and should start shortly afterwards.
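As an illustration of the input arguments mentioned in the steps above, here is a minimal sketch of how a Python main application file might read them. It assumes the arguments entered in the Manager reach the script as ordinary command-line arguments; the argument names (`input_path`, `output_path`, `--partitions`) are hypothetical.

```python
import argparse


def parse_job_arguments(argv):
    """Parse the arguments passed to the job's main application file."""
    parser = argparse.ArgumentParser(description="Example job arguments")
    parser.add_argument("input_path", help="where the job reads its data")
    parser.add_argument("output_path", help="where the job writes its results")
    parser.add_argument("--partitions", type=int, default=8,
                        help="number of partitions to use (hypothetical option)")
    return parser.parse_args(argv)


# In a real job you would pass sys.argv[1:]; an explicit list is used here
# so the example runs on its own.
args = parse_job_arguments(["in.csv", "out", "--partitions", "4"])
print(args.input_path, args.output_path, args.partitions)
```

Keeping the parsing in a function that takes an explicit list makes it easy to check locally that the arguments you plan to type into the Manager are accepted.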
The arguments of the application are stored in plain text. We advise you to store credentials in configuration files rather than passing them as arguments in the Manager. Upload the configuration files to the same Object Storage container as your code, so that they are downloaded to the Data Processing cluster together with it when you submit the job.
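One way to follow this advice in a Python job is to read credentials from an INI file uploaded next to the code. The sketch below uses Python's standard `configparser`; the `[storage]` section and the `access_key` and `secret_key` option names are hypothetical, and the path at which the file ends up on the cluster is not specified in this guide, so it is passed in as a parameter.

```python
import configparser


def load_credentials(config_path: str) -> dict:
    """Read credentials from an INI file instead of job arguments."""
    # interpolation=None keeps '%' characters in secrets from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(config_path):
        raise FileNotFoundError(f"configuration file not found: {config_path}")
    section = parser["storage"]  # hypothetical section name
    return {
        "access_key": section["access_key"],
        "secret_key": section["secret_key"],
    }
```

A matching file would contain a `[storage]` header followed by `access_key = ...` and `secret_key = ...` lines. Since the whole container is downloaded with the job, restrict who can read the container as well.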
To learn more about using Data Processing, including how to create a cluster and process your data, see the Data Processing documentation page.
These guides might also interest you...
How to submit a Python job on the Data Processing platform using the OVHcloud manager