Last updated 4th May 2020
Objective
This guide helps you create your Python environment with Conda and export it, so that you can use it when submitting a Python job to the OVHcloud Data Processing platform.
For an introduction to the Data Processing service, see the Data Processing Overview.
Requirements
- Your application code as Python files.
- Conda installed on your computer (refer to this guide).
Instructions
Step 1: Create your environment
OVHcloud Data Processing uses Conda to manage packages and their dependencies. If you haven't installed Conda yet, do so before continuing.
With Conda, you can create, export, list, remove and update environments that have different versions of Python and/or packages installed in them. OVHcloud Data Processing uses the environment you export to make sure your Python job has everything it needs to run. If you want to learn more about Conda, have a look at the Conda documentation.
Once installed, Conda automatically creates a default environment named base. You can then start installing the packages your code needs with the install command:
$ conda install numpy
This installs the latest version of NumPy in the current environment. Repeat this for each package you need, or list several packages in a single command. You can learn more about the install command and its options here.
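Packages always go into whichever environment is active, so it can help to confirm which interpreter and environment your code actually runs in before you export anything. The following is a minimal Python sketch, not part of the OVHcloud tooling. The function name `active_environment` is hypothetical, and the sketch relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets.

```python
import os
import sys


def active_environment():
    """Describe the interpreter running this script (hypothetical helper).

    CONDA_DEFAULT_ENV is set by `conda activate`; it is None when no Conda
    environment is active, for example when running a system Python.
    """
    return {
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "python": "%d.%d.%d" % sys.version_info[:3],
        "prefix": sys.prefix,  # root directory of the environment in use
    }


if __name__ == "__main__":
    for key, value in active_environment().items():
        print(f"{key}: {value}")
```

Run it with the `python` of the environment you plan to export: if `conda_env` or `prefix` is not the one you expect, activate the right environment before installing more packages.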
Step 2: Export your environment
Now that you have an environment that suits your code, it's time to export it. Make sure the environment you want to export is the active one, then run this command:
$ conda env export --from-history -f environment.yml
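This generates a portable environment file, environment.yml. The --from-history flag keeps only the packages you explicitly requested, without platform-specific build details, which is what makes the file portable. You will need this file to run your code on OVHcloud Data Processing. To learn more about environment files, have a look here.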
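Because --from-history records only the packages you asked for, it is worth reviewing the exported list, for example to confirm that every package your code imports is listed. The sketch below reads the top-level entries of the `dependencies:` section of environment.yml using only the standard library. It is a deliberately simple line-based reader, not a YAML parser, and assumes the layout `conda env export` usually writes. Nested blocks such as a `pip:` sub-list are skipped. The function name `read_dependencies` is hypothetical. If PyYAML is installed, `yaml.safe_load` is a more robust choice.

```python
from pathlib import Path


def read_dependencies(path="environment.yml"):
    """Return the top-level entries of the `dependencies:` list (hypothetical helper).

    Assumes a simple, flat Conda environment file; nested sections such as
    "- pip:" and their children are ignored.
    """
    deps = []
    in_deps = False
    item_indent = None
    for raw in Path(path).read_text().splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        indent = len(raw) - len(raw.lstrip())
        text = raw.strip()
        if indent == 0 and not text.startswith("-"):
            # A top-level key (name:, channels:, dependencies:, prefix:, ...).
            in_deps = text.startswith("dependencies:")
            item_indent = None
            continue
        if not in_deps or not text.startswith("-"):
            continue
        if item_indent is None:
            item_indent = indent  # indentation of the first list item
        if indent == item_indent:
            entry = text[1:].strip()
            if not entry.endswith(":"):  # skip nested blocks such as "- pip:"
                deps.append(entry)
    return deps


if __name__ == "__main__":
    path = Path("environment.yml")
    if path.exists():
        for dep in read_dependencies(path):
            print(dep)
    else:
        print("environment.yml not found; run the export command first.")
```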
You can now move on to the next step and submit a Python job.
Generic environment file
If you want to quickly test OVHcloud Data Processing with a basic job, you can use the following environment file, which includes commonly used packages:
name: datascience-environment
channels:
  - defaults
dependencies:
  - python=3.7.6
  - numpy
  - requests
  - pandas
  - boto3
  - pyspark
  - beautifulsoup4
  - sqlalchemy
  - pillow
  - scikit-learn
Feel free to reuse this environment file, and to add or remove packages to fit your needs.
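As a first basic job with this environment, you could run a small smoke-test script that tries to import every package from the generic file and prints its version. This is a sketch, not an OVHcloud-provided job, and the names `IMPORT_NAMES`, `GENERIC_PACKAGES` and `check_packages` are hypothetical. Some Conda package names differ from the name you import: beautifulsoup4 is imported as `bs4`, pillow as `PIL` and scikit-learn as `sklearn`.

```python
import importlib

# Conda package name -> top-level import name, only where the two differ.
IMPORT_NAMES = {
    "beautifulsoup4": "bs4",
    "pillow": "PIL",
    "scikit-learn": "sklearn",
}

# Packages from the generic environment file (python itself is left out).
GENERIC_PACKAGES = [
    "numpy", "requests", "pandas", "boto3", "pyspark",
    "beautifulsoup4", "sqlalchemy", "pillow", "scikit-learn",
]


def check_packages(conda_names):
    """Try to import each package; map its Conda name to a version string, or None if missing."""
    report = {}
    for name in conda_names:
        module_name = IMPORT_NAMES.get(name, name)
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(GENERIC_PACKAGES).items():
        print(f"{name}: {version if version else 'NOT AVAILABLE'}")
```

If you add packages to the environment file, add them to `GENERIC_PACKAGES` too, plus an `IMPORT_NAMES` entry when the import name differs.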
Go further
To learn more about using Data Processing, including how to create a cluster and process your data, see the Data Processing documentation page.
You can send your questions, suggestions or feedback to our user community on https://community.ovh.com/en/ or on our public Gitter.