How to generate an environment file for Python jobs

Find out how to create your Python environment and export it as an environment file.

Last updated 04th May, 2020

Objective

This guide helps you create your Python environment with Conda. It then shows how to export that environment so you can use it to submit your Python job on the OVHcloud Data Processing platform.

For an introduction to the Data Processing service, see Data Processing Overview.

Requirements

  • Your application code as Python files.
  • Conda installed on your computer (refer to this guide).

Instructions

Step 1: Create your environment

OVHcloud Data Processing uses Conda to manage packages and their dependencies. If you haven't installed Conda yet, please do so before continuing.

With Conda, you can create, export, list, remove, and update environments that have different versions of Python and/or packages installed in them. OVHcloud Data Processing uses this environment to make sure your Python job has everything necessary to run smoothly. If you want to learn more about Conda, have a look at their documentation.

Once installed, Conda automatically creates a first environment, named base. You can then start installing the packages you need. To do so, use the install command:

$ conda install numpy

This installs the latest version of NumPy in the current environment. Repeat the command for each package your code needs. You can learn more about the install command and its options here.
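
Before exporting, it can help to confirm that every module your job imports is actually available in the active environment. The following is a minimal sketch using only the Python standard library. The function name missing_packages and the list of required modules are hypothetical and only serve as an illustration; replace the list with the top-level modules your own code imports.

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: replace it with the top-level modules your job imports.
    required = ["numpy"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are available.")
```

Note that the import name of a module can differ from its Conda package name (for example, the scikit-learn package is imported as sklearn), so the list should contain import names.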

Step 2: Export your environment

Now that you have an environment that suits your code, it's time to export it. Make sure the environment you want to export is the active one (you can switch to it with conda activate followed by the environment name), then run this command:

$ conda env export --from-history -f environment.yml

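This generates a portable environment file named environment.yml. With the --from-history option, the file lists only the packages you explicitly requested, which keeps it usable across operating systems. You will need this file to run your code on OVHcloud Data Processing. To learn more about environment files, have a look here.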
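
To double-check what was exported, you can list the dependencies recorded in the file. The sketch below is a deliberately simple reader written for illustration, not a full YAML parser: it assumes a flat environment file with a top-level dependencies list, and it does not treat nested sections (such as a pip sub-list) specially. The function name read_dependencies is hypothetical.

```python
from pathlib import Path


def read_dependencies(yaml_text):
    """Return the entries listed under the top-level 'dependencies:' key."""
    deps = []
    in_deps = False
    for line in yaml_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if not line.startswith((" ", "-")):
            # A new top-level key starts here; track whether it is 'dependencies:'.
            in_deps = stripped == "dependencies:"
            continue
        if in_deps and stripped.startswith("- "):
            deps.append(stripped[2:].strip())
    return deps


if __name__ == "__main__":
    env_file = Path("environment.yml")
    if env_file.exists():
        for dep in read_dependencies(env_file.read_text()):
            print(dep)
```

Run from the directory that contains environment.yml, the script prints one dependency per line, which makes it easy to spot a package you forgot to install before exporting.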

You can now move on to the next step and submit a Python job.

Generic environment file

If you want to quickly test OVHcloud Data Processing with a basic job, you can use the following environment file, which includes commonly used packages:

name: datascience-environment
channels:
  - defaults
dependencies:
  - python=3.7.6
  - numpy
  - requests
  - pandas
  - boto3
  - pyspark
  - beautifulsoup4
  - sqlalchemy
  - pillow
  - scikit-learn

Do not hesitate to reuse this environment file. Also, feel free to add or remove packages to better fit your needs.

Go further

To learn more about using Data Processing, including how to create a cluster and process your data, have a look at the Data Processing documentation page.

You can send your questions, suggestions or feedback to our community of users on https://community.ovh.com/en/ or in our public Gitter.

