How to generate an environment file for Python jobs
Find out how to create your Python environment and export it as an environment file.
Last updated 04th May, 2020
This guide helps you create your Python environment with Conda. It then shows how to export that environment so you can use it to submit your Python job on the OVHcloud Data Processing platform.
For an introduction to the Data Processing service, see the Data Processing Overview.
OVHcloud Data Processing uses Conda to manage packages and their dependencies. If you haven't installed Conda yet, please do so before continuing.
With Conda, you can create, export, list, remove, and update environments that have different versions of Python and/or packages installed in them. OVHcloud Data Processing uses the exported environment to make sure your Python job has everything it needs to run. If you want to learn more about Conda, have a look at its documentation.
Once installed, Conda automatically creates a first environment, named base. You can then install the packages you need with the install command:
$ conda install numpy
This installs the latest version of NumPy in the current environment. Repeat the command for each package you need. You can learn more about the install command and its options in the Conda documentation.
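Before exporting, it can help to confirm that every module your job imports can be found in the active environment. The following is a minimal sketch using only the Python standard library. The function name `missing_modules` and the example module list are illustrative assumptions, not part of Conda or OVHcloud Data Processing. Keep in mind that import names can differ from Conda package names (for example, the `scikit-learn` package is imported as `sklearn`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical list: replace it with the modules your job actually imports.
    wanted = ["numpy", "requests"]
    missing = missing_modules(wanted)
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("All modules found.")
```

Run it with the Python interpreter of the environment you plan to export, so that the check reflects that environment and not another one.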
Now that you have an environment that suits your code, it's time to export it. Make sure the environment you want to export is the active one, then run this command:
$ conda env export --from-history -f environment.yml
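This generates a portable environment file. Because of the --from-history option, the file lists only the packages you explicitly asked for, not every resolved dependency, which keeps it usable across platforms. You will need this file to run your code on OVHcloud Data Processing. To learn more about environment files, see the Conda documentation.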
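To review what ended up in the exported file, a small script can list the dependency specs and flag those without a version constraint. This is only a sketch under stated assumptions: it parses the file line by line instead of using a YAML library, it assumes a simple layout with Conda-only dependencies (no nested `pip:` section), and the names `list_dependencies`, `unpinned` and `SAMPLE` are hypothetical. The sample text is made up for illustration and is not real Conda output.

```python
def list_dependencies(env_text):
    """Return the specs listed under 'dependencies:' in a simple environment file."""
    specs = []
    section = None
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("- "):
            item = line[2:].strip()
            # Keep list items of the dependencies section, skipping headers like "pip:".
            if section == "dependencies" and not item.endswith(":"):
                specs.append(item)
        elif ":" in line:
            # A key line such as "name: x", "channels:" or "prefix: /path".
            section = line.split(":", 1)[0]
    return specs


def unpinned(specs):
    """Return the specs that carry no version constraint."""
    return [s for s in specs if not any(op in s for op in ("=", "<", ">"))]


# Hypothetical file content, for illustration only.
SAMPLE = """\
name: base
channels:
  - defaults
dependencies:
  - python=3.7.6
  - numpy
prefix: /home/user/miniconda3
"""

if __name__ == "__main__":
    deps = list_dependencies(SAMPLE)
    print("Dependencies:", deps)
    print("Without a version constraint:", unpinned(deps))
```

To check your own file, replace `SAMPLE` with the content of `environment.yml`, for example `open("environment.yml").read()`.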
You can now move on to the next step and submit a Python job.
If you want to quickly test OVHcloud Data Processing with a basic job, you can use the following environment file, which includes commonly used packages:
name: datascience-environment
channels:
  - defaults
dependencies:
  - python=3.7.6
  - numpy
  - requests
  - pandas
  - boto3
  - pyspark
  - beautifulsoup4
  - sqlalchemy
  - pillow
  - scikit-learn
Do not hesitate to reuse this environment file. Also, feel free to add or remove packages to better fit your needs.
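If you prefer to adjust the package list in a script and not by hand, the file can be rendered from a Python list. This is a minimal sketch: the helper `render_environment` is hypothetical, it writes the same simple layout as the file above, and the choice of packages to remove or add is only an example.

```python
def render_environment(name, dependencies, channels=("defaults",)):
    """Render a simple Conda environment file as text."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Package list taken from the sample environment file.
    packages = [
        "python=3.7.6", "numpy", "requests", "pandas", "boto3",
        "pyspark", "beautifulsoup4", "sqlalchemy", "pillow", "scikit-learn",
    ]
    # Example adjustment: drop two packages and add another one.
    packages = [p for p in packages if p not in ("boto3", "pillow")]
    packages.append("matplotlib")
    print(render_environment("datascience-environment", packages), end="")
```

Redirect the output to `environment.yml`, or write the returned string to a file, to obtain an environment file you can submit with your job.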
To learn more about using Data Processing, including how to submit a job and process your data, have a look at the Data Processing documentation page.
You can send your questions, suggestions or feedback to our community of users on https://community.ovh.com/en/ or on our Discord in the channel #dataprocessing-spark.