OVH Guides

Installing CUDA on a dedicated GPU server

Last updated 14th December 2017

Objective

You can install Compute Unified Device Architecture (CUDA) on a GPU server, but it requires a series of actions which we will explain to you in this guide.

Requirements

  • You must own a GPU server.
  • You must be logged into your server via SSH

Instructions

Once you have reinstalled the distribution/operating system, follow the instructions below.

Ubuntu

Update the kernel

wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.17/linux-headers-4.10.17-041017_4.10.17-041017.201705201051_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.17/linux-headers-4.10.17-041017-generic_4.10.17-041017.201705201051_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.10.17/linux-headers-4.10.17-041017-generic_4.10.17-041017.201705201051_amd64.deb
sudo dpkg -i ./linux-*

Modify the grub.cfg file

Edit the following file, and comment out the OVH kernel part:

/boot/grub/grub.cfg
##BEGIN /etc/grub.d/06_OVHkernel###
#menuentry "GNU/Linux with OVH Kernel, OVH kernel 4.9.58-xxxx-std-ipv6-64" {

#insmod part_gpt

#insmod part_gpt

#insmod diskfilter

# insmod mdraid09

#insmod ext2

#set root='mduuid/cabadcc1a60ca488a4d2adc226fd5302'

#if [ x$feature_platform_search_hint = xy ]; then

#search --no-floppy --fs-uuid --set=root --hint='mduuid/cabadcc1a60ca488a4d2adc226fd5302' b43e4008-da62-482e-a5c8-8384c40b69db

#else

#search --no-floppy --fs-uuid --set=root b43e4008-da62-482e-a5c8-8384c40b69db

#fi

#linux /boot/bzImage-4.9.58-xxxx-std-ipv6-64 root=/dev/md2 ro rootdelay=10 noquiet nosplash net.ifnames=0 biosdevname=0
###END /etc/grub.d/06_OVHkernel###

At this point, you will need to reboot the server. It should reboot on a new kernel.

To check this, you can type the following commands:

uname -a
Linux ns3065593 4.10.17-041017-generic #201705201051 SMP Sat May 20 14:53:33 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Then install a library and stop lightdm:

/etc/init.d/lightdm stop
apt-get install ncurses-base
apt-get install nvidia-384-dev
ln -sf /usr/lib/x86_64-linux-gnu/libGL.so.1 /usr/lib/libGL.so.1

Install CUDA

Now, you just need to install CUDA by following the procedure below:

cd /root
mkdir cuda
cd cuda
wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb
mv cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64-deb cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
dpkg -i cuda-repo-ubuntu1604-9-0-local_9.0.176-1_amd64.deb
apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
apt-get update
apt-get install cuda
nano /etc/environment
#Once nano is open edit the PATH variable to include /usr/local/cuda-8.0/bin folder. After editing the file, the screen would look like this.
source /etc/environment
export PATH=/usr/local/cuda-9.0/bin${PATH:+:${PATH}}
nvcc --version

The nvidia-smi should now function properly:

nvidia-smi
Wed Nov 1 09:14:38 2017
+-----------------------------------------------------------------+
| NVIDIA-SMI 384.90 Driver Version: 384.90 |
+---------------------+---------------------+---------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|=================+=======================+=======================|
| 0 GeForce GTX 106... Off | 00000000:02:00.0 Off | N/A |
| 0% 27C P0 27W / 120W | 0MiB / 6072MiB | 0% Default |
+---------------------+---------------------+---------------------+
| 1 GeForce GTX 106... Off | 00000000:03:00.0 Off | N/A |
| 0% 29C P0 24W / 120W | 0MiB / 6072MiB | 2% Default |
+---------------------+---------------------+---------------------+
+-----------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=================+=======================+=======================|
| No running processes found |
+----------------------------------------------------------------+

CentOS 7

Update the OS

After the CUDA installation we must to perform an OS upgrade

# sudo yum -y update

Then we need to prepare the OS installing the following components:

# sudo yum groupinstall "Development Tools"
# sudo yum install kernel-devel epel-release
# sudo yum install dkms

Now we need to disable nouveau driver by changing the configuration /etc/default/grub file. Add the nouveau.modeset=0 and rd.driver.blacklist=nouveau into line starting with GRUB_CMDLINE_LINUX.

Below you can find example of grub configuration file reflecting the previously suggested change:

sudo nano /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet rd.driver.blacklist=nouveau nouveau.modeset=0"
GRUB_DISABLE_RECOVERY="true"

Then we update the Grub information

# sudo grub2-mkconfig -o /boot/grub2/grub.cfg

Now we include the blacklist noveau concept in the following path

# sudo /etc/modprobe.d/blacklist.conf
blacklist nouveau

Finally we need to reboot the server

# sudo reboot

Install CUDA

Now we must to find the CUDA Toolkit RUN file for our specific OS version in the Nvidia official page -> Nvidia

# wget https://developer.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda_10.1.168_418.67_linux.run

Then we can execute the package installation

# sudo sh cuda_*.run

Accepts the process

cuda step 1

And then select all the components to install

cuda step 2

When the installation will be finished we can export system path to Nvidia CUDA binary executables. Open the ~/.bashrc using your preferred text editor:

# sudo nano ~/.bashrc

And add the following two lines:

export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH

Then we do the Re-login the updated ~/.bashrc file

source ~/.bashrc

And finally we can check the driver version and d the service statuas with the following commands

# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
# nvidia-smi
Fri May 17 15:37:53 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1070    Off  | 00000000:00:06.0 Off |                  N/A |
| 26%   28C    P0    32W / 151W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Go further

Join our community of users on https://community.ovh.com/en/.


These guides might also interest you...