OVH Guide

Disk replacement

This guide explains the steps to follow when a disk needs to be replaced.

Requirements

If you notice a disk failure, or if our system has sent you an email notifying you of one, you should take steps to replace the disk as soon as possible.

Procedure

Step 1 : Backup

Before doing anything, it is really important to back up your data. The sole purpose of a RAID (except RAID 0) is to protect data against hard disk failures. Once a disk is failing, all your data depends on the health of the remaining disk(s).

It is improbable that two drives fail at the same time, but it is not impossible. By all means, implement an adequate backup strategy.

If you do not confirm that you have made your backup before asking for a disk replacement, you must state that you are aware of the risks and that you accept full responsibility.

Step 2 : Find defective disk(s)

Whether you found the failure yourself or our system notified you, it is good practice to check the health of all hard disks.

The reason is that if we have two failing disks in a RAID array, we will start by replacing the disk with the higher error count.
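As a rough way to compare disks yourself, the sketch below sums the raw values of three SMART attributes that are commonly watched on ATA disks. The function name smart_error_count and the choice of attributes are assumptions made for illustration, and the result is not necessarily the error count our technicians rely on. The smartctl command itself is described in the sections that follow.

```shell
# Sum the raw values of three SMART attributes from `smartctl -A` output
# read on stdin. The attribute selection is an assumption for illustration;
# other attributes may matter more on your disk model.
smart_error_count() {
    awk '
        $2 == "Reallocated_Sector_Ct" ||
        $2 == "Current_Pending_Sector" ||
        $2 == "Offline_Uncorrectable" { total += $10 }   # column 10 is RAW_VALUE
        END { print total + 0 }                           # prints 0 if nothing matched
    '
}

# Example on a real server (run as root; /dev/sdX is a placeholder):
#   smartctl -A /dev/sdX | smart_error_count
```

Running it once per disk gives one number per device, which you can compare before opening your ticket.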

Software RAID

If you have a Software RAID, use this guide to find the installed disks on your server.

Once you have found the device paths of your disks, you can test them using smartctl like so:

 smartctl -a /dev/sdX 

Don't forget to replace /dev/sdX with the actual device path of your disk.

This will also allow you to retrieve the serial number of the disk(s) you want to replace, so that you can communicate it to the technician.

Here is an example of the returned result.

 smartctl -a -d ata /dev/sda
 smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.14.32-xxxx-grs-ipv6-64] (local build)
 Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

 === START OF INFORMATION SECTION ===
 Device Model:     TOSHIBA DT01ACA050
 Serial Number:    5329T58NS
 LU WWN Device Id: 5 000039 ff6d28993
 Firmware Version: MS1OA750
 User Capacity:    500 107 862 016 bytes [500 GB]
 Sector Sizes:     512 bytes logical, 4096 bytes physical
 Device is:        Not in smartctl database [for details use: -P showall]
 ATA Version is:   8
 ATA Standard is:  ATA-8-ACS revision 4
 Local Time is:    Thu Nov 24 15:51:25 2016 CET
 SMART support is: Available - device has SMART capability.
 SMART support is: Enabled
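If you only need the serial number, you can filter the output instead of reading through it. This is a minimal sketch, assuming the "Serial Number:" label shown in the example above; the helper name serial_from_smartctl is hypothetical.

```shell
# Print the serial number found in `smartctl -a` or `smartctl -i` output
# read on stdin. Assumes the ATA-style "Serial Number:" label.
serial_from_smartctl() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}

# Example on a real server (run as root; /dev/sdX is a placeholder):
#   smartctl -i /dev/sdX | serial_from_smartctl
```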

For an NVMe disk

In the case of an NVMe disk, you will need to place the server in Rescue-pro mode, in which the nvme-cli tool is installed by default.

You will need to use the nvme list command to retrieve the serial numbers of your disks.

 root@rescue:~# nvme list
 Node             SN                   Model                                    Namespace Usage                      Format           FW Rev
 ---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
 /dev/nvme0n1     CVPF636600YC450RGN   INTEL SSDPE2MX450G7                      1         450.10  GB / 450.10  GB    512   B +  0 B   MDV10253
 /dev/nvme1n1     CVPF6333002Y450RGN   INTEL SSDPE2MX450G7                      1         450.10  GB / 450.10  GB    512   B +  0 B   MDV10253
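To keep only the device node and serial number of each NVMe disk, you can filter the first two columns. This is a minimal sketch, assuming the column layout shown in the example above (node first, serial number second); the helper name nvme_serials is hypothetical.

```shell
# Print "<node> <serial>" for every NVMe device line in `nvme list`
# output read on stdin. Header and separator lines are skipped because
# they do not start with /dev/nvme.
nvme_serials() {
    awk '$1 ~ /^\/dev\/nvme/ { print $1, $2 }'
}

# Example in rescue mode:
#   nvme list | nvme_serials
```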

Hardware RAID

For Hardware RAID, use this guide and follow the procedure for your RAID controller to find the device paths of your disks.

Once you have found the device paths of your disks, you can test them using smartctl like so:

 smartctl -d megaraid,N -a /dev/sdX 

N: the drive's Device ID

/dev/sdX: the RAID's device (/dev/sda = 1st RAID, /dev/sdb = 2nd RAID, etc.)

Don't forget to replace /dev/sdX with the actual device path of your disk.

In some cases, you may get the following message:

 /dev/sda [megaraid_disk_00][SAT]: Device open changed type from 'megaraid' to 'sat'

You will then have to replace megaraid with sat+megaraid, as follows:

 smartctl -d sat+megaraid,N -a /dev/sdX
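If you check several drives in a row, the retry can be automated. The sketch below tries the megaraid device type first and falls back to sat+megaraid when the output contains the "Device open changed type" message quoted above. The function name megaraid_smart is hypothetical, and matching on that message text is an assumption that may not hold for every smartctl version.

```shell
# Usage: megaraid_smart N /dev/sdX
#   N        = the drive's Device ID
#   /dev/sdX = the RAID's device
megaraid_smart() {
    n=$1
    dev=$2
    # First attempt with the plain megaraid device type.
    out=$(smartctl -d "megaraid,$n" -a "$dev" 2>&1)
    case $out in
        *"Device open changed type"*)
            # Retry with the SAT layer, as described in the text.
            out=$(smartctl -d "sat+megaraid,$n" -a "$dev" 2>&1)
            ;;
    esac
    printf '%s\n' "$out"
}

# Example (run as root; N and /dev/sdX are placeholders):
#   megaraid_smart N /dev/sdX
```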

For an LSI RAID card, you can test the disks using smartctl like so:

 smartctl -a /dev/sgY 

/dev/sgY: the disk's device (/dev/sg0 = 1st disk, /dev/sg1 = 2nd disk, etc.)

Step 3 : Disk replacement

To request a disk replacement, simply open a Support Ticket in the OVH Manager.

To speed up the process, please provide the following information:

  1. A date and time at which we should perform the replacement (you must plan for a short downtime, but replacements can be scheduled 24/7).
  2. A confirmation that either you have made your backup or that you take full responsibility for any data loss.
  3. The serial number of the hard disk we must change (to find the hard disk's serial number, please follow this guide) [1]

[1] If for some reason it is not possible to retrieve the serial number of the failing hard disk, please say so in the ticket and provide the serial numbers of all other disks.

Step 4 : After the replacement

If you have Hardware RAID, the RAID will re-sync itself. Please note that the re-sync process can take some time and may affect the read/write performance of your disks.

Don't hesitate to consult this guide to verify the RAID's state.

If you have Software RAID, then you will have to rebuild your RAID array. This guide explains how to do it.
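Before and after rebuilding, it can help to check which arrays are degraded. On Linux software RAID (md), /proc/mdstat shows one status flag per array member: U for a working member and _ for a missing one. The sketch below lists the arrays that have at least one missing member; the helper name degraded_md_arrays is hypothetical, and it only reads the status, it does not rebuild anything.

```shell
# Print the name of every md array whose member status (for example [U_])
# contains an underscore. Reads /proc/mdstat content on stdin.
degraded_md_arrays() {
    awk '
        /^md/ { name = $1 }                      # remember the current array name
        match($0, /\[[U_]+\]/) {                 # status flags such as [UU] or [U_]
            if (index(substr($0, RSTART, RLENGTH), "_") > 0)
                print name
        }
    '
}

# Example on a real server:
#   degraded_md_arrays < /proc/mdstat
```

An empty result means every member of every array is reported as present.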