VMware HA (High Availability)

Manage restart policy with HA function

Last updated November 18th 2020

Objective

The primary function of VMware HA (High Availability) is to restart the virtual machines on another host in the cluster if a hardware failure occurs. HA also monitors VMs and applications.

This guide explains the setting of this function.

Instructions

Activation

HA is enabled by default in the first cluster that OVHcloud provides you when we deliver your Managed Bare Metal. If a new cluster is created, you can enable it when it is created, or after.

If HA is not enabled in your cluster, go to the Configure tab of your cluster and then to the vSphere Availability tab available in Services.

Click Edit and slide the cursor to enable the HA feature.

It is also important to enable host monitoring. This setting allows pulsing signals to be sent between ESXi hosts to detect a possible failure. Disabling it is required to perform update operations with Update Manager.

Settings

Host Failure Response

This first category allows you to set your VMs reboot policy based on the different possible failures.

Failure Response

This category will set your policy to restart VMs if a host is lost.

This allows you to choose whether to restart your virtual machines automatically or not. Default restart management on the cluster is also possible. You can refine this by virtual machine in the tab VM Overrides.

You can also select a condition other than the default (Allocated Resources), which vSphere HA will verify before restarting.

Response for Host Isolation

This category allows you to define actions to take if network connectivity is lost on a host.

You can choose :

  • No action will be taken on the affected VMs.
  • All affected VMs will be powered off and vSphere HA will attempt to restart the VMs on hosts that still have network connectivity.
  • All affected hosts will be gracefully shut down and vSphere HA will attempt to restart the VMs on hosts that are still online.

Datastore with PDL

If a datastore fails with a permanent device loss (PDL) state, you can set the actions to take:

  • No action will be taken to the affected VMs.
  • No action will be taken to the affected VMs, events will be generated.
  • All affected VMs will be terminated and vSphere HA will attempt to restart the VMs on hosts that still have connectivity to the datastore.

Datastore with APD

If a datastore fails with an all path down status, you can set the actions to take :

  • No action will be taken on the affected VMs.
  • No action will be taken on the affected VMs. Events will be generated.
  • A VM will be powered off, if HA determines the VM can be restarted on a different host.
  • A VM will be powered off, If HA determines the VM can be restarted on a different host, or if HA can not detect the resources on other hosts because of network connectivity loss (network partition).

VM Monitoring

VM monitoring is available after VMware tools is installed, in case of non-response via the tools (heartbeat signals) the virtual machine will be automatically restarted. Advanced configuration is possible for this feature (for example, reboot interval).

Admission control

vSphere HA uses the Admission control to ensure that sufficient resources are reserved for recovering virtual machines in the event of a host failure.

Admission control places constraints on resource use. Actions that may violate these constraints are not allowed. Actions that may not be allowed include the following examples:

  • Power on a virtual machine.
  • Migrating a VM.
  • Increased CPU or memory reserve of a virtual machine.

The basis of the vSphere Admission control is the number of host failures that the cluster is allowed to tolerate and that continues to ensure failover. The failover capacity of hosts can be defined in three different ways:

Heartbeat datastore

When the primary host of an HA cluster can not communicate with a subordinate host on the management network, the primary host uses the database heartbeat signal to determine if the subordinate host is failing, is in a network partition, or is isolated from the network.

Advanced options

Multiple advanced configuration settings can be used in your cluster.

You can find settings on this page.

HA rules

In the configuration section and then in the VM/Host Rules tab, you can create a rule of type "VMs".

This will add a reboot condition to ensure that all virtual machines in the first group are started before starting the ones in the second group.

This rule can be added to the configurable restart priorities in the VM Overrides.

Go further

Join our community of users on https://community.ovh.com/en/.


Did you find this guide useful?

Please feel free to give any suggestions in order to improve this documentation.

Whether your feedback is about images, content, or structure, please share it, so that we can improve it together.

Your support requests will not be processed via this form. To do this, please use the "Create a ticket" form.

Thank you. Your feedback has been received.


These guides might also interest you...

OVHcloud Community

Access your community space. Ask questions, search for information, post content, and interact with other OVHcloud Community members.

Discuss with the OVHcloud community

In accordance with the 2006/112/CE Directive, modified on 01/01/2015, prices exclude VAT. VAT may vary according to the customer's country of residence.