VMware Fault Tolerance – some very quick best practices

A feature most vCenter/ESX admins will be familiar with is Fault Tolerance (FT). It is well named, as its chief use is for VMs that need to be up 100% of the time.

FT creates and maintains an exact copy of a running virtual machine, the secondary VM, on another host. The two VMs exchange heartbeats to monitor each other’s status.

Ensuring 100 percent uptime for critical VMs

When a host with a primary VM fails, the secondary VM becomes active almost instantly. There is a tiny delay, but not one that would cause a service disruption. Because everything that happens on the primary VM is replayed on the secondary one, the failover happens in the background without interrupting existing network connections. The replication delay is typically under 1 ms.


Requirements to enable FT include the following:

  • vSphere HA cluster with vMotion enabled
  • NFS, iSCSI or SAN datastores and networks
  • Correct vSphere license
  • Virtual machines must have at least one CPU
  • Virtual machines must be stored on shared storage available to all cluster hosts
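
As a quick illustration, the checklist above can be written as a small pre-flight check. This is only a sketch: the `ClusterInfo` class and its fields are hypothetical names made up for this example, not part of any vSphere API, and you would have to fill them in from your own inventory.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClusterInfo:
    """Hypothetical cluster summary; these fields are illustrative, not a vSphere API."""
    ha_enabled: bool
    vmotion_enabled: bool
    ft_licensed: bool
    shared_storage_on_all_hosts: bool


def ft_blockers(cluster: ClusterInfo, vm_cpus: int) -> List[str]:
    """Return the requirements from the list above that are not yet met."""
    blockers = []
    if not cluster.ha_enabled:
        blockers.append("cluster is not a vSphere HA cluster")
    if not cluster.vmotion_enabled:
        blockers.append("vMotion is not enabled")
    if not cluster.ft_licensed:
        blockers.append("vSphere license does not cover FT")
    if vm_cpus < 1:
        blockers.append("VM needs at least one CPU")
    if not cluster.shared_storage_on_all_hosts:
        blockers.append("VM is not on shared storage visible to all hosts")
    return blockers


if __name__ == "__main__":
    cluster = ClusterInfo(ha_enabled=True, vmotion_enabled=False,
                          ft_licensed=True, shared_storage_on_all_hosts=True)
    for problem in ft_blockers(cluster, vm_cpus=1):
        print("Blocker:", problem)
```

An empty list means none of the listed requirements is blocking FT for that VM.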

Best practice tips

By its nature, FT replication generates a lot of network traffic. Best practice is to have a separate network for FT replication and heartbeats so that it does not impact other traffic. VMware recommends a 10 Gbit network for this purpose, but at a push 1 Gbit will suffice, depending on the nature of the application.

Related to the above, because of the amount of network traffic FT generates, it’s not recommended to run more than eight FT-protected virtual machines on one host, as the replication traffic may saturate the FT-enabled network card.
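
To get a feel for when the FT network might saturate, here is a rough sketch. It uses the rule of thumb VMware has published for FT logging bandwidth (average disk reads in MB/s × 8, plus average network input in Mbps, plus 20% headroom). Treat the formula and the sample workload figures as assumptions and check them against the documentation for your vSphere version.

```python
def ft_logging_mbps(disk_read_mb_per_s, net_in_mbps, headroom=0.2):
    """Estimate FT logging traffic for one VM in Mbps (rule of thumb, see lead-in)."""
    return (disk_read_mb_per_s * 8 + net_in_mbps) * (1 + headroom)


def link_utilisation(vms, link_mbps):
    """Fraction of the FT link used by all FT-protected VMs on a host.

    vms is a list of (disk_read_mb_per_s, net_in_mbps) tuples, one per VM.
    """
    total = sum(ft_logging_mbps(disk, net) for disk, net in vms)
    return total / link_mbps


if __name__ == "__main__":
    # Made-up workload figures for two FT-protected VMs on the same host.
    vms = [(20, 50), (5, 10)]
    for name, speed in (("1 Gbit", 1_000), ("10 Gbit", 10_000)):
        print(f"{name} link utilisation: {link_utilisation(vms, speed):.1%}")
```

If the estimate gets anywhere near the full link speed, that is a sign to move to 10 Gbit or spread the FT-protected VMs across more hosts.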

Another good tip is to have at least three hosts in a cluster with FT-protected VMs. If a host with a primary or secondary VM fails, the missing VM can be recreated on the third host. This way the VM stays protected after a host failure, a pretty common-sense measure.
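
The three-host reasoning can be shown with a toy placement function. This is a simplified model for illustration only: the host names and the "pick the first spare host" rule are assumptions, and the real placement is decided by vSphere, not by anything like this.

```python
def place_after_failure(hosts, primary_host, secondary_host, failed_host):
    """Return (primary, secondary) hosts after failed_host goes down.

    The secondary is None when no spare host is left, meaning the VM keeps
    running but is no longer protected.
    """
    if failed_host not in (primary_host, secondary_host):
        return primary_host, secondary_host  # this VM pair is unaffected

    # If the primary's host died, the secondary takes over as the new primary.
    new_primary = secondary_host if failed_host == primary_host else primary_host

    # A new secondary needs a surviving host other than the primary's host.
    spare = [h for h in hosts if h not in (failed_host, new_primary)]
    new_secondary = spare[0] if spare else None
    return new_primary, new_secondary


if __name__ == "__main__":
    print(place_after_failure(["esx1", "esx2", "esx3"], "esx1", "esx2", "esx1"))
    print(place_after_failure(["esx1", "esx2"], "esx1", "esx2", "esx1"))
```

With three hosts the VM ends up with a new secondary; with only two, the second call returns `None` for the secondary, so protection is lost until the failed host comes back.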

FT also adds some overhead from a memory and disk perspective. The secondary VM reserves the same amount of memory as the primary one, so FT requires twice the memory; add a little extra for headroom where possible.
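
As a back-of-the-envelope example of the memory sizing, the sketch below doubles the VM's memory and adds a margin. The 10% margin is purely an assumed figure for illustration, not a VMware recommendation.

```python
def ft_memory_gb(vm_memory_gb, extra_fraction=0.10):
    """Cluster memory to budget for one FT-protected VM, in GB.

    Primary and secondary each reserve the full VM memory; extra_fraction
    is an assumed safety margin on top of that.
    """
    return 2 * vm_memory_gb * (1 + extra_fraction)


if __name__ == "__main__":
    for size in (8, 16, 32):
        print(f"{size} GB VM -> budget about {ft_memory_gb(size):.1f} GB with FT")
```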

