How to optimize virtual machine vNIC transmit (Tx) performance?

In this post, we will discuss using single or multiple vNICs on a VM, and how VMkernel networking CPU threads can be used to optimize the transmit rate of VM traffic.

We can create a VM with a single vNIC or multiple vNICs depending on operational requirements. Multiple vNICs on a VM may be used for different types of traffic, or simply for higher throughput and redundancy.

Regardless of how many vNICs (single or multiple) are assigned to a VM, by default it uses a single VMkernel networking thread when transmitting traffic. This is better understood with the illustration in the image below.

Single Tx Thread

As network packets are transmitted from a VM toward the physical NIC layer via the VMkernel, a single CPU thread processes all transmit traffic coming from the VM, regardless of how many vNICs are assigned to it.

Let's check this with a demo. I have a VM named app-01a with the configuration below.

As you can see, this VM has only a single network card. We can see details about the VMkernel threads used by a VM with ESXTOP. VMkernel threads for a VM are shown in ESXTOP in the CPU view (expanded to the specific VM level) as NetWorld-VM-XXX.
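Instead of navigating the interactive CPU view, you can also capture a one-shot snapshot in ESXTOP's batch mode and search it for the VM's networking worlds. This is only a sketch, assuming shell access to the ESXi host; the `/tmp/esxtop.csv` path is an arbitrary choice for this demo.

```shell
# Capture a single esxtop iteration in batch (CSV) mode,
# then list any NetWorld entries found in the snapshot.
esxtop -b -n 1 > /tmp/esxtop.csv
grep -o "NetWorld[^,\"]*" /tmp/esxtop.csv | sort -u
```

In the interactive view, the equivalent is pressing `c` for the CPU panel and expanding the VM's group to see its worlds.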

So if I check ESXTOP now, I should see a single VMkernel thread being used by the VM, as in the screenshot below.

Now let's test this concept with multiple vNICs on the VM. Let's add a network adapter to this VM and check again. In this demo, I have connected the additional adapter to the same network.

After adding the second network adapter, checking ESXTOP again shows no additional threads created for this VM. It still shows a single thread being used for both network adapters, as in the screenshot below.

So basically, giving multiple vNICs to a VM may improve performance to some extent, but not necessarily to the level you might have anticipated. So how do we change this?

For VMs that require a higher transmit rate and drive multiple simultaneous TCP streams, multiple CPU threads can be used per vNIC. Up to eight threads can be used per vNIC, depending on the number of streams.

How to configure CPU threads per vNIC?

This feature can be enabled per VM by adding ethernetX.ctxPerDev = "#" to the VMX file (where X is the vNIC index), or at the host level via the advanced setting Net.NetVMTxType = #. Here, # represents the value to be used as described below.

Acceptable values for Net.NetVMTxType and ethernetX.ctxPerDev are 1, 2, or 3, where:

  • 1 results in one transmit CPU thread per vNIC
  • 2 is the default value, meaning one transmit CPU thread for the entire VM
  • 3 results in two to eight CPU threads per vNIC

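For the per-VM variant, the setting is a line in the VM's configuration file. A minimal sketch, assuming the VM's first two vNICs (ethernet0 and ethernet1) should each get their own transmit thread; the VM must be powered off, or the option added via the vSphere Client's "Edit Settings > Advanced Parameters", for the change to take effect:

```
# VMX configuration fragment (per-VM, per-vNIC setting)
ethernet0.ctxPerDev = "1"
ethernet1.ctxPerDev = "1"
```

The per-VM setting is useful when only specific VMs need the extra transmit threads, leaving the host-wide default untouched.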
For this demonstration, I will set Net.NetVMTxType = 1 on the ESXi host running the app-01a VM and verify the change in the number of VMkernel threads being used.

After making the change on the ESXi host, let's verify it at the VM level by running ESXTOP.

Setting Net.NetVMTxType = 1 should result in two VMkernel threads for this VM, since we configured one thread per vNIC and the VM now has two network adapters.

As you can see, two threads are now listed for the app-01a VM. The additional VMkernel networking thread appears as NetWorld-Dev-&lt;id&gt;. This arrangement helps boost transmit rates for the VM when more than one vNIC is configured.

So we can conclude that the number of VMkernel networking threads used per vNIC is configurable. Depending on the Net.NetVMTxType and ethernetX.ctxPerDev values used, we can achieve a single thread per vNIC or multiple threads per vNIC, as illustrated in the image below.

Wrapping Up:

Using multiple VMkernel threads can enhance network performance on a per-vNIC basis. However, this feature requires enough CPU capacity to be available on the ESXi host. If the host's CPU is already overcommitted, enabling this feature might make matters worse. Adding CPU threads to process traffic flows requires spare CPU headroom, so that network processing does not encounter contention on the CPUs.
