This post focuses specifically on the working of the CPU scheduler in the Default, SCA v1, and SCA v2 modes in vSphere 6.7 U2 and later. VMware vSphere 6.7 U2 added a new scheduler option (SCA v2) that mitigates the L1TF vulnerability while retaining as much performance as possible.
Note: SCA is the abbreviation for Side Channel Aware scheduler.
Before we dive into the CPU scheduler modes in detail, let's quickly go through the basics of the CPU scheduler.
Modern processors are equipped with multiple cores per processor. Furthermore, a system may have multiple sockets, which add even more processing capacity. In such large systems, allocating CPU resources efficiently and fairly is a critical task.
What is CPU Scheduler?
The role of the CPU scheduler is to assign execution contexts to physical processors in a way that meets objectives such as responsiveness, throughput, and utilization.
On conventional operating systems such as Windows, the execution context is represented by a process, whereas on an ESXi host the execution context corresponds to a world.
A VM running on an ESXi host is a collection of worlds such as:
- One world per vCPU (for example, vmx-vcpu-0)
- One world for the SVGA device (vmx-svga)
- One world for keyboard, mouse, and screen (vmx-mks)
Non-virtual machine worlds exist on an ESXi host as well. These non-virtual-machine worlds are referred to as VMkernel worlds, and they (for example, the vmotionServer world) are used to perform system tasks on the host. As you can see in the image below, for the VM App01a there are multiple worlds listed on the ESXi host. For example:
- Networld-VM-XXXX for the vNIC
- vmx-svga:app01a for the SVGA device
- vmx-mks:app01a for mouse, keyboard, and screen
- vmx-vcpu-0:app01a and vmx-vcpu-1:app01a, one for each vCPU
The CPU scheduler is responsible for choosing which world is scheduled on which logical CPU (LCPU).
- The CPU scheduler uses algorithms such as a proportional-share-based algorithm and relaxed co-scheduling.
- Relaxed co-scheduling is used for multi-vCPU (2 or more vCPUs) VMs. These multi-vCPU VMs are also known as SMP (Symmetric Multi-Processing) VMs.
- Every 2-30 milliseconds, the CPU scheduler checks physical CPU usage and, if necessary, migrates vCPUs from one core to another or from one socket to another.
- The default time slice is 50 milliseconds, which means the CPU scheduler allows a vCPU to run on an LCPU for at most 50 ms at a stretch.
- The CPU scheduler is fully NUMA-aware: NUMA, wide NUMA, and vNUMA are all taken into consideration while scheduling.
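The proportional-share idea can be sketched with a toy model (hypothetical code for illustration only, not the actual VMkernel algorithm): at each scheduling decision, the runnable world that is furthest behind its entitlement, i.e. the one with the lowest consumed-time-to-shares ratio, gets the next time slice.

```python
# Toy proportional-share scheduler -- a hypothetical sketch, not VMkernel code.
# At each decision point, the world with the lowest consumed/shares ratio
# (the one furthest below its entitlement) runs for one time slice.

TIME_SLICE_MS = 50  # default maximum run time per scheduling decision

def pick_next(worlds):
    """worlds: dict name -> {'shares': int, 'consumed_ms': int}."""
    return min(worlds, key=lambda w: worlds[w]['consumed_ms'] / worlds[w]['shares'])

def run(worlds, slices):
    order = []
    for _ in range(slices):
        winner = pick_next(worlds)
        worlds[winner]['consumed_ms'] += TIME_SLICE_MS
        order.append(winner)
    return order

worlds = {
    'vmx-vcpu-0:app01a': {'shares': 2000, 'consumed_ms': 0},
    'vmx-vcpu-0:app02a': {'shares': 1000, 'consumed_ms': 0},
}
# With a 2:1 share ratio, app01a's world receives roughly two slices
# for every one that app02a's world gets.
print(run(worlds, 6))
```

Over six slices the 2000-share world runs four times and the 1000-share world twice, which is the proportional-share guarantee in miniature.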
CPU Scheduler Modes
In August 2018, a security vulnerability known as L1TF (concurrent-context attack vector) affected systems using Intel processors. Intel provided microcode updates for its processors, and operating system patches were made available. VMware also provided an update for ESXi, described in security advisory VMSA-2018-0020, known as the Side-Channel Aware Scheduler (SCA v1). In vSphere 6.7 U2, the side-channel aware scheduler was enhanced with a new policy known as SCA v2.
So basically, the CPU scheduler in vSphere 6.7 U2 and later can be configured in one of three modes, listed below.
- Default Scheduler
- SCA v1
- SCA v2
Default Scheduler Mode:
In this mode, for VMs with multiple vCPUs, the vCPUs are scheduled across physical cores; multiple vCPUs of the same VM are not scheduled on the same physical core. As we all know, one physical core can handle up to 32 vCPUs (it is a maximum, not a recommendation 🙂 ), but in this mode that applies only to vCPUs of different VMs.
As you can see in the image below, two VMs with 4 vCPUs each, running on an ESXi host that has a quad-core physical CPU without hyperthreading, will be scheduled with time slicing as shown.
If the same example is considered on a host with hyperthreading enabled, a similar approach is still used to schedule the vCPUs of a VM across physical cores, as shown in the image below.
This default scheduling policy retains full performance and is used when the advanced settings are hyperthreadingMitigation = FALSE and hyperthreadingMitigationIntraVM = N/A. Though this mode is performance oriented, it has no security awareness.
As you can see in the image above, different VMs share a physical core simultaneously, which raises a security concern: a VM running on one thread of a core can observe information used by another VM running on the other thread of that core.
The default scheduler mode uses the host security boundary model. This model is designed to prevent information leakage between hosts; however, it allows information leakage between VMs running on a given host, since all VMs on the host are considered part of the same information security boundary. This boundary can be illustrated as in the image below.
Using the L1TF concurrent-context attack vector, a VM on the host can access any information on that ESXi host. All credentials, crypto keys, and secrets used by the hypervisor or by other VMs on the host could be obtained by a malicious guest.
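As a hypothetical illustration of this exposure, consider a naive placement of the two example VMs on a quad-core, hyperthreaded host (names and layout invented for illustration): each VM's vCPUs are spread across different cores, yet the two hyperthreads of a single core can end up running vCPUs of different VMs at the same time.

```python
# Hypothetical default-mode placement on a quad-core, hyperthreaded host.
# Each VM's vCPUs land on different cores, but nothing prevents two
# *different* VMs from sharing the two hyperthreads of one core --
# which is exactly the L1TF concurrent-context exposure.

CORES = 4
THREADS_PER_CORE = 2

def default_placement(vms):
    """vms: dict name -> vCPU count. Returns {(core, thread): world name}."""
    # Enumerate logical CPUs so HT0 of every core is filled before any HT1.
    lcpus = [(core, t) for t in range(THREADS_PER_CORE) for core in range(CORES)]
    placement, i = {}, 0
    for vm, vcpus in vms.items():
        for v in range(vcpus):
            placement[lcpus[i]] = f'{vm}-vcpu{v}'
            i += 1
    return placement

p = default_placement({'VM-1': 4, 'VM-2': 4})
# Core 0 now runs a vCPU of VM-1 on HT0 and a vCPU of VM-2 on HT1:
print(p[(0, 0)], p[(0, 1)])  # VM-1-vcpu0 VM-2-vcpu0
```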
SCA v1 Mode:
The Side-Channel Aware Scheduler, or SCA v1, mode can be enabled by setting the following advanced options:
- hyperthreadingMitigation = TRUE
- hyperthreadingMitigationIntraVM = TRUE
This mode uses the process security boundary model, which ensures information is not exposed across different processes even within a guest itself. It provides the highest security, but it also has the lowest performance of all the modes, since not all hyperthreads are used. The image below illustrates the process security boundary model.
How does this impact scheduling?
Let's assume a scenario with two VMs, VM-1 and VM-2, each with 4 vCPUs, running on an ESXi host that has a single quad-core CPU with hyperthreading enabled. Now say we power on both VMs. One of the VMs will get scheduled first, and after its time slice, the other VM will start execution.
Notice that only one thread per core (HT0) is used and the other thread (HT1) is idle. HT1 is not used even though the second VM has demand and is waiting to be scheduled. Once the first VM's time slice expires, the second VM gets scheduled as shown below, and the HT1 threads are idle again.
With this model, two worlds of the same VM have no access to each other's information. One thread per core is always idle, which results in reduced performance.
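The scheduling impact above can be sketched as follows (hypothetical code; the real scheduler is far more involved): under SCA v1 only HT0 of each core is ever given work, so half of the host's logical CPUs sit idle even while the second VM waits for its slice.

```python
# Hypothetical SCA v1 placement: only one thread per core (HT0) is used;
# every sibling thread (HT1) stays idle, even when other VMs have demand.

CORES = 4
THREADS_PER_CORE = 2

def scav1_placement(vm, vcpus):
    """Place one VM's vCPUs on HT0 of each core; HT1 is never used."""
    return {(core, 0): f'{vm}-vcpu{core}' for core in range(min(vcpus, CORES))}

p = scav1_placement('VM-1', 4)
idle = CORES * THREADS_PER_CORE - len(p)
print(sorted(p))  # only (core, 0) slots are occupied
print(idle)       # 4 logical CPUs idle while VM-2 waits for its time slice
```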
SCA v2 Mode:
SCA v2 mode can be enabled by setting the advanced options below.
- hyperthreadingMitigation = TRUE
- hyperthreadingMitigationIntraVM = FALSE
SCA v2 uses the VM security boundary model, in which the VM itself is treated as the security boundary. The image below illustrates the VM security boundary.
This mode provides a balance of performance and security for environments where the VM is considered the information security boundary.
Unlike SCA v1 mode, SCA v2 uses both threads of a core. And unlike the default scheduler mode, SCA v2 allows both threads of a core to be used only by worlds of the same VM.
So if we take the previous scenario again, with two VMs, VM-1 and VM-2, each with 4 vCPUs, they may get scheduled as below.
As you can see, both threads are being used, but it is ensured that both threads of a core are used by the same VM. CPU threads are not shared between VMs, due to the VM security boundary model.
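The SCA v2 rule can be expressed as a simple invariant check (a hypothetical sketch, not actual scheduler code): a placement is acceptable only if no core's hyperthreads are shared between different VMs.

```python
# Hypothetical check of the SCA v2 invariant: both hyperthreads of a core
# may be busy, but only with vCPUs that belong to the same VM.

def scav2_valid(placement):
    """placement: {(core, thread): vm name}. True if no core hosts two VMs."""
    vms_per_core = {}
    for (core, _thread), vm in placement.items():
        vms_per_core.setdefault(core, set()).add(vm)
    return all(len(vms) == 1 for vms in vms_per_core.values())

# VM-1 on both threads of cores 0-1 and VM-2 on cores 2-3: allowed.
ok = {(0, 0): 'VM-1', (0, 1): 'VM-1', (1, 0): 'VM-1', (1, 1): 'VM-1',
      (2, 0): 'VM-2', (2, 1): 'VM-2', (3, 0): 'VM-2', (3, 1): 'VM-2'}
# Core 0 shared between VM-1 and VM-2: rejected under SCA v2.
bad = dict(ok)
bad[(0, 1)] = 'VM-2'
print(scav2_valid(ok), scav2_valid(bad))  # True False
```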
vSphere 6.7 U2 and later use the default scheduler unless configured to use one of the other options. To use SCA v1 or SCA v2, a couple of advanced options must be set and the host rebooted for the changes to take effect.
The L1TF security vulnerability can now be mitigated with the new SCA v2 option available in vSphere 6.7 U2 and later, instead of SCA v1 with its larger performance trade-off.
SCA v2 allows the hyperthreads to be used, but only by vCPUs of the same VM on the same core, and this regains some of the performance initially lost to the mitigation.
Options for scheduler policy configuration:
Default Scheduler – no mitigation for the L1TF security vulnerability.
- hyperthreadingMitigation = FALSE
- hyperthreadingMitigationIntraVM = N/A
SCA v1 – hyperthreading effectively not used for scheduling.
- hyperthreadingMitigation = TRUE
- hyperthreadingMitigationIntraVM = TRUE
SCA v2 – hyperthreads used, but only within the same VM.
- hyperthreadingMitigation = TRUE
- hyperthreadingMitigationIntraVM = FALSE
You can use esxcli as well to set these options as shown here:
esxcli system settings kernel set -s hyperthreadingMitigation -v TRUE
esxcli system settings kernel set -s hyperthreadingMitigationIntraVM -v FALSE
For more details and Q & A, check VMware Blogs.