Tuesday, 17 May 2016

Memory Reclamation-Ballooning PART4

Do check my other articles on TPS and Compression in this series on VMware memory reclamation, as ballooning starts after TPS and before compression.

Ballooning, in simple terms, is a process by which the hypervisor reclaims memory from a virtual machine. It is initiated when the ESXi host is running low on physical memory, that is, when the combined memory demand of the virtual machines is too high for the host to handle.

But before I describe ballooning, it is a good idea to understand why we need to reclaim memory from a virtual machine.

To understand why reclamation is needed, let's first look at how an operating system manages memory allocation on a physical system. The diagram below gives an idea of how memory pages are handled by the operating system.

For example, when I open MS Outlook for the first time on my computer, it takes some time to load all the pages of that program. Now let's say I close Outlook and re-open it a couple of minutes later. This time I do not have to wait as long; it opens noticeably quicker. So what happened in the back end?

Well, when I started the application the first time, the operating system loaded all the required pages of that program into memory; we call these Active Pages or MRU (most recently used). When I closed the application, those pages were not deleted from memory. Instead, the operating system moved them to the LRU (least recently used) or Idle pages list, on the assumption that the application may need them again, which is exactly what happened in my example when I started the application a second time.
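To make the Outlook example concrete, here is a minimal Python sketch of the idea. It is only a toy model: the `GuestMemory` class, its method names and the page counts are my own assumptions for illustration, not how any real operating system implements its page lists.

```python
from collections import OrderedDict


class GuestMemory:
    """Toy model of an OS tracking active (MRU) and idle (LRU) pages."""

    def __init__(self):
        self.active = OrderedDict()  # pages of running programs
        self.idle = OrderedDict()    # pages kept after a program exits

    def open_app(self, app, pages):
        """Open an app and return how many pages had to come from disk."""
        loaded_from_disk = 0
        for page in pages:
            key = (app, page)
            if key in self.idle:
                # Page is still cached: move it back to the active list.
                self.active[key] = self.idle.pop(key)
            elif key not in self.active:
                self.active[key] = True
                loaded_from_disk += 1
        return loaded_from_disk

    def close_app(self, app):
        """Move the app's pages to the idle list instead of freeing them."""
        for key in [k for k in self.active if k[0] == app]:
            self.idle[key] = self.active.pop(key)


mem = GuestMemory()
first = mem.open_app("outlook", range(4))   # cold start: pages come from disk
mem.close_app("outlook")                    # pages stay in memory as idle
second = mem.open_app("outlook", range(4))  # warm start: pages reused
print(first, second)
```

The second start needs no disk loads because the pages were still sitting in the idle list, which is why the application opens faster.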

Keeping pages in LRU is a really good approach to managing memory and ensuring performance, but it is designed for physical systems. In a virtual machine, this approach creates the following challenges.

  • The hypervisor has no visibility of the Free list, LRU and MRU memory pages that are managed by the operating system of a virtual machine.
  • So if multiple VMs demand memory and then keep those pages in LRU even after the workload is gone, host memory on the ESXi host is consumed unnecessarily. This can cause memory contention when multiple VMs put high demand on memory resources.
  • On the other hand, the operating system of the virtual machine has no visibility of ESXi memory consumption either, so it cannot detect that the host is short of memory.

So to overcome host memory contention caused by the issues above, we use the ballooning reclamation technique. The balloon driver (VMMEMCTL) is loaded into the guest operating system when we install VMware Tools.

In Figure (A), four guest physical pages are mapped in the host physical memory. Two of the pages are used by the guest application and the other two pages (marked by stars) are in the guest operating system free list. Note that since the hypervisor cannot identify the two pages in the guest free list, it cannot reclaim the host physical pages that are backing them. Assuming the hypervisor needs to reclaim two pages from the virtual machine, it will set the target balloon size to two pages. 

After obtaining the target balloon size, the balloon driver allocates two guest physical pages inside the virtual machine and pins them, as shown in Figure (B). Here, “pinning” is achieved through the guest operating system interface, which ensures that the pinned pages cannot be paged out to disk under any circumstances.

Once the memory is allocated, the balloon driver notifies the hypervisor of the page numbers of the pinned guest physical memory so that the hypervisor can reclaim the host physical pages that are backing them. In Figure (B), these pages are shown in RED and GREEN.

The hypervisor can safely reclaim this host physical memory because neither the balloon driver nor the guest operating system relies on the contents of these pages.

If any of these pages are re-accessed by the virtual machine for some reason, the hypervisor will treat it as normal virtual machine memory allocation and allocate a new host physical page for the virtual machine.
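The mapping described for Figures (A) and (B) can be sketched in a few lines of Python. This is a toy model under my own assumptions: the `Hypervisor` class, the `touch` and `reclaim` method names and the page numbers are hypothetical and are not ESXi APIs.

```python
class Hypervisor:
    """Toy model of guest-physical to host-physical page mapping for one VM."""

    def __init__(self, host_pages):
        self.free_host_pages = list(range(host_pages))
        self.mapping = {}  # guest physical page -> host physical page

    def touch(self, guest_page):
        """Guest accesses a page; back it with a host page if needed."""
        if guest_page not in self.mapping:
            self.mapping[guest_page] = self.free_host_pages.pop()
        return self.mapping[guest_page]

    def reclaim(self, pinned_guest_pages):
        """Free the host pages behind the pages the balloon driver pinned."""
        for guest_page in pinned_guest_pages:
            host_page = self.mapping.pop(guest_page, None)
            if host_page is not None:
                self.free_host_pages.append(host_page)


hv = Hypervisor(host_pages=4)
for gp in range(4):          # Figure (A): four guest pages are backed
    hv.touch(gp)
free_before = len(hv.free_host_pages)

hv.reclaim([2, 3])           # Figure (B): balloon pinned guest pages 2 and 3
free_after = len(hv.free_host_pages)

hv.touch(2)                  # re-access: a new host page is allocated
free_after_reaccess = len(hv.free_host_pages)
print(free_before, free_after, free_after_reaccess)
```

Note that the re-access at the end is handled like any other first touch of a page, which matches the behaviour described above.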

OK. The above description is as per the VMware documentation. To understand it in simple terms, let's walk through the same process again.

  • When the ESXi host is under memory contention, it sets a target for the balloon driver.
  • To meet the target, the balloon driver inside the virtual machine poses as just another application and demands memory from the operating system of the virtual machine.
  • To serve this (FAKE) application request, the VM operating system allocates memory pages to the balloon driver from the Free list, then from LRU, and, if that is still not enough to satisfy the request, from MRU as well.
  • As the balloon driver receives memory pages from the operating system of the VM, it inflates from its initial size, just as a real balloon does when we pump air into it.
  • Memory pages consumed by the balloon driver are pinned (the Red and Green pages in the figure above) so that they are not swapped out.
  • The balloon driver communicates with the hypervisor through a private channel and informs it about the pinned pages.
  • The hypervisor then reclaims the host physical memory backing these pages. Later, when the host is no longer under memory pressure, the hypervisor sets a lower target. This causes the balloon driver to deflate back toward its initial size, just as a real balloon shrinks when we let the air out, and the pages return to the guest operating system.

The image below describes this process graphically.
Image: VMware
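The guest side of the steps above can be sketched as follows. Again this is a toy model: the `Guest` class, the pool sizes and the strict free, idle, active order are my simplifying assumptions, and a real guest would have to page active memory out to its own swap before handing it over.

```python
class Guest:
    """Toy guest OS with free, idle (LRU) and active (MRU) page pools."""

    def __init__(self, free, idle, active):
        self.pools = {"free": free, "idle": idle, "active": active}
        self.balloon = 0  # pages currently pinned by the balloon driver

    def set_balloon_target(self, target):
        """Inflate or deflate the balloon until it holds `target` pages."""
        if target > self.balloon:
            self._inflate(target - self.balloon)
        else:
            self._deflate(self.balloon - target)

    def _inflate(self, needed):
        # The guest OS satisfies the request from the cheapest pool first.
        for name in ("free", "idle", "active"):
            take = min(needed, self.pools[name])
            self.pools[name] -= take
            self.balloon += take
            needed -= take

    def _deflate(self, count):
        # Released pages go back to the guest's free list.
        self.balloon -= count
        self.pools["free"] += count


guest = Guest(free=2, idle=3, active=5)
guest.set_balloon_target(4)   # host under contention: balloon inflates
inflated = (guest.balloon, dict(guest.pools))
guest.set_balloon_target(1)   # contention eases: balloon deflates
deflated = (guest.balloon, dict(guest.pools))
print(inflated)
print(deflated)
```

With a target of four pages, the two free pages are taken first and the remaining two come from the idle pool, so the active pages are left alone.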

The ESXi host reclaims memory from virtual machines according to the target it has set. How much memory is reclaimed from each VM is calculated with the help of the idle memory tax (Mem.IdleTax).

Just as you pay more tax if you earn more bucks, a VM holding more idle memory is charged (taxed) more. :P
If a virtual machine is not actively using all of its currently allocated memory, ESXi charges more for idle memory than for memory that is in use.
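To show how such a tax can tilt reclamation toward idle VMs, here is a small Python sketch. It assumes the adjusted shares-per-page ratio described in VMware's published memory management literature, shares / (pages × (f + k × (1 − f))) with k = 1 / (1 − tax rate), and a tax rate of 75%. The function name and the two example VMs are hypothetical, so treat it as an illustration rather than ESXi's exact calculation.

```python
def shares_per_page(shares, pages, active_fraction, tax_rate=0.75):
    """Adjusted shares-per-page ratio; idle pages cost k times more.

    Assumed formula: shares / (pages * (f + k * (1 - f))), k = 1 / (1 - tax).
    """
    k = 1.0 / (1.0 - tax_rate)
    cost = active_fraction + k * (1.0 - active_fraction)
    return shares / (pages * cost)


# Two hypothetical VMs with identical shares and allocated pages.
rho_busy = shares_per_page(shares=1000, pages=1000, active_fraction=0.9)
rho_idle = shares_per_page(shares=1000, pages=1000, active_fraction=0.2)

# The VM with the lower ratio is the first candidate for reclamation.
first_to_reclaim = "idle VM" if rho_idle < rho_busy else "busy VM"
print(round(rho_busy, 3), round(rho_idle, 3), first_to_reclaim)
```

Even though both VMs have the same shares, the mostly idle one ends up with the lower ratio and so gives up memory first.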

I hope this clarifies the mystery around ballooning. 

Below is the list of articles in this series for further reading.

PART1: Run cycle of reclamation techniques
PART2: Mem.minfreepct and sliding scale method 
PART3: Transparent Page Sharing

PART5: VMware Memory Compression
PART6: Hypervisor Swapping and Host SSD Swap
