Managing a virtual data center and hundreds of virtual machines (VMs) is a lot like the old children’s tale of Goldilocks and the Three Bears: too hot, too cold, too firm, too soft, etc. At some point, you find one that is just right. However, with virtualization, we seem to be on the path of too big or too small; rarely is it just right. It is difficult to find a configuration that is just right until you’ve let a VM run long enough to know how it performs. The mistake many of us make is never coming back to the VM and adjusting to meet its actual performance needs.
How well do you really manage your VMs? Beyond the initial request from the application owners, do you ever go back and re-evaluate the size of a VM, or consider whether it should even be powered on? Chances are, you turn on a VM at the request of an application owner or developer and forget about it until someone complains. Fortunately, the hypervisor is a very forgiving resource that hides many bad practices, but that also means we rarely notice them until we are faced with a mountain of problems. Too many CPU cores and too much RAM simply sit as unused resources until we have overcrowded a server and start to see contention at multiple levels across the data center.
Traditionally, rightsizing has not been a simple skill. Understanding what a VM is really consuming and what an application really needs requires deeper analytics than “the software vendor recommends.” Project timelines and executive urgency often take priority over proper planning and vetting of VM size. Add hundreds or thousands of VMs spinning up and changing regularly, and we suddenly realize we have created a beast that needs to be tamed.
We have all been in this situation:
Application Owner: “I need a Windows Server VM with 16 CPUs and 64GB of RAM.”
vSphere Admin: “That’s big. Are you sure you need that much memory, and do you really think you need 16 CPUs?”
Application Owner: “That’s what the software vendor recommends. They say they won’t stand behind their product unless we configure it exactly as they have tested it.”
vSphere Admin: “But do you really need all those resources? We may need a dedicated server just for you. It seems more likely they [the vendor] don’t know what their application needs.”
Application Owner: “I guess, but we don’t know for sure what we will be using yet. Luckily, it’s a virtual machine, we can always change it later.”
Six months later:
Application Owner: “The VM seems sluggish. Can we get 4 more vCPUs added to our VM to increase it to 20 CPUs? I think that may improve the response time.”
Now, add CloudPhysics to the story:
The vSphere Admin cringes but looks at the CloudPhysics VM Rightsizing tool and sees the following:
HUGE-VM-01 – CPU peak @44%, 99th percentile @3%, vRAM peak 4.2GB,
CPU ready time 48,000ms, crosses NUMA node, status – idle
vSphere Admin: “The CloudPhysics Rightsizing tool indicates we should resize your VM to reduce contention and reflect your actual usage. I can see here that your VM spends most of its time at or below 3% usage and has never peaked above 44%. Maybe we should downsize. In fact, looking at your CPU-ready values, I would guess your slow response time is the result of too many vCPUs waiting for available resources. Let’s start there and see what happens.”
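A raw CPU-ready figure such as 48,000ms is easier to judge once it is expressed as a percentage of the sampling interval. The sketch below shows the common conversion. It assumes the value is a summation across all 16 vCPUs over a 20-second sampling interval; the story does not state the interval, so treat that number, and the function name, as illustrative assumptions.

```python
def cpu_ready_percent(ready_ms, interval_s, vcpus=1):
    """Convert a CPU-ready summation (ms) to an average percentage per vCPU.

    interval_s is the length of the sampling interval in seconds
    (assumed here, not given in the story).
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Share of the interval spent ready-but-not-running, summed over all vCPUs.
    total_pct = ready_ms * 100.0 / (interval_s * 1000.0)
    # Average the summed figure across the VM's vCPUs.
    return total_pct / vcpus


# HUGE-VM-01 from the story, under the assumed 20-second interval.
per_vcpu = cpu_ready_percent(48_000, interval_s=20, vcpus=16)
print(f"{per_vcpu:.1f}% ready per vCPU")
```

Under those assumptions the VM averages about 15% ready time per vCPU, which fits the admin's suspicion that the vCPUs spend a noticeable share of their time waiting to be scheduled. With a longer interval the same 48,000ms would be far less alarming, so always check which interval a chart or report uses.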
Rightsizing teaches several important lessons about the data center. Here are some key things you will start to understand once you have reached peak capacity and nobody is happy with their performance.
- More is not better. Scheduling a VM with many vCPUs means that the large VM, and potentially other VMs on the host, will suffer high CPU-ready values while the system tries to ensure everyone gets their CPU cycle time. If your VM doesn’t need more vCPUs, reduce the size and potentially lower your CPU-ready wait time.
- Multi-core VMs that span NUMA nodes take on additional latency, reducing performance even further. If the VM can fit on a single processor, resize it; otherwise, consider changing the cores-per-virtual-socket attributes of your VM.
- Memory compression and swapping to disk on dense hosts are slow. Know how much memory your VMs really need.
- Idle VMs still consume scheduler time and memory.
- Heavily used VMs and VMs with high active memory or high CPU demands can often benefit from more resources. Know your application and investigate whether adding more resources would actually help it.
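The lessons above can be folded into a rough sizing check. The following is only a sketch with made-up thresholds (25% headroom, an 80% "busy" cutoff, a 12-core NUMA node) and hypothetical names. It is not how the CloudPhysics tool works; it simply illustrates sizing to observed peak demand, growing sustained-busy VMs, and flagging NUMA spanning.

```python
import math
from dataclasses import dataclass


@dataclass
class VmStats:
    name: str
    vcpus: int
    cpu_peak_pct: float  # highest observed CPU utilization
    cpu_p99_pct: float   # 99th-percentile CPU utilization


def suggest_vcpus(vm, cores_per_numa_node, headroom=0.25, busy_pct=80.0):
    """Return (suggested_vcpus, notes) using simple, hypothetical rules."""
    notes = []
    if vm.cpu_p99_pct >= busy_pct:
        # Sustained heavy use: the VM may benefit from more resources.
        suggested = vm.vcpus + max(1, math.ceil(vm.vcpus * headroom))
        notes.append("sustained high CPU; consider adding vCPUs")
    else:
        # Size to the observed peak plus headroom, never above the current size.
        needed = vm.vcpus * vm.cpu_peak_pct / 100.0 * (1.0 + headroom)
        suggested = min(vm.vcpus, max(1, math.ceil(needed)))
        if suggested < vm.vcpus:
            notes.append("oversized; consider removing vCPUs")
    if suggested > cores_per_numa_node:
        notes.append("would span NUMA nodes; review cores-per-socket")
    return suggested, notes


# HUGE-VM-01 from the story; the 12-core NUMA node is an assumed host layout.
vm = VmStats("HUGE-VM-01", vcpus=16, cpu_peak_pct=44.0, cpu_p99_pct=3.0)
print(suggest_vcpus(vm, cores_per_numa_node=12))
```

For the VM in the story, these rules suggest roughly 9 vCPUs instead of 16, which would also fit within the assumed NUMA node. Real recommendations should weigh memory, CPU-ready time, and application behavior as well, and any change should be validated by watching the VM after resizing.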
Be sure to try out the new VM Rightsizing tool from CloudPhysics. You may be surprised that what you thought was ‘just right’ may in fact be too big or too small. Take a lesson from Goldilocks and find the VM size that is really just right.