Sockets, Cores, and Threads – When Performance Modeling, 1 + 1 is not Always 2

Capacity planning and modeling require a firm understanding of the resources you are using. Planning capacity solely on the resources defined for each workload will often get you into trouble and lead to contention or waste. A good understanding of the processor and your workloads helps ensure you get the most out of your resources.

When helping customers onboard to the new CloudPhysics Dashboards, we often discuss CPU contention and point out workloads with high CPU Ready or CPU Co-Stop values. These metrics are most common on oversubscribed hosts and are often the result of too many large, heavily used VMs on a single host. One idea we often encounter is that a VM can have more logical cores than are physically available in a host. In other words, I can have a 32 vCPU VM on a host with only two CPU sockets of eight cores each, for a total of 16 physical cores in the server. vSphere may be perfectly fine with this configuration, but at some point it can become the Achilles' heel of your virtualization performance unless you find the right balance of demand and threads and avoid contention.

For example, when looking at the physical socket layer in our hosts, we typically see two physical CPU sockets. Each CPU socket in our example has eight physical cores, giving us a total of 16 physical CPU cores in the system (2 sockets x 8 cores). However, hyper-threading adds a layer of complexity to the model in that each CPU core can run two threads. This allows multithreaded applications to take advantage of dual-path execution and run two simultaneous threads within the OS. To the guest OS, and even the hypervisor, there appear to be 32 logical CPU cores available (2 sockets x 8 cores per socket x 2 threads per core). So far, so good.
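The arithmetic for the example host can be written out as a short Python sketch. The function name and parameters here are illustrative only and are not part of any vSphere or CloudPhysics API.

```python
def core_counts(sockets: int, cores_per_socket: int, threads_per_core: int = 2):
    """Return (physical_cores, logical_cores) for a host.

    threads_per_core defaults to 2, which assumes hyper-threading is enabled.
    """
    physical = sockets * cores_per_socket
    logical = physical * threads_per_core
    return physical, logical


# The example host: two sockets, eight cores per socket, two threads per core.
physical, logical = core_counts(sockets=2, cores_per_socket=8)
print(f"physical cores: {physical}, logical cores: {logical}")
```

For the example host this gives 16 physical cores and 32 logical cores. Only the first number describes hardware that can do a full core's worth of work.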

A CPU core with multiple threads offers a distinct advantage when the guest OS and application take advantage of multiple threads and parallel execution of tasks. Hyper-threading can often deliver a 10% to 30% performance increase for the executed task, but the work delivered is not the same as that of two physical CPU cores. When applications benefit from the extra thread by offloading contentious CPU instructions, we see more than 100% efficiency from a single CPU core. Rather than executing two tasks one after another, the core executes two tasks simultaneously. However, this is not a two-for-one deal. We do not see a 100% increase in performance, since the hyper-thread shares the core's execution resources and handles only part of the work rather than the entirety of the executed task.
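To put rough numbers on this, here is a minimal sketch that converts the 10% to 30% range into "physical core equivalents" for the 16-core example host. The linear model and the function name are simplifying assumptions for illustration; real gains depend heavily on the workload.

```python
def effective_core_capacity(physical_cores: int, ht_uplift: float) -> float:
    """Estimate host throughput in physical-core equivalents.

    ht_uplift is the assumed fractional gain from hyper-threading
    (0.10 for 10%, 0.30 for 30%). This is a simple linear model, not a
    measurement.
    """
    if not 0.0 <= ht_uplift <= 1.0:
        raise ValueError("ht_uplift must be between 0.0 and 1.0")
    return physical_cores * (1.0 + ht_uplift)


# Low and high ends of the range quoted above, for the 16-core example host.
for uplift in (0.10, 0.30):
    capacity = effective_core_capacity(16, uplift)
    print(f"uplift {uplift:.0%}: about {capacity:.1f} core equivalents")
```

Under these assumptions the host delivers roughly 17.6 to 20.8 core equivalents, well short of the 32 logical cores it advertises.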

Now, what happens when we create a 32 vCPU VM on our 16 physical core host? That depends. vSphere says I have 32 logical cores available, but logical cores are not what your workloads run on. Your VMs need physical cores. Hyper-threads offer additional execution threads that multithreaded applications in a guest OS can use. However, I cannot create a 2 vCPU VM and expect it to run on the logical cores presented by hyper-threading alone. The workload still requires a physical CPU core. Remember, our example host still has only 16 physical cores to carry out execution. The threads give us a performance increase, but threads are not equal to cores.

Now, vSphere and the hypervisor do us a favor here. The hypervisor schedules CPUs for us, and if demand is low and resources are available, it will give a CPU core to each workload that needs one. If my demand is less than the resources available, my VM can have all the capacity it needs until CPU demand exceeds what my 16 physical cores can supply. Threads or not, vSphere schedules the workloads and ensures each CPU request is fulfilled as long as it has the resources to give out.

This is all good while CPU usage on a host is low. But as usage increases, the number of VMs increases, or CPU demand approaches the host's maximum, you start to find that the extra logical cores we thought would give us a boost are now the point of contention in our host. Scheduling 32 vCPUs on a 16 pCPU host that is running at nearly 100% results in extreme CPU Ready and CPU Co-Stop values. When dozens of large idle workloads compete for CPU cycles, their vCPUs will fight for every resource they can get. Since our threads are not actual CPU cores, we are now fighting for resources at the host level.
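A rough way to spot this situation before it shows up as CPU Ready is to compare allocated vCPUs and average demand against physical cores. The sketch below is a simplification under stated assumptions: the `VM` class, the `avg_cpu_util` field, and the sample VMs are hypothetical, and the code does not compute CPU Ready or Co-Stop, which depend on the scheduler's actual behavior.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    avg_cpu_util: float  # assumed average busy fraction of its vCPUs, 0.0 to 1.0


def host_pressure(vms, physical_cores: int) -> dict:
    """Summarize how hard a set of VMs leans on a host's physical cores."""
    total_vcpus = sum(vm.vcpus for vm in vms)
    # Demand expressed as the number of cores kept busy on average.
    demand_cores = sum(vm.vcpus * vm.avg_cpu_util for vm in vms)
    return {
        "vcpu_to_pcpu_ratio": total_vcpus / physical_cores,
        "demand_cores": demand_cores,
        "demand_exceeds_physical": demand_cores > physical_cores,
        # VMs that can never have all vCPUs on distinct physical cores.
        "oversized_vms": [vm.name for vm in vms if vm.vcpus > physical_cores],
    }


# Hypothetical inventory on the 16-core example host.
vms = [VM("db01", vcpus=32, avg_cpu_util=0.5), VM("app01", vcpus=8, avg_cpu_util=0.25)]
report = host_pressure(vms, physical_cores=16)
print(report)
```

In this made-up inventory, the 32 vCPU VM alone keeps 16 cores busy on average, so adding any other workload pushes demand past what the physical cores can supply.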

One strategy has been to create a single large VM that consumes the entirety of a server, but this still may not be a good idea. The scheduler and hypervisor need resources as well. Add services like vSAN, NSX, and vCenter performance management, and we are again in the realm of oversubscribing the server's resources. Rather than planning for 100%, find out how much CPU your host requires to deliver its own services and exclude those resources from your host estimates.
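Excluding host overhead from the estimate is a one-line calculation. In this sketch the 12.5% overhead figure is a placeholder chosen for illustration, not a measured or recommended value; substitute what you observe on your own hosts.

```python
def usable_cores(physical_cores: int, host_overhead_fraction: float) -> float:
    """Physical cores left for VMs after reserving a share for host services.

    host_overhead_fraction is an assumed share of CPU consumed by the
    hypervisor and host-level services, expressed as 0.0 to 1.0.
    """
    if not 0.0 <= host_overhead_fraction < 1.0:
        raise ValueError("host_overhead_fraction must be in [0.0, 1.0)")
    return physical_cores * (1.0 - host_overhead_fraction)


# Placeholder overhead of 12.5% on the 16-core example host.
budget = usable_cores(physical_cores=16, host_overhead_fraction=0.125)
print(f"cores available to VMs: {budget:.1f}")
```

With that placeholder, the 16-core host has 14 cores to plan against, which is the number to compare with VM demand rather than the full 16.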

Planning for capacity and your largest VMs is both science and art. Finding a balance between demand and available resources will likely determine whether oversubscribing CPUs and super-sizing your workloads has a positive or negative impact. Knowing the total CPU capacity available from your host and the demand required by your workloads will set you on the right path, but when balancing multiple VMs on a host or cluster, you may never know whether your workloads are performing as designed or are bottlenecked by their own size.

Before you start your cloud migration or data center rightsizing effort, take advantage of the CloudPhysics Rightsizing tool to understand both the demand and usage of your VMs in relation to your defined compute resources.

RELATED: As a side note on reasons not to use hyper-threading, we learned in mid-June 2017 that there is a major bug in the hyper-threading implementation of Intel Kaby Lake and Skylake CPUs. The flaw destabilizes the CPU and results in “unpredictable system behavior”. Corrupt data, incorrect calculations, and hardware or OS failure are all potential outcomes. So, take caution and ensure your system firmware and CPU microcode are up to date if you are experiencing anomalies in execution.