Running the numbers on vSphere 6.7 Readiness – Upgrade, Reduce, Migrate and Save (even on VMware)

Imagine your largest customer comes to you and says, “Thanks for your years of hard work and great products and services, you did so well, we don’t need you anymore, at least not as much as we used to need you.”

VMware may be about to hear that phrase far more often, not because customers no longer need them, but because customers can do so much more for less with new server hardware.

With the VMware vSphere 6.5 Readiness Assessment, we quickly learned that many customers are still running older legacy servers with an old version of VMware vSphere. Typically, this was a late-2000s-era server running vSphere 4.1 to vSphere 5.5. Frequently we found customers squeezing every last GHz and GB of RAM out of the old systems. The problem is that these systems cost a lot to run, offer low density, and do not support current VMware vSphere versions. This scenario is a VMware partner’s dream for refresh and professional services. It is also a cloud provider’s target audience.

But where is the greatest cost these businesses worry about? In reality, it was power, cooling, and VMware licenses and support. With many older CPUs offering only 8 cores, 4 cores, or fewer, the cost of a VMware vSphere license per workload was significant, because each licensed socket delivered so little density. If you had 10+ servers in your organization more than 6 years old, you were at risk of spending more simply because they fit fewer workloads per server than their modern counterparts. Add the power requirements, cabling, and floor space, and lower density meant higher cost.

Enter the strategic partners and trusted advisors offering VMware vSphere 6.5 and 6.7 Readiness Assessments. These partners are the heroes for finding major cost savings, offering hardware refreshes, and offering the skills to make a clean upgrade. However, these assessments started to show us an interesting trend in data center cost and potential savings. With vSphere 5.5 ending support in September 2018, many organizations were left looking for replacements for their unsupported hosts if they wanted to move to the current hypervisor version. That would be a significant cost if they needed a one-to-one host replacement. Luckily, processor manufacturers like Intel have been following Moore’s Law and increasing density at a steady pace. Customers are now starting to look for density: more cores, more threads, and significantly more system memory. What was once running on 8 to 12 physical servers could now easily fit on a single blade compute node.

In addition, rather than a simple server refresh, we are seeing customers and partners plotting a resource migration to the cloud. Instance-based cloud services offer significant savings for On-Demand workloads. With the cloud’s low prices and dynamic discounts based on usage, many service providers and resellers have started to model significant cost savings for customer environments. The end result is fewer VMs on fewer hosts, with much greater density in the datacenter.

Now, the real value starts to become visible when we look at a hardware refresh, cloud migration, and a license reduction. Time to get a little geeky and look at the numbers.

For this example, I am working with a customer who has a typical large production environment. They have 80 hosts with 166 CPU sockets deployed in their datacenter. All of the hosts are running VMware vSphere Enterprise Plus v5.5. Of these 80 hosts, 68 are not on the VMware vSphere 6.5 HCL and would need replacement. Ouch. Luckily, their low-core-count CPUs and slow performance make them ideal candidates for replacement and consolidation. This environment has 1282 physical CPU cores offering a peak capacity of 3020 GHz. It also has 14.06 TB of physical RAM. A nice-sized environment, until you look at the average resources per host.
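To make the "resource per host average" concrete, here is a minimal sketch that divides the totals quoted above by the host and socket counts. The function and variable names are mine, and the RAM conversion assumes 1 TB = 1024 GB.

```python
# Totals quoted in the article for the existing environment.
hosts = 80
sockets = 166
cores = 1282
peak_ghz = 3020.0
ram_tb = 14.06


def per_host_averages(hosts, sockets, cores, peak_ghz, ram_tb):
    """Return average resources per host (RAM assumes 1 TB = 1024 GB)."""
    return {
        "sockets_per_host": sockets / hosts,
        "cores_per_host": cores / hosts,
        "cores_per_socket": cores / sockets,
        "ghz_per_host": peak_ghz / hosts,
        "ram_gb_per_host": ram_tb * 1024 / hosts,
    }


averages = per_host_averages(hosts, sockets, cores, peak_ghz, ram_tb)
for name, value in averages.items():
    print(f"{name}: {value:.2f}")
```

That works out to roughly 16 cores, under 38 GHz, and about 180 GB of RAM per host, with fewer than 8 cores per socket on average.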

In terms of the virtual environment, we see 825 VMs. Despite being oversized, with an allocation of 5,540 GHz defined, they are only using 900 GHz of CPU at peak demand. This is a typical example of defining workloads larger than their actual need. The actual utilization is less than one-third of the data center’s actual compute capacity. These workloads also use 8.03 TB of the 14.06 TB of physical RAM. In short, we are not overcommitting the environment in CPU demand or RAM.
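The ratios behind that paragraph are easy to check. This is a small sketch using only the figures quoted above; the variable names are mine.

```python
# Figures quoted in the article.
peak_capacity_ghz = 3020.0   # physical CPU capacity
allocated_ghz = 5540.0       # CPU allocated to the VMs
peak_demand_ghz = 900.0      # CPU actually used at peak
ram_used_tb = 8.03
ram_total_tb = 14.06

# Peak demand as a share of physical capacity.
cpu_utilization = peak_demand_ghz / peak_capacity_ghz
# Peak demand as a share of what the VMs were allocated.
allocation_used = peak_demand_ghz / allocated_ghz
# Allocation relative to physical capacity (above 1.0 means oversubscribed on paper).
allocation_ratio = allocated_ghz / peak_capacity_ghz
# RAM in use as a share of physical RAM.
ram_utilization = ram_used_tb / ram_total_tb

print(f"CPU demand vs. capacity:     {cpu_utilization:.1%}")
print(f"CPU demand vs. allocation:   {allocation_used:.1%}")
print(f"CPU allocation vs. capacity: {allocation_ratio:.2f}x")
print(f"RAM used vs. physical:       {ram_utilization:.1%}")
```

Peak demand comes to about 30% of physical CPU capacity and about 16% of what was allocated, while RAM sits at roughly 57%. The allocation is about 1.8 times the physical capacity on paper, but actual demand never comes close.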

A side note: we looked at bandwidth, IOPS, throughput, and network configurations and found no limiting resource. The environment was simply oversized with low-density resources.

VMware should take note… This is changing.

These numbers are important. If we calculate how many modern hosts would be required with current processors, we find the terrifying facts VMware is most afraid of today. The new standard for high-end servers is far more cost effective than most admins realize. A new physical server with 2 Intel 8180 processors running at 2.5 GHz, providing 56 CPU cores and 1 TB of system RAM, offers density at a cost justifiable to most organizations. Looking at a few vendor models, we quickly see we need only 8 physical servers to meet the needs of this production data center. For redundancy and availability purposes, we scale up to 12 total hosts. This provides sufficient network IO and storage IO across the environment. Hosting all of this in two blade chassis also spreads the risk. In total, the entire cluster can run on 24 physical CPU sockets. That is a reduction of 142 CPU sockets. We also see a 65% reduction in power consumed per year!
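Here is a rough sketch of the arithmetic for the 12-host cluster. It assumes nominal capacity only: base clock times core count, with no allowance for turbo, hypervisor overhead, or failover reserve, and 1 TB of RAM per host taken at face value. The 28 cores per socket follows from the 56 cores across 2 processors quoted above; the variable names are mine.

```python
# Existing environment (from the article).
old_sockets = 166
peak_demand_ghz = 900.0
ram_used_tb = 8.03

# Proposed host spec (from the article): 2 sockets, 56 cores total, 2.5 GHz, 1 TB RAM.
sockets_per_host = 2
cores_per_socket = 28
base_clock_ghz = 2.5
ram_per_host_tb = 1.0
new_hosts = 12

# Nominal capacity of one host and of the whole cluster.
host_ghz = sockets_per_host * cores_per_socket * base_clock_ghz
cluster_ghz = new_hosts * host_ghz
cluster_ram_tb = new_hosts * ram_per_host_tb

# Socket count before and after.
new_sockets = new_hosts * sockets_per_host
sockets_removed = old_sockets - new_sockets
socket_reduction = sockets_removed / old_sockets

# How loaded the new cluster would be at today's peak demand.
cpu_load = peak_demand_ghz / cluster_ghz
ram_load = ram_used_tb / cluster_ram_tb

print(f"Per-host CPU capacity: {host_ghz:.0f} GHz")
print(f"Cluster capacity: {cluster_ghz:.0f} GHz, {cluster_ram_tb:.0f} TB RAM")
print(f"Sockets: {old_sockets} -> {new_sockets} ({socket_reduction:.1%} fewer)")
print(f"Load at current peak: CPU {cpu_load:.1%}, RAM {ram_load:.1%}")
```

Under those assumptions, 12 hosts provide 1,680 GHz and 12 TB of RAM, so today's peak demand would use a little over half the CPU and about two-thirds of the RAM, on roughly 86% fewer sockets.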

But wait, there’s more… for VMware, there may be some very bad news in these numbers.

What was once an Enterprise ELA for vSphere Enterprise Plus covering 166 CPU sockets now covers only 24 CPU sockets, a reduction of 142 CPUs. vSphere Enterprise Plus with 3 years of Support and Subscription is costly. If an organization had not been keeping up on Support and Subscription, the catch-up cost would have been significant. Even the annual Support and Subscription (SNS) agreement alone would be substantial. What was once almost $1M for the first three years and almost $140K per year thereafter can now be a fraction of the original cost. Even with ELA discounts and negotiated pricing for three or more years of support, the cost savings to the user would still be significant.
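For a feel of the scale, here is a back-of-the-envelope sketch. It makes one big assumption that is mine, not the customer's contract: that cost scales linearly with socket count. Real ELA pricing, discounts, and support terms will not be that tidy. The dollar figures are the rounded "almost $1M" and "almost $140K" from above, used as upper bounds.

```python
# Socket counts from the article.
old_sockets = 166
new_sockets = 24

# Rounded upper-bound figures from the article (not actual quotes).
old_three_year_cost = 1_000_000
old_annual_sns = 140_000


def scale_by_sockets(cost, old_sockets, new_sockets):
    """Scale a cost linearly by socket count (an illustrative assumption)."""
    return cost * new_sockets / old_sockets


new_three_year_cost = scale_by_sockets(old_three_year_cost, old_sockets, new_sockets)
new_annual_sns = scale_by_sockets(old_annual_sns, old_sockets, new_sockets)

print(f"First three years: ${old_three_year_cost:,.0f} -> ${new_three_year_cost:,.0f}")
print(f"Annual SNS after:  ${old_annual_sns:,.0f} -> ${new_annual_sns:,.0f}")
```

On that linear assumption, the three-year figure drops to roughly $145K and the annual SNS to roughly $20K, which is about one-seventh of the original spend.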

When we find more than 20% of the workloads idle 80% of the time or more, we consider them ideal candidates for migration to the cloud with On-Demand pricing. GCP On Demand means fewer workloads on premises. It also means less UPS draw and lower cooling needs. If we move 50 to 100 of these workloads to the cloud, we quickly find the cluster could potentially be reduced further. But for High Availability purposes and future growth considerations, we will keep this environment at 12 hosts. This exercise should also look at workload dependencies before anything is moved, but the scenario allows flexibility for development and testing. These costs can quickly be recovered in annual VMware license savings.
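The rule of thumb above can be sketched as a simple filter. Everything here is hypothetical: the VM names, the idle fractions, and the function names are made up for illustration and do not come from the customer's assessment.

```python
def cloud_candidates(idle_fraction_by_vm, idle_threshold=0.80):
    """Return VMs that are idle at least `idle_threshold` of the time, sorted by name."""
    return sorted(
        vm for vm, idle in idle_fraction_by_vm.items() if idle >= idle_threshold
    )


def worth_migrating(idle_fraction_by_vm, idle_threshold=0.80, share_threshold=0.20):
    """True when more than `share_threshold` of all VMs qualify as candidates."""
    if not idle_fraction_by_vm:
        return False
    candidates = cloud_candidates(idle_fraction_by_vm, idle_threshold)
    return len(candidates) / len(idle_fraction_by_vm) > share_threshold


# Hypothetical sample: fraction of time each VM was observed idle.
sample = {
    "dev-build-01": 0.93,
    "test-db-02": 0.85,
    "erp-prod-01": 0.12,
    "web-prod-03": 0.40,
    "qa-runner-07": 0.80,
}

print(cloud_candidates(sample))
print(worth_migrating(sample))
```

A real assessment would also check dependencies between workloads before shortlisting anything, which this filter ignores.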

So, what do you do? The choice is obvious. Upgrade where you can, reduce your VMware vSphere license count, and migrate development to the cloud where it makes sense. All of these allow organizations to modernize and move closer to a true hybrid cloud.

This is the story VMware has been preaching for years, but now their dream has come true, and it may turn out to be their own nightmare.