The Question


“It’s the question, Neo. It’s the question that drives us. It’s the question that brought you here. You know the question, just as I did.” — The Matrix

Great companies start from simple problems

It began with a simple question: “How many virtual machines can sit on my datastore?”

And so CloudPhysics was born.

As the storage product manager at VMware, I was asked that question on a weekly basis for more than six years. My best and most consistent response: "I don't know, or it depends… and whatever I say will change as you provision new VMs, increase consolidation, vMotion things around, and modify the physical infrastructure." The problem has so many dimensions: workload characteristics, the type of storage, and the relationship of CPU to memory to network to storage. No product or person held the answer, or even an approach to finding it.

42 is not the answer

Disgruntled, my customers rightly flayed me: "But John, you're the product manager for this mission-critical stack. Why don't you know this?" I could only point to patterns in other customer environments, build sophisticated spreadsheets, and write best-practice PDFs. Although nothing in my responses resembled real engineering, I diligently followed up to see what customers did without an answer to this fundamental, simple question. One flagship customer, a leading practitioner in the design and operation of virtualized infrastructure, summed up his approach: "I keep adding VMs until things fall over. Then I back off by 33%. I guess, change, and pray." The more research I conducted, the more the answer seemed to lie out there, mystical and elusive.

I wasn't ready to punt on the problem and declare a magical number like 42. Without adequate answers, customers incurred growing and often unseen waste and risk, undermining the very reasons they had adopted virtualization in the first place! It killed me. Knowing that mature industrial processes do not operate this way, I began looking at how other mission-critical systems are designed and operated.

Where is my region of safety?

Bridge construction, oil refining, and airport resource management all yielded enlightening lessons. Load factors are determined before a bridge is ever built; bridge engineers don't build the bridge and then keep adding cars until it falls over. The margin of safety is well understood before any materials are ordered. By contrast, in virtualization management the industrial engineering terms "margin of safety," "load factor," and "region of stability" are not part of our vocabulary or mindset. Consequently, commercial computing environments operate with surprising levels of waste and risk. Our industry-wide dataset today indicates average CPU utilization of 10%, but that's a topic for another post.

My point is that this waste defeats the very economic reasons for using virtualization in the first place. So what makes a process mature and industrial-grade, one that is safe, predictable, and efficient? What all such processes share is the application of modeling, simulation, and analysis in every phase of the system: design, architecture, procurement, configuration, change management, monitoring, troubleshooting, and optimization.

Why we know more about potato chip placement in grocery stores than about VM placement and consolidation

Entire industries and markets rely on the continuous generation of models to function. In fact, multi-billion-dollar public companies operate platforms that gather and process massive amounts of industry-wide operational data. These enormous datasets drive collective intelligence and form the basis of the model generation and verification essential to industrial-grade processes and systems. Health care has IMS (now IQVIA). Media has Nielsen. Retail food processing has SymphonyIRI. Every industry has leveraged big data and IT to continuously discover and redefine how it operates with safety, predictability, and efficiency, except IT itself.

This is a tremendous irony, and it drives the core mission of CloudPhysics: to deliver that same deep understanding for virtualized systems, helping users get more for less, safely.