What changed? Why is VMware running slow?

Neo: Whoa. Déjà vu.
[Everyone freezes right in their tracks]
Trinity: What did you just say?
Neo: Nothing. Just had a little déjà vu.
Trinity: What did you see?
Cypher: What happened?
Neo: A black cat went past us, and then another that looked just like it.
Trinity: How much like it? Was it the same cat?
Neo: It might have been. I’m not sure.
Morpheus: Switch! Apoc!
Neo: What is it?
Trinity: A déjà vu is usually a glitch in the Matrix. It happens when they change something.

The Black Cat

You’ve seen the black cat. It crosses your path too frequently. It’s called an application disruption. We’re not talking about career-ending outages – we’re talking about virtualized apps seeing increased latencies, impacting end-user experience, business processes, real transactions and money.

So you get the call: “My app is slow.” “Why is VMware running slow?” “What did you change?”

Your first instinct is “nothing changed.” But the creeping sensation that something changed is the black cat déjà vu – how can you be sure that this is not another undetected change causing the disruption?

It’s not that change is unexpected. After all, you designed, implemented, and manage vSphere precisely because it can absorb unexpected and varied application demands and still deliver predictable behavior. Operational reality is less tidy. What’s unexpected are the unseen impacts of changes to the vSphere structure and its configuration, or of changes in the demands of the applications you support.

Because change in a vSphere environment defines its dynamic nature and your daily experience, we are introducing a new set of capabilities on our SaaS platform that enable you to deal with the black cat – giving you the ability to rapidly identify the direction and source of emerging and existing disruptions.

Context + Correlation = Causation

Finding the right context in which to search for changes over time is the key to heading in the right direction from the start. Too often, the high-pressure hunt for the cause of an app disruption leads down expensive dead ends (and into war rooms): a monitoring tool shows a utilization spike on a resource such as CPU or network, and that spike triggers a time-consuming, more or less random walk through vCenter and other management apps.

[Figure: Typical tiled two-dimensional view]

At CloudPhysics we fundamentally believe this kind of tiled two-dimensional view hinders your ability to find cause. To get there fast you need context. And context requires domain-expert choices about what to look at (given the type of disruption), along with the ability to explore that context over time. Our new set of dashboard and exploration services delivers exactly this, in rich views that take you into all dimensions rapidly and simply to find cause:

[Figure: CloudPhysics Exploration Mode]

In this example, exploration mode gives us the context to determine whether a VM is CPU-bound: CPU Ready Time vs. CPU Demand. Ready time is the VM's view: the time it has work ready to run but is waiting to be scheduled on a physical CPU. CPU demand is the view of the physical host, on which multiple VMs are running and demanding CPU resources. When CPU ready time peaks together with CPU demand, you'll want to explore this context over time to understand duration and severity, and to isolate the disruption to a CPU resource shortage. CloudPhysics exploration mode lets you widen and narrow the time window for this context.
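To make the ready-versus-demand check concrete, here is a minimal Python sketch. It is not CloudPhysics code. It assumes both metrics have already been exported as equal-length lists sampled at the same instants, with ready time expressed as a percentage of the sample interval and demand as a percentage of host capacity. The function names, the thresholds (`ready_limit`, `demand_limit`), and the sample values are all hypothetical, chosen only for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def cpu_bound_windows(ready_pct, demand_pct, ready_limit=5.0, demand_limit=80.0):
    """Return (first_index, last_index) runs where both metrics are high.

    The default limits are illustrative assumptions, not recommendations.
    """
    windows, start = [], None
    pairs = list(zip(ready_pct, demand_pct))
    for i, (ready, demand) in enumerate(pairs):
        hot = ready >= ready_limit and demand >= demand_limit
        if hot and start is None:
            start = i                      # a run of "hot" samples begins
        elif not hot and start is not None:
            windows.append((start, i - 1)) # the run ended on the previous sample
            start = None
    if start is not None:
        windows.append((start, len(pairs) - 1))
    return windows

# Made-up samples: ready time and demand rise and fall together.
ready = [1, 1, 2, 8, 12, 9, 2, 1]
demand = [30, 35, 40, 85, 95, 90, 45, 30]

r = pearson(ready, demand)
windows = cpu_bound_windows(ready, demand)
print(f"correlation={r:.2f}, suspect windows={windows}")
```

A high correlation together with a sustained window is only a hint that the VM is starved for CPU during that period. It says nothing yet about why, which is where the change and issue data come in.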

Exploration Mode:  The Fastest Path to Cause

But wait, there’s more. Isolating the context and the way the symptom shows up (a resource shortage) does not yet point you toward cause. For that you need more data, so you can correlate other changes that may have preceded the resource hit. Look again at the exploration mode above, and notice the other information available in the changes and structural issues elements located above the time-series chart. CloudPhysics takes these and aligns them along the same time axis you are exploring, producing a fully correlated view across multiple sets of data. In this screen, we see a detailed change log containing events associated with configuration and performance details for the particular VM we’re viewing in context. At the same time (literally), the issues pane to the right is populated with known problems and events that have occurred on this VM or its dependencies. In our example above, your attention is drawn to a red event indicating a network connection problem.

In this one view, across three correlated dataset panes, we can take immediate action to get to cause: a network connection problem is probably producing new traffic load or triggering a failover process, which increases CPU load and backs up access to CPU resources for the VMs on this host. With one view in CloudPhysics exploration mode, you are headed in the right direction and using the data coming out of vSphere to your advantage.
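The idea of putting several datasets on one time axis can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the product is implemented: the function `align_events`, the pane names, the timestamps, and the event texts are all invented for the example. Each event is attached to the latest metric sample taken at or before it, so a change log and an issue list line up with the chart.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def align_events(sample_times, events):
    """Attach each event to the latest sample at or before its timestamp.

    sample_times: sorted list of datetimes (the chart's time axis)
    events: iterable of (timestamp, pane, text) tuples
    Events that occur before the first sample are dropped.
    """
    aligned = {t: [] for t in sample_times}
    for when, pane, text in sorted(events):
        i = bisect_right(sample_times, when) - 1
        if i >= 0:
            aligned[sample_times[i]].append((pane, text))
    return aligned

# Hypothetical five-minute samples and two made-up events.
start = datetime(2014, 8, 25, 10, 0)
samples = [start + timedelta(minutes=5 * i) for i in range(4)]
events = [
    (start + timedelta(minutes=11), "issues", "network connection problem"),
    (start + timedelta(minutes=7), "changes", "vNIC reconfigured"),
]

aligned = align_events(samples, events)
for t in samples:
    print(t.strftime("%H:%M"), aligned[t])
```

Bucketing to the preceding sample is one design choice among several. A real tool might instead draw events at their exact timestamps, but the point is the same: once everything shares an axis, what happened just before the spike becomes visible at a glance.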

A Proven, Encapsulated Three-Step Technique

The approach encapsulated in our exploration mode follows a process drawn from a great reference, Effective Monitoring and Alerting by Slawek Ligus (O’Reilly Media):

  • Step 1: Find correlation – with the ability to set up different datasets and align them in time
  • Step 2: Establish direction – what is the order of events and changes? Discovering this order strengthens correlation heavily and shortens the path to finding cause significantly. CloudPhysics enables the exploration of time and this ordering across multiple dimensions and contexts. At this point you are truly operating vSphere as a data-driven admin.
  • Step 3: Itemize and rule out confounding factors – a not-so-incidental benefit of our exploration mode is an instant listing of possible paths to explore, which also excludes many you no longer need to investigate as potential sources of trouble. This shortens the overall path to finding cause.
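Steps 2 and 3 above can be sketched roughly as follows. This is a toy Python example, not the book's or CloudPhysics' procedure. It assumes times are plain minutes on a shared axis, and the function `triage`, the `lookback` window, and the candidate events are hypothetical. Candidates that fall inside the window before the symptom's onset are kept and ordered nearest first, and everything else is set aside.

```python
def triage(onset, candidates, lookback):
    """Split candidate events by their order relative to symptom onset.

    onset: time the symptom began (minutes on a shared axis)
    candidates: iterable of (time, label) tuples
    lookback: how far before onset an event still counts as a suspect
    Returns (suspects, set_aside); suspects are sorted nearest-to-onset first.
    """
    suspects, set_aside = [], []
    for when, label in candidates:
        if onset - lookback <= when <= onset:
            suspects.append((when, label))
        else:
            set_aside.append((when, label))
    suspects.sort(key=lambda event: onset - event[0])
    return suspects, set_aside

# Made-up candidate events around a symptom that starts at minute 60.
candidates = [
    (20, "host patch"),
    (45, "network failover"),
    (55, "neighbor VM migrated in"),
    (70, "snapshot deleted"),
]

suspects, set_aside = triage(onset=60, candidates=candidates, lookback=30)
print("investigate first:", suspects)
print("set aside for now:", set_aside)
```

Ordering strengthens a hypothesis but does not prove it. An event that precedes the symptom is a lead, and an event outside the window is deprioritized rather than exonerated.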

Discover What Change Looks Like

We’ll be at VMworld this week showcasing these exciting new capabilities. Come by booth #2346 to review our contextual dashboards and exploration mode. Or if you’re not at VMworld, you can request a demo here.

Either way, we’d love to walk you through our SaaS platform and show how it can help you deal with the black cats when they cross your path.