Stay vHealthy! An Audit a Day Keeps Operational Hazards Away


All data centers are fraught with hidden operational hazards. These are ticking time bombs: they are waiting to explode, but it is very difficult to predict exactly when and how they will go off. It is like predicting when a patient with high cholesterol and high blood pressure will have a heart attack. Regular preventive care can reduce the risk of a heart attack, but unfortunately many data centers lack proper preventive care. Administrators are constantly pushed into reactive firefighting mode, like doctors in an emergency care unit. This firefighting then becomes a vicious cycle: it supplants preventive care work, which in turn leads to more firefighting.

It is a no-brainer that one should periodically run virtualization health checks. The problem is that doing so is not trivial. Data centers are complex and constantly changing. There are hundreds of things that can go wrong, and any one of them could become a ticking time bomb. It takes a lot of time and expertise to separate the noise from the critical issues that need attention. People combat this today by:

  • Enforcing best practice guidelines through strict internal processes – this is tedious, requires a lot of domain expertise, is unreliable in dynamic environments, and is not comprehensive
  • Hiring a professional services engagement to run health check audits – this is expensive, and the environment remains exposed to operational hazards between audits
  • Using free scripts – this is manual, and the scripts become outdated as new issues are discovered

Clearly none of these approaches is effective. A new paradigm is needed, and that is exactly what we are building at CloudPhysics. First, CloudPhysics continuously monitors your environment and is therefore aware of all changes happening in your infrastructure. Second, as a SaaS platform, CloudPhysics continuously adds and updates health check rules, which become immediately available to you. Third, using the global dataset as a guide, CloudPhysics can report outliers in your metrics. This has been done in many other fields but, surprisingly, not much in data centers. For instance, going back to the medical analogy, medical researchers rely on thousands of patient records to determine safe and unsafe levels of blood pressure and cholesterol, and use those levels to guide patients.
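To make the outlier idea concrete, here is a minimal sketch of flagging local metrics that sit above a percentile of a larger reference dataset. It is an illustration only, not how CloudPhysics computes outliers: the percentile cutoff, the function names, and the sample metric (datastore latency in milliseconds) are all assumptions made for this example.

```python
from statistics import quantiles


def percentile_threshold(global_values, pct=95):
    """Return the pct-th percentile of a reference (global) dataset.

    Hypothetical helper: the 95th-percentile default is an assumption.
    """
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th one.
    cuts = quantiles(global_values, n=100, method="inclusive")
    return cuts[pct - 1]


def find_outliers(local_metrics, global_values, pct=95):
    """Return the local metrics whose value exceeds the global percentile."""
    threshold = percentile_threshold(global_values, pct)
    return {name: value for name, value in local_metrics.items() if value > threshold}


if __name__ == "__main__":
    # Made-up reference data: latencies (ms) observed across many environments.
    global_latency_ms = list(range(1, 101))
    # Made-up local data: latency per datastore in one environment.
    local_latency_ms = {"datastore-a": 20, "datastore-b": 99}
    print(find_outliers(local_latency_ms, global_latency_ms))
```

The point of the design is that the threshold comes from the reference population rather than from a hand-picked constant, much as the blood pressure limits in the analogy come from many patient records.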

What kind of health checks can you do today using CloudPhysics cards? Here are some examples from our customers [with links to the Cards]:

Daily Audits:

Weekly Audits:

Monthly Audits:

Audits Before and After Hardware/ESX Version Upgrade:

  • Compatibility of your physical host with a particular version of ESX [Host Inventory]
  • Compatibility of your PCI I/O Devices with a particular version of ESX [PCI I/O Devices]
  • NTP settings of all your ESX hosts [Host NTP Settings]
  • Which KB articles are relevant to your infrastructure after a hardware/software upgrade [Knowledge Base Advisor]

In addition to these audits you can also create your own custom health check report with a few simple clicks using the CloudPhysics Card Builder platform. Moreover, there are additional health check cards in our Card Store built by our community members.

We are not stopping there. Today I’m happy to announce a couple of new feature enhancements. We are adding summarized notes and recommendations in plain English for every issue that we discover.

[Figure: Notes and Recommendations]

We are constantly adding new and interesting rules that catch issues at an early stage, so keep an eye on the notes and recommendations section.

We recognize that just surfacing issues is not sufficient; you also need to know which ones to prioritize. So we auto-classify the identified issues into two categories, those that need attention and those that are noteworthy, and summarize them for every object across your data center.
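As a rough illustration of this kind of two-way classification and per-object summary, consider the sketch below. The post does not describe the actual classification logic, so everything here is hypothetical: the `Issue` record, the 1 to 5 severity score, the cutoff of 4, and the rule names are assumptions chosen only to show the shape of the idea.

```python
from collections import defaultdict
from dataclasses import dataclass

NEEDS_ATTENTION = "needs attention"
NOTEWORTHY = "noteworthy"


@dataclass(frozen=True)
class Issue:
    obj: str       # object the issue was found on, e.g. a host or VM name
    rule: str      # name of the health check rule that fired (hypothetical)
    severity: int  # hypothetical score from 1 (low) to 5 (high)


def classify(issue, cutoff=4):
    """Place an issue in one of the two categories using an assumed cutoff."""
    return NEEDS_ATTENTION if issue.severity >= cutoff else NOTEWORTHY


def summarize(issues, cutoff=4):
    """Count issues per object and per category."""
    summary = defaultdict(lambda: {NEEDS_ATTENTION: 0, NOTEWORTHY: 0})
    for issue in issues:
        summary[issue.obj][classify(issue, cutoff)] += 1
    return dict(summary)


if __name__ == "__main__":
    found = [
        Issue("host-a", "ntp-not-configured", 5),
        Issue("host-a", "old-tools-version", 2),
        Issue("vm-1", "snapshot-older-than-30-days", 3),
    ]
    for obj, counts in summarize(found).items():
        print(obj, counts)
```

A per-object count like this is also enough to drive an at-a-glance total of how many issues need attention.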

[Figure: Need Attention and Noteworthy tabs]

And to make it simple to drill down to the identified issues, we now expose them on the card face. With one quick glance you can tell how many issues need your attention.

[Figure: Virtualization health check issues]

I hope you find these enhancements useful. All of these features are now available with our Storage Analytics product. You can request a demo here.