The VMTurbo Workload View: The Workload Summary

Discussion created by tcoz on Oct 2, 2014

In another post, I covered the VMTurbo Workload View's Workload Chart:
The VMTurbo Workload View and Chart


This time we'll focus on another part that I get a lot of questions on: the Workload Summary. It's the portion of the Workload View on the right of the screen. Attached at the bottom is a screenshot you can use for reference (the Workload Chart is on the left, the Workload Summary is on the right).


What most people ask about are the numbers. I know this is a long read, but there's value in those digits. Note that this is a lab environment that is underutilized by design and has some experimental configurations, but it still gets the point across.


The three numbers along the top, Action Factor, Risks, and Efficiency, summarize the overall result of executing all currently available actions. There is one "big" number, and then some smaller numbers under each.


Action Factor (0 - 100):


You see in the screenshot, I have an Action Factor of 35 out of 100. What's that mean?


Action Factor is (more or less) a rebrand of what we formerly called Convergence in this spot. It indicates how much "work" needs to be done, in terms of Actions, to move your environment from the Current to the Improved state. Note the word "Improved", not "Desired": the Desired state is achieved by going through a number of Improved states. In a way, you can think of it as a car navigator; each turn is an Improved state that gets you closer to the Desired destination. The analogy isn't perfect because the Desired state is always moving (what was thought to be ideal today may not be ideal tomorrow), so it's more like a navigator with a destination that can (and probably will) change.


Each Action in the current set (again, that's all the currently executable recommendations) contributes to the overall Action Factor in two ways. First, the Action was generated at all. Second, how important that Action is. A VM Move, for example, could either be a tweak or resolve a severe congestion problem. The more important the Action, the more weight it adds to Action Factor. So a very high Action Factor means you have a lot of heavily weighted Actions, and your environment needs a lot of adjustment. A very low Action Factor generally means a few Actions with very low weight.


How can you see the effect? Use the toggle on the top of the Workload Summary: "Workload Chart (Improved)". This will reveal another Workload Chart that shows you what the Workload is expected to look like after executing all current Actions; the Improved State. You now get a side-by-side view of Current and Improved states. The higher the Action Factor, the more difference there should be between the two charts. A very low Action Factor and you might not see much difference.


The number underneath Action Factor is the pre-normalized version of the "big" number above it. You can use it if you want a more precise measure than 0 - 100.


Efficiency (VM Density):


A basic reference measure: you have X hosts and Y VMs. The more VMs per host, provided the environment remains healthy, the more efficiently you're running. You probably have an idea of what makes sense in terms of VM Density for your environment; this gives you that info. The percentage is just the percent difference between the two numbers underneath it.
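As a rough sketch of that arithmetic (not VMTurbo's code), the snippet below computes VMs per host and the percent difference between two density values. It assumes the two numbers under the percentage are the Current and Improved densities, and it borrows the VM and host counts quoted later in this post (18 VMs on 14 hosts now, 18 VMs on 8 hosts after the actions). The function names are made up for illustration, and the result may not match what the screenshot displays.

```python
def vm_density(vms: int, hosts: int) -> float:
    """VMs per host; a higher value means denser (more efficient) packing."""
    if hosts <= 0:
        raise ValueError("hosts must be a positive number")
    return vms / hosts


def percent_difference(before: float, after: float) -> float:
    """Percent change going from `before` to `after`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before * 100.0


# Counts taken from the walkthrough in this post (lab environment).
current_density = vm_density(18, 14)   # assumption: Current = 18 VMs / 14 hosts
improved_density = vm_density(18, 8)   # assumption: Improved = 18 VMs / 8 hosts
density_change = percent_difference(current_density, improved_density)

print(f"current:  {current_density:.2f} VMs per host")
print(f"improved: {improved_density:.2f} VMs per host")
print(f"change:   {density_change:.0f}%")
```

With these inputs the density goes from about 1.29 to 2.25 VMs per host, which is an increase of roughly 75%.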




Risks:


Deliberately left for last, Risks tells you how many Risk Factors can be removed from your environment if you execute the current Action Set (all current recommendations). All the data below it, in the Workload Distribution section, supports the Risks number.


You see here, "-1", so one Risk Factor has been removed, and the "Current (1)" and "Pending (0)" numbers underneath show the diff. But what's that mean?


(Also note: when I say "critical", "normal", or "underutilized", I'm talking in terms of Utilization Index, which you can read about elsewhere. Leave a comment if you can't find that info.)


One thing to remember about what VMTurbo does is that we watch everything, all the time. We don't just tell you, "PMs look good so your environment is fine." Using the Workload Chart, I've shown many times how you can have all normal PMs (physical machines, i.e. hosts) that are hosting VMs in trouble (misconfigured, etc.). Not to mention, the VMs running on those happy hosts might be using storage that is critical. So the PM will chug happily along while your VM is about to be unable to write data.


So a VM is actually involved in a basic three-way relationship: itself, the Host, and the Storage it uses. Each element of that relationship represents a Risk Factor. So the total number of Risk Factors in a basic Workload environment is "Number of VMs x 3".


Examples: a VM is critical, but the host it's running on and the storage it uses are not. That is one risk factor. If the VM is critical and the host is critical but the storage is fine, that's two risk factors. If all three are critical, that's three; if none are critical, that's zero.
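The counting rule in those examples can be sketched in a few lines. This is only an illustration of the arithmetic described above, not how VMTurbo implements it; the `VmRelationship` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VmRelationship:
    """One VM plus the host and storage it depends on (hypothetical model)."""
    vm_critical: bool
    host_critical: bool
    storage_critical: bool

    def critical_risk_factors(self) -> int:
        # Each critical element of the three-way relationship counts as one.
        return sum((self.vm_critical, self.host_critical, self.storage_critical))


def total_risk_factors(num_vms: int) -> int:
    """Every VM contributes three Risk Factors: itself, its host, its storage."""
    return num_vms * 3


only_vm = VmRelationship(vm_critical=True, host_critical=False, storage_critical=False)
vm_and_host = VmRelationship(vm_critical=True, host_critical=True, storage_critical=False)
all_three = VmRelationship(vm_critical=True, host_critical=True, storage_critical=True)
healthy = VmRelationship(vm_critical=False, host_critical=False, storage_critical=False)

for rel in (only_vm, vm_and_host, all_three, healthy):
    print(rel, "->", rel.critical_risk_factors())

print("Risk Factors for 18 VMs:", total_risk_factors(18))
```

The four sample relationships yield 1, 2, 3, and 0 critical Risk Factors, matching the examples above, and 18 VMs give 54 Risk Factors in total.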


So in the screenshot, you see one Risk Factor was removed somewhere...but which one?


In the "Workload Distribution" area, there are three sections: Current State, Improved State, and Changes. There are four columns: VMs, VMs on Hosts, VMs on Storage, and Totals.


You would read the rows and columns like this:


"In the Current State section, I am running 18 VMs, 14 hosts, and 10 storage entities."

"In the Current State section, I have 1 critical VM, no VMs running on critical hosts, and no VMs running on critical storage, resulting in a total of 1 Risk Factor."

"In the Current State section, I see that the Totals column, 32 + 21 + 1, represents 54 risk factors (18 VMs x 3)."


"In the Improved State section, I am running 18 VMs, but 8 hosts and 8 storage entities (probably recommendations to suspend, etc., for efficiency)."

"In the Improved State section, I have 0 critical VMs, no VMs running on critical hosts, and no VMs running on critical storage, resulting in 0 Risk Factors."

"In the Improved State section, I see that the Totals column, 35 + 19, represents 54 risk factors (18 VMs x 3)."


"In the Changes section, I see that I have reduced critical VMs by one, and everything else in the Critical row is 0, resulting in a change of -1 Risk Factor".


(Aside: Which VM? Look at the Workload Chart. You see that one red circle; that's where your critical VM is, which in turn, you can see is running on a host that's doing fine, but on underutilized storage. If you draw a square around that circle, you'll get the details, the related actions, and so on).


Underutilized is also interesting:


"In the Current State section, I have 16 underutilized VMs, 8 VMs running on underutilized hosts, and 8 VMs running on underutilized storage, resulting in a total of 32 Risk Factors."

"In the Improved State section, I have 16 underutilized VMs, 11 VMs running on underutilized hosts, and 8 VMs running on underutilized storage, resulting in a total of 35 Risk Factors."

"I see that the Totals columns in both add up to VMs x 3 ( = 18 x 3 = 54)."

"So in the Changes, I see I am running 3 more VMs on underutilized hosts."


Again, our lab environment can do interesting things. If this were a production environment, I'd want to know why we are running > 50% of the environment in an underutilized state.
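To double-check the readings above, here is a small sketch that rebuilds the Workload Distribution numbers and the Changes section. The Critical and Underutilized rows are the values quoted in this post. The Normal row is not read from the screenshot; it is derived on the assumption that each column must add up to the 18 VMs. All names in the code are illustrative.

```python
NUM_VMS = 18
COLUMNS = ("VMs", "VMs on Hosts", "VMs on Storage")

# Rows quoted in the post; each tuple follows the COLUMNS order.
current = {"critical": (1, 0, 0), "underutilized": (16, 8, 8)}
improved = {"critical": (0, 0, 0), "underutilized": (16, 11, 8)}


def with_normal_row(state, num_vms):
    """Add a Normal row, assuming every column sums to num_vms."""
    normal = tuple(num_vms - sum(column) for column in zip(*state.values()))
    return {**state, "normal": normal}


def row_totals(state):
    """The Totals column: sum of each row across the three columns."""
    return {row: sum(cells) for row, cells in state.items()}


def changes(before, after):
    """The Changes section: Improved minus Current, cell by cell."""
    return {
        row: tuple(a - b for b, a in zip(before[row], after[row]))
        for row in before
    }


current_full = with_normal_row(current, NUM_VMS)
improved_full = with_normal_row(improved, NUM_VMS)
current_totals = row_totals(current_full)
improved_totals = row_totals(improved_full)
delta = changes(current_full, improved_full)

print("Current totals: ", current_totals)
print("Improved totals:", improved_totals)
print("Changes:        ", delta)
print("Risks:          ", sum(delta["critical"]))
```

Under that assumption the derived Normal totals come out to 21 (Current) and 19 (Improved), which agree with the 32 + 21 + 1 and 35 + 19 sums quoted earlier. The Critical row of the changes sums to -1, the Risks number in the screenshot, and the Underutilized row shows 3 more VMs on underutilized hosts.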


What's the ideal?


"In the Normal row for both Current and Improved states, I see 18 VMs are Normal, 18 VMs are running on normal Hosts, 18 VMs are running on normal Storage. All 54 Risk Factors are in the Normal row."


I hope that gets you started using the Workload Summary. Feel free to post follow-up questions, ideas to enhance it, and so on.