eva.tuczai

Workload Rightsizing and Turbonomic

Blog Post created by eva.tuczai on Feb 14, 2015

Workload Rightsizing using the Turbonomic Economic Scheduling Engine

 

Overview

The Turbonomic Software-Defined Control (SDC) system keeps the virtualized data center in a desired state: a state in which application performance is assured while the underlying infrastructure is utilized as efficiently as possible.

 

Turbonomic SDC continuously analyzes the state of the environment and drives a broad set of actions, including VM resizing, to keep the environment in the desired state. The desired state is derived from performance metrics across the entire virtualized IT stack and data centers, taking into account all configuration and business constraints and workload policies.

 

Turbonomic SDC defines rightsizing actions when workload resource configurations need to change in order to keep the environment in the desired state. Given the definition of the desired state, the workload configuration is affected not only by its own resources, such as vMem or vCPU, but also by the resources of the underlying hosts and data stores, and by the other workloads, hosts and data stores in the environment.

 

Analysis

Turbonomic SDC analyzes the state of the workload resources based on:

 

  • Real-time and historical performance and utilization of Data Centers, Hosts, VMs, Data stores and Applications (Guest Workload), including peak utilization values. The historical values are rolling averages.
  • Peaks and historical values carry a weight to ensure that a workload is not downsized below its peaks, and that a VM with little data is not recommended for reconfiguration. Current values account for 50% of the weighted factor, historical values for 49% and peak values for 1%.  Note that a rightsizing action will not go below a peak value even if the Desired State is significantly below the peak.
  • The topological dependencies of the virtualized infrastructure (e.g. VMs on hosts, data stores), and the utilization of the environment.  Hosts with low utilization have plenty of resources, so VMs running on them may resize conservatively (in smaller increments), or may not resize at all.  When Hosts are highly utilized or not meeting HA requirements, and VMs need to resize up, there may be no available resources, so the first action may be a Host Provision action for more compute.
  • The rate of resize, in compliance with your business and change management process. Low means reach the Desired State in conservative increments, Medium means reach it in several steps, and High means reach it in one step.
  • The configuration and business constraints.
  • The workload placement policies.
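
As a rough illustration of the weighting described above, here is a minimal Python sketch. The 50% / 49% / 1% weights and the "never below peak" rule come from this post; treating the weighted factor as a simple linear combination, and the function names weighted_utilization and floor_at_peak, are assumptions made for the example and not a description of the actual engine.

```python
# Weights stated in the post: current 50%, historical 49%, peak 1%.
CURRENT_WEIGHT = 0.50
HISTORICAL_WEIGHT = 0.49
PEAK_WEIGHT = 0.01


def weighted_utilization(current, historical, peak):
    """Blend the three inputs into one value (assumed linear combination)."""
    return (CURRENT_WEIGHT * current
            + HISTORICAL_WEIGHT * historical
            + PEAK_WEIGHT * peak)


def floor_at_peak(desired_capacity, peak_used):
    """A resize-down target never drops below the observed peak usage."""
    return max(desired_capacity, peak_used)


if __name__ == "__main__":
    # Hypothetical VM: 40% now, 60% rolling average, 100% peak.
    print(round(weighted_utilization(40, 60, 100), 2))
    # Desired state suggests 2048 MB, but the VM peaked at 3072 MB.
    print(floor_at_peak(2048, 3072))
```

The point of the tiny peak weight is that a single spike barely moves the blended value, while the separate peak floor still stops a downsize from cutting below what the VM has actually used.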

 

Configuration

To adjust the Rates of Resizing, go to the Policy tab and open the Analysis section to review the global settings available; the defaults are shown in the attached screenshot.  Note that changes to these policies require 24 hours before resize-down actions take the adjusted factors into account.

 

The options available for configuring these settings are:

  1. Change the increment values
    1. The increments specify how many units to add or subtract when resizing the given resource allocation for a VM. For example, it makes sense to change VMem by steps of 1024 MB at a time, but for VStorage it’s better to make changes by 0.5 GB steps.
    2. Other considerations: For VMem, you should not set the increment value lower than what is necessary for the VM to operate. If the VMem increment is too low, Operations Manager could allocate insufficient VMem for the machine to operate. For a VM that is underutilized, Operations Manager will reduce VMem allocation by the increment amount, but it will not leave a VM with zero VMem. For example, if you set this to 512, then Operations Manager cannot reduce the VMem to less than 512 MB.
    3. For VStorage, the default setting is very high to disable resize actions. This is usually preferred because VStorage resize requires that you reformat the storage. If you reduce this value then you will see resizing of storage.  NOTE: Turbonomic will get vStorage values per drive from VMware Tools.
  2. Change the Rate of Resize: When resizing resources for a VM, Operations Manager calculates the optimal values for VMem, VCPU and VStorage, but it does not necessarily change to that value in one action. Operations Manager uses the Rate of Resize setting to determine how much of the change to make in a single action, as follows:
    1. Low = Change the value by one increment, only. For example, if the resize action calls for increasing VMem, and the increment is set at 1024, Operations Manager increases VMem by 1024 MB.
    2. Medium = Change the value to be halfway between the current value, and the optimal value. For example, if the current VMem is 2 GB and the optimal VMem is 8 GB, then Operations Manager will raise VMem to 5 GB (or as close to that as the increment constant will allow).
    3. High = Change the value to be the optimal value. For example, if the current VMem is 2 GB and the optimal VMem is 8 GB, then Operations Manager will raise VMem to 8 GB (or as close to that as the increment constant will allow).
  3. Action settings: You can set Resize actions to Recommended, Manual or Automatic, or disable them altogether. You may configure these settings at any group level.  Review the settings in the Policy tab -> Actions -> VM.  By default all resizing actions are set to Manual.
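
The Low / Medium / High behavior above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name next_resize_value is hypothetical, all values are in MB, and the way the change is snapped to whole increments (rounding the step down, but always moving at least one increment when the gap allows it) is a guess at what "as close as the increment constant will allow" means.

```python
def next_resize_value(current, optimal, increment, rate):
    """Return the value a single resize action would move to (values in MB)."""
    if increment <= 0:
        raise ValueError("increment must be positive")
    gap = abs(optimal - current)
    if gap < increment:
        return current  # assumption: less than one increment away, no action
    direction = 1 if optimal > current else -1

    if rate == "low":
        step = increment      # one increment only
    elif rate == "medium":
        step = gap // 2       # halfway to the optimal value
    elif rate == "high":
        step = gap            # all the way to the optimal value
    else:
        raise ValueError("rate must be 'low', 'medium' or 'high'")

    # Snap to whole increments (assumed rounding), moving at least one.
    increments = max(1, step // increment)
    new_value = current + direction * increments * increment
    # Never leave the VM with less than one increment of the resource.
    return max(new_value, increment)


if __name__ == "__main__":
    # The example from the list above: 2 GB now, 8 GB optimal, 1024 MB steps.
    for rate in ("low", "medium", "high"):
        print(rate, next_resize_value(2048, 8192, 1024, rate))
```

With the values from the examples above, Low moves to 3072 MB, Medium to 5120 MB (5 GB) and High to 8192 MB (8 GB).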

 

Note: As of version 5.0, Turbonomic auto-discovers VMs configured with Hot Add for CPU and/or Memory. It also gives you more granular control over the level of automation by separating Resize Actions that address Performance Risks from those that improve Efficiency.

 

The VMs that are auto-discovered are grouped into OOTB groups of “Virtual Machines by Hot Add CPU” and “Virtual Machines by Hot Add Memory”:

 

Additionally, if you want to ensure that a group of VMs only receives automated resizes that do not require a reboot (Hot Add resizing up, and Reservations and Limits resizing), select “Enforce Non Disruptive Mode”. You can then automate or allow manual actions, and only the non-disruptive ones will be allowed to execute.

 

Other Configurations: Resize to Template and Template Utilization / Consumption factors

By default, resizing looks at VMem, VCPU and VStorage separately for each VM, but there is an option to resize to VM Templates.  If you select this option, all VMs will resize all their resources to the nearest VM Template without going over, unless the VMs are configured for reservations or limits.  For more information, review the Green Circle article "Workload Rightsizing: Resize to Template and Template Utilization Factors" or the Support KB article: https://support.vmturbo.com/hc/en-us/articles/200682116-How-to-configure-VM-resize-recommendations-to-use-Templates-and-how-to-modify-Consumption-Factors  Note that this is the default configuration when adding CloudStack and OpenStack targets.

 

Viewing Resizing Recommendations

Turbonomic's analytics will continuously assess when a resizing action is required to assure performance and address a bottleneck.  These actions can be seen in the Assure Service Performance dashboard (select Show All if you do not see actions):

Actions that define efficiency opportunities are assessed once every 24 hours, and can be seen in the Improve Overall Efficiency Dashboard:

Both resizing for performance and efficiency can be seen in the Inventory Tab (both the Summary To-Do list, and when scoped to specific views), and in the VM Rightsizing Report.  You can also extract any action using the Turbonomic REST API for actionlogs – see this Green Circle article: https://greencircle.vmturbo.com/community/products/blog/2015/02/14/vmturbo-rest-api-series-part-3-ready-set-actions 

 

The VM Rightsizing report contains all your rightsizing actions in a single report.  It lists the actions, the exact values of the resources that need to be adjusted and why, and the Peak vMem and Peak vCPU utilization values of each Virtual Machine over the last 60 days. Turbonomic also provides a costing factor to help you see at a glance which clusters will require investment or additional workload-level capacity, and which Virtual Machines offer savings because you can reclaim unused resources without impacting QoS.

 

Note: If you change the daily retention period to store more than the default 60 days of information, the VM Rightsizing Report will report the peak over a longer period, up to 100 days, depending on the retention period. Increasing the retention period does not change the way the analysis is done.

 

The calculated cost and savings values are controlled by Hardware Costs in the Policy tab -> Analysis section:

The VM Rightsizing report can also be saved in Excel format so that you can modify it as you wish.

 

Turbonomic Data Collection

Turbonomic Software-Defined Control is achieved through an asynchronous data collection methodology that is common to all types of actions prescribed to assure performance while running your infrastructure as efficiently as possible. Rightsizing for virtual machines uses this common data collection mechanism. For platforms that provide raw data, Turbonomic calculates its own averages and determines the peak values. Turbonomic also collects data immediately when it is informed of a change, such as a notification of a virtual machine starting or moving.

 

The data is rolled up into the following periods: 15 minutes, hourly, daily, then monthly.  These data periods are all visible in the user interface.  Hourly, daily and monthly statistics are stored in the database. The Turbonomic Economic Scheduling Engine will continuously assess Risk; actions for Improving Overall Efficiency are evaluated every 24 hours.
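
To make the roll-up idea concrete, here is a small Python sketch that groups raw samples into fixed-width periods and keeps an average and a peak per period. It is a generic illustration only: the function name roll_up, the sample layout, and the choice to keep exactly "avg" and "peak" are assumptions, and the post does not describe how Turbonomic implements its roll-ups internally.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def roll_up(samples, period):
    """Group (timestamp, value) samples into buckets of width `period`.

    Returns {bucket_start: {"avg": ..., "peak": ...}} ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Floor the timestamp to the start of its bucket.
        index = (timestamp - EPOCH) // period
        buckets[EPOCH + index * period].append(value)
    return {
        start: {"avg": sum(values) / len(values), "peak": max(values)}
        for start, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Hypothetical utilization samples (percent) for one VM.
    day = datetime(2015, 2, 14)
    samples = [
        (day.replace(hour=10, minute=0), 10.0),
        (day.replace(hour=10, minute=5), 30.0),
        (day.replace(hour=10, minute=20), 50.0),
        (day.replace(hour=11, minute=10), 70.0),
    ]
    print(roll_up(samples, timedelta(minutes=15)))
    print(roll_up(samples, timedelta(hours=1)))
```

Keeping the peak alongside the average matters for rightsizing, because an hourly or daily average alone would hide the spikes that a resize-down action must not cut into.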

Outcomes