Datastore Browsing and Wasted Files

Blog Post created by k.thacker on Mar 2, 2015

Should you consider enabling Datastore Browsing? Before you do, consider the following:


Datastore Browsing


Datastore browsing works only with VMware (i.e., vCenter Server and ESXi).


Datastore browsing is configurable per datastore or per group of datastores. To enable it selectively, either set the global setting to ENABLED and disable it for specific groups of datastores, or leave the global setting disabled, create a custom group of datastores, and enable datastore browsing for that group.


Wasted Files


Turbonomic identifies a file as wasted when it finds the file on a datastore but not in the Files In Use list of any virtual machine in that vCenter Server instance. The age of a file is not considered when determining whether it is wasted. Wasted files discovered by datastore browsing are not used as an input metric to any action or recommendation.


If a file or directory matches a regular expression under Files to ignore or Directories to ignore (in the Policy tab), Turbonomic does not mark that file, or any file in that directory, as wasted. Because the Files In Use lists are checked per vCenter Server instance, a datastore mounted to multiple vCenter Servers may end up with files marked as wasted when they are actually in use by a virtual machine under another vCenter Server. This use case is not supported.
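The check described above can be sketched in a few lines of Python. This is only an illustration, not Turbonomic's implementation: the function name, the sample paths, and the choice to match patterns against individual file and directory names (rather than full paths) are all assumptions.

```python
import re

def find_wasted_files(datastore_files, files_in_use, ignore_files=(), ignore_dirs=()):
    """Return files that no VM references and that no ignore pattern covers.

    Hypothetical helper: patterns are matched with re.search against the
    file name and against each directory name in the path (an assumption).
    """
    file_patterns = [re.compile(p) for p in ignore_files]
    dir_patterns = [re.compile(p) for p in ignore_dirs]
    in_use = set(files_in_use)
    wasted = []
    for path in datastore_files:
        if path in in_use:
            continue  # referenced by some VM, so not wasted
        directory, _, name = path.rpartition("/")
        if any(p.search(name) for p in file_patterns):
            continue  # matches a "Files to ignore" pattern
        parts = [part for part in directory.split("/") if part]
        if any(p.search(part) for part in parts for p in dir_patterns):
            continue  # lives under an ignored directory
        wasted.append(path)
    return wasted

# Illustrative data only.
DATASTORE_FILES = [
    "vm1/vm1.vmdk",
    "vm1/vm1.log",
    "old/orphan.vmdk",
    "iso/installer.iso",
    "old/notes.txt",
]
FILES_IN_USE = ["vm1/vm1.vmdk"]

wasted = find_wasted_files(
    DATASTORE_FILES,
    FILES_IN_USE,
    ignore_files=[r"\.log$"],
    ignore_dirs=[r"^iso$"],
)
print(wasted)
```

Note that the in-use list here stands in for a single vCenter Server instance. A file used by a VM under a second vCenter Server would not appear in it, which is why the multi-vCenter case produces false positives.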


When Does Datastore Browsing Occur?


Datastore browsing begins along with monitoring, at the end of the first discovery after the server is started (or restarted). Each vCenter Server has three worker threads used exclusively for datastore browsing. At the end of the first discovery, Turbonomic queues all datastores for browsing and services the queue with these threads. If an error occurs while browsing a datastore, that datastore is placed on the queue again at the next polling cycle.
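As a rough mental model of that queue, here is a minimal Python sketch using a three-thread pool. The names run_browse_cycle and flaky_browse are hypothetical, and the real product's queueing and polling logic is certainly more involved; the sketch only shows three workers draining a queue and failed datastores being collected for the next cycle.

```python
from concurrent.futures import ThreadPoolExecutor

BROWSE_THREADS_PER_VCENTER = 3  # from the description above

def run_browse_cycle(datastores, browse):
    """Browse every queued datastore; return (succeeded, to_requeue)."""
    succeeded, to_requeue = [], []
    with ThreadPoolExecutor(max_workers=BROWSE_THREADS_PER_VCENTER) as pool:
        futures = [(ds, pool.submit(browse, ds)) for ds in datastores]
        for ds, future in futures:
            # exception() waits for the task and returns None on success.
            if future.exception() is None:
                succeeded.append(ds)
            else:
                to_requeue.append(ds)  # retried at the next polling cycle
    return succeeded, to_requeue

def flaky_browse(datastore):
    """Stand-in for a real browse; fails for one datastore to show requeueing."""
    if datastore == "VMFS-B":
        raise IOError("browse failed")

ok, retry = run_browse_cycle(["VMFS-A", "VMFS-B", "VMFS-C", "VMFS-D"], flaky_browse)
print(ok, retry)
```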


After a datastore has been successfully browsed, it is browsed again under either of the following circumstances:

  • During monitoring, Turbonomic observes that the Storage Amount Used value has changed by 250 MB or more (or by more than 0.1% of the datastore's total capacity, if that is smaller than 250 MB).
  • No browsing has occurred in the last 24 hours.
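The two conditions combine into a simple predicate. The sketch below is an interpretation, not product code: it assumes 1 MB means 1024 × 1024 bytes and that the change is measured against the usage recorded at the last browse, neither of which is stated here. The function and parameter names are hypothetical.

```python
MB = 1024 * 1024                 # assumption: binary megabytes
ABSOLUTE_THRESHOLD = 250 * MB    # 250 MB change in Storage Amount Used
RELATIVE_THRESHOLD = 0.001       # 0.1% of total capacity
REBROWSE_PERIOD_S = 86400        # 24 hours

def needs_rebrowse(capacity, used_at_last_browse, used_now, seconds_since_browse):
    """Return True if the datastore should be queued for browsing again."""
    if seconds_since_browse >= REBROWSE_PERIOD_S:
        return True  # no browsing in the last 24 hours
    delta = abs(used_now - used_at_last_browse)
    relative = capacity * RELATIVE_THRESHOLD
    if relative < ABSOLUTE_THRESHOLD:
        # Small datastore: 0.1% of capacity is the lower bar.
        return delta > relative
    return delta >= ABSOLUTE_THRESHOLD

ONE_TIB = 1024 * 1024 * MB
HUNDRED_GIB = 100 * 1024 * MB

# On a 1 TiB datastore, 0.1% is about 1 GiB, so the 250 MB rule applies.
print(needs_rebrowse(ONE_TIB, 0, 250 * MB, 3600))
# On a 100 GiB datastore, 0.1% is about 102 MB, so a 103 MB change is enough.
print(needs_rebrowse(HUNDRED_GIB, 0, 103 * MB, 3600))
```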


Edge Cases


If you add a new pattern under Files to ignore or Directories to ignore, Turbonomic does not immediately remove the matching files from the GUI, because they were already marked as (potentially) wasted. Turbonomic updates the list the next time it browses that datastore (either on schedule, or forced - see below), when it ignores those files during its pass through the datastore.


Datastore browsing is actually performed by an ESXi host attached to the datastore. Turbonomic will only browse one datastore at a time through each individual ESXi host. For example, if VMFS-B is to be browsed, and it is attached to only one ESXi host that is currently busy browsing VMFS-A, Turbonomic will wait until that browse completes and then initiate browsing of VMFS-B.


Additional Details


Turbonomic browses the datastore directory by directory. It sends a request for a list of all files and directories in the root directory and waits for the response, then repeats this recursively for each sub-directory. You can observe this in the log if you are logging at DEBUG or TRACE level.


Turbonomic does not browse directories more than 5 levels deep (e.g., <root>/A/B/C/D will be browsed, but its sub-directories will not). This is not configurable.
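To make the depth limit concrete, here is a small Python sketch of a depth-limited recursive listing. It assumes the root counts as level 1, which is consistent with the <root>/A/B/C/D example. The list_dir callable and the in-memory TREE are hypothetical stand-ins for the per-directory requests sent to the host.

```python
MAX_DEPTH = 5  # root is level 1, so <root>/A/B/C/D is level 5 (assumption)

def browse(list_dir, path="", depth=1):
    """Return all file paths found within MAX_DEPTH directory levels.

    list_dir(path) is a hypothetical callable that returns (name, is_dir)
    pairs for one directory, mimicking one request per directory.
    """
    files = []
    for name, is_dir in list_dir(path):
        child = f"{path}/{name}" if path else name
        if not is_dir:
            files.append(child)
        elif depth < MAX_DEPTH:
            files.extend(browse(list_dir, child, depth + 1))
        # Directories below level 5 are skipped silently.
    return files

# Illustrative in-memory datastore layout.
TREE = {
    "": [("A", True), ("root.txt", False)],
    "A": [("B", True)],
    "A/B": [("C", True)],
    "A/B/C": [("D", True)],
    "A/B/C/D": [("deep.vmdk", False), ("E", True)],
    "A/B/C/D/E": [("too_deep.vmdk", False)],
}

def list_dir(path):
    return TREE[path]

found = browse(list_dir)
print(found)
```

In this layout, the file under A/B/C/D/E is never listed, so it can neither be reported as wasted nor matched against an ignore pattern.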


The default re-browse period is 86400 seconds (i.e., 24 hours). It is configurable by setting the dsBrowserRepollPeriod attribute of the MonitoringManager, which requires assistance from Turbonomic support. If a directory contains too many entries (more than 20,000 files and/or directories), browsing might fail.


     Note: VMFS-5 supports ~100,000 files and VMFS-3 supports ~30,000.


Browsing NFS storage is CPU-intensive on the host that performs the browse. Turbonomic therefore limits itself to one browse at a time per host.