I would like to be able to monitor our hosts and storage arrays for failing drives or controllers that could cause one or more machines to go down.
Example: Recently we had issues with a couple of our virtual servers: things were running slowly, a log folder was full, and we got errors when attempting to move servers. We contacted Dell about our EqualLogic array, since the problems seemed to be coming from the array, and we did have a bad drive in it. While troubleshooting, we found that one of our hosts was not logging properly and was missing a lot of information. We cleared out the temp folder, since it was full, and restarted the host to make sure logging would start properly. The host did not come back up. We found that it had a failing OS drive that had become too corrupted to use, and the mirrored drive was missing a file, so it would not boot. We lost access to our Virtual Center and to a couple of other servers that were on the failed host. Everything is back up now, but it was a major headache for a few days.
Monitoring the controller and drives could have allowed me to find these issues before they became such a problem. The Dell Diagnostics run on the EqualLogic showed one of the physical drives accumulating many bad blocks, which is a sign of a failing drive.
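
To illustrate the kind of check I have in mind, here is a rough Python sketch. It is only an assumption of how such a rule might look, not how any existing product works: the `DriveReading` record, the `drives_needing_attention` function, the host and drive names, and the threshold of 10 bad blocks are all made up for the example. How the readings would actually be collected from the hosts or the array is left out.

```python
from dataclasses import dataclass


@dataclass
class DriveReading:
    """One hypothetical health sample for a physical drive."""
    host: str        # host or array the drive belongs to (made-up names below)
    drive: str       # drive slot or identifier
    bad_blocks: int  # bad-block count reported for the drive
    status: str      # reported state, e.g. "online" or "degraded"


def drives_needing_attention(readings, bad_block_threshold=10):
    """Return (host, drive, reason) for drives that look like they are failing.

    The threshold is an arbitrary illustrative value, not a vendor recommendation.
    """
    flagged = []
    for r in readings:
        reasons = []
        if r.bad_blocks >= bad_block_threshold:
            reasons.append(f"{r.bad_blocks} bad blocks")
        if r.status != "online":
            reasons.append(f"status is {r.status}")
        if reasons:
            flagged.append((r.host, r.drive, "; ".join(reasons)))
    return flagged


# Made-up sample data for the example only.
sample_readings = [
    DriveReading("array-1", "disk-3", bad_blocks=42, status="online"),
    DriveReading("host-a", "os-disk-0", bad_blocks=0, status="degraded"),
    DriveReading("host-b", "os-disk-0", bad_blocks=0, status="online"),
]

for host, drive, reason in drives_needing_attention(sample_readings):
    print(f"WARNING {host}/{drive}: {reason}")
```

An alert like this on either the array drive or the host OS drive would have given us a chance to replace the hardware before the host was restarted.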