Capacity Planning for your Virtual Data Center and Cloud – Part 3
In Part 1 of this series, I spoke about the consequences of insufficient Capacity Planning while setting up a virtualization infrastructure. In Part 2, we looked at how to implement Capacity Management. This post concludes the three-part series.
Let’s now discuss the technology and tools that will help with Capacity Management in your Virtual Data Center and Cloud. We will focus on these three aspects – Monitoring Capacity, Managing Capacity and Optimizing Capacity.
Monitoring Capacity can be both reactive and proactive. It is reactive when an alert is triggered because an upper utilization threshold has been breached. The alert sets off actions such as identifying the root cause of the breach – which VM or Host is affected; which application or service is causing the breach; whether it is transient or permanent; etc. It is proactive when you use historical data to forecast future capacity demands and utilization trends. Two of the common questions organizations ask me about Capacity monitoring are:
- How do I monitor?
- What do I monitor?
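The reactive case above can be sketched as a simple threshold check. This is a minimal illustration, assuming made-up host names and an 80% limit – the actual metrics and thresholds will depend on your environment:

```python
# Reactive monitoring sketch: raise an alert when a utilization sample
# breaches an upper threshold. Host names and the 80% limit are
# illustrative assumptions, not recommendations.

CPU_UPPER_THRESHOLD = 0.80  # 80% utilization

def check_utilization(host, metric, value, threshold=CPU_UPPER_THRESHOLD):
    """Return an alert message if the sample breaches the threshold, else None."""
    if value > threshold:
        return f"ALERT: {metric} on {host} at {value:.0%} exceeds {threshold:.0%}"
    return None

alerts = [a for a in (
    check_utilization("esx-host-01", "cpu", 0.72),
    check_utilization("esx-host-02", "cpu", 0.91),
) if a is not None]
```

In practice the alert would feed the root-cause actions described above – identifying the affected VM or host and the application responsible.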
How do I monitor?
If you are running and managing your own virtual data center and private cloud, you need to monitor resource capacity and service performance at the Resource Pool, ESX Cluster, Hypervisor, VM Guest and Application layers. If you are subscribing to cloud services managed by Service Providers, some of these layers (e.g. Cluster and Hypervisor) may be monitored by the Service Provider. I have encountered organizations which continue to employ traditional monitoring practices and tools, built for physical infrastructure, to monitor their virtual servers. This results in inaccurate and incomplete metrics, because traditional methods rely on the guest OS to monitor the server; the OS is not aware of the virtualization layer and thinks it has the whole host to itself. In addition, fixed thresholds applied uniformly across servers are rarely the best way to monitor capacity and performance, since not every application behaves the same. A better choice is the next generation of monitoring tools with predictive analytics, which learn an application’s normal behavior and detect when it deviates from that pattern.
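The idea of learning an application’s normal behavior, rather than applying a fixed threshold, can be sketched with a simple statistical baseline. The 3-sigma band, window of samples and response-time figures are illustrative assumptions; real predictive-analytics tools use far more sophisticated models:

```python
import statistics

def is_anomalous(history, sample, sigmas=3.0):
    """Flag a sample that falls outside mean +/- sigmas * stdev of the
    application's own recent history (its 'learned' normal behavior)."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return abs(sample - mean) > sigmas * stdev

# Response times (ms) that normally hover around 100 ms
history = [98, 102, 99, 101, 100, 97, 103, 100]

normal = is_anomalous(history, 104)  # within this app's learned band
spike = is_anomalous(history, 160)   # far outside it
```

Note that a fixed 150 ms threshold applied to every application would miss a service whose normal response time is 20 ms degrading to 140 ms; a per-application baseline catches it.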
What do I monitor?
Some of the key metrics to monitor at the Resource Pool, Cluster, Hypervisor and VM layers include available, allocated and utilized CPU, memory, storage and network. For Applications, the key metrics are the number of transactions and response times. The main point to note is that it may be necessary to review metrics at different layers to correctly diagnose a problem. For example, an application’s slow response time may be due to another VM in the same host hogging resources (Cluster/ Hypervisor layer); over-committed memory (Resource Pool/ Cluster/ Hypervisor/ VM layer); resource contention due to inappropriate priority or shares allocation (VM layer); etc.
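One of the cross-layer checks above – whether over-committed memory at the host layer explains slow application response times – can be sketched as a simple ratio. The VM sizes and host capacity are made-up illustrative numbers:

```python
# Cross-layer diagnosis sketch: compare memory allocated to all VMs on a
# host against the host's physical memory. Figures are illustrative.

def memory_overcommit_ratio(vm_allocated_gb, host_physical_gb):
    """Ratio of total VM-allocated memory to physical memory on the host."""
    return sum(vm_allocated_gb) / host_physical_gb

vm_memory = [32, 32, 16, 48, 64]  # GB allocated per VM on one host
ratio = memory_overcommit_ratio(vm_memory, host_physical_gb=128)

# A ratio above 1.0 means memory is over-committed; under load, the
# hypervisor may resort to ballooning or swapping, degrading response times.
```

A ratio above 1.0 is not a fault in itself – over-commitment is a legitimate optimization – but it is a layer worth checking when the application-layer metrics look bad.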
Managing capacity ensures that there is always enough capacity available to meet service demands. To accomplish this, one has to take data from capacity monitoring, business forecasts and project pipelines, and feed these into a data model to predict how long the remaining capacity will last and when it will be necessary to acquire more. You can implement a capacity data model in a spreadsheet, or make use of an automated capacity planning solution to help with the simulation. For an organization subscribing to cloud services, this also helps minimize the contingency and buffer the organization maintains, thus reducing cloud services costs.
Managing capacity also detects and minimizes wastage. This involves detecting VMs that have been idle or powered off for a prolonged period; over-allocated or over-sized VMs; and VMs whose lease has expired. In a 3rd-party cloud, the Service Provider will usually have a process to reclaim expired VMs automatically. Consumer organizations will need to monitor and identify idle, powered-off and oversized VMs, which then triggers the appropriate procedures to reclaim the unused capacity and return it to their resource pools.
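The detection step can be sketched as a scan over the VM inventory. The VM records, the fixed "today" and the 30-day grace period are all illustrative assumptions – a real implementation would pull this data from the virtualization platform’s inventory API:

```python
from datetime import date, timedelta

GRACE = timedelta(days=30)
TODAY = date(2024, 6, 1)  # fixed "today" so the example is reproducible

# Illustrative inventory records
inventory = [
    {"name": "app-01", "state": "running", "last_active": date(2024, 5, 30), "lease_end": date(2024, 12, 31)},
    {"name": "test-07", "state": "powered_off", "last_active": date(2024, 3, 1), "lease_end": date(2024, 12, 31)},
    {"name": "demo-02", "state": "running", "last_active": date(2024, 5, 29), "lease_end": date(2024, 4, 30)},
]

def reclaim_candidates(vms, today=TODAY, grace=GRACE):
    """VMs eligible for capacity reclamation, each paired with the reason."""
    candidates = []
    for vm in vms:
        if vm["lease_end"] < today:
            candidates.append((vm["name"], "lease expired"))
        elif vm["state"] == "powered_off" and today - vm["last_active"] > grace:
            candidates.append((vm["name"], "powered off past grace period"))
    return candidates

flagged = reclaim_candidates(inventory)
```

Flagging is only the first half; the output should feed an approval workflow rather than automatic deletion, since an "idle" VM may be a disaster-recovery standby.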
Optimizing capacity aims to maximize the efficiency and utilization of available capacity without impacting service levels, through the implementation of automation and technology. Examples of such automation and technology include:
- Compute: over-commit CPU and Memory; and dynamic resource scheduler
- Storage: thin provisioning, automated storage tiering, storage profiles, storage I/O control and federated storage resource pools
- Network: network I/O control, converged networking and WAN optimization
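To make the efficiency gain concrete, here is a back-of-the-envelope sketch of the thin-provisioning item above: VMs are allocated more storage than they have actually written, so the array only needs to back real usage plus headroom. The allocation and usage figures are illustrative assumptions:

```python
# (allocated_gb, written_gb) per VM -- illustrative figures
vms = {
    "db-01": (500, 180),
    "web-01": (200, 40),
    "web-02": (200, 35),
}

# Thick provisioning must back every allocated GB up front;
# thin provisioning only backs what has actually been written.
thick_needed = sum(alloc for alloc, _ in vms.values())
thin_needed = sum(written for _, written in vms.values())
savings_pct = round(100 * (1 - thin_needed / thick_needed))
```

The trade-off is that thin provisioning itself over-commits storage, so it must be paired with the capacity monitoring discussed earlier to avoid the pool running out underneath the VMs.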
In a self-managed virtual data center and private cloud, optimizing capacity not only helps organizations increase the efficiency of their assets but also reduces their cost of delivery and operations. In a 3rd-party cloud situation, the service provider deploys capacity optimization solutions to enhance its competitiveness as it drives down the unit cost of compute, storage and network resources.
In summary, capacity planning and management is a key practice in a virtual data center and cloud. Regardless of whether you are running the virtual data center or cloud infrastructure yourself or consuming resources from cloud service providers, capacity management will help you achieve the goal of delivering IT services in the most efficient and cost-effective way.