A little back story: I spent the last decade working with a certain prodigious virtualization platform, with all the certifications and accreditations that go along with it. During this time I thought I understood what it meant to run an efficient data center. After six months of working with a Cloud Management Platform (CMP), CloudForms, I now wonder what I was thinking. I have hit my head on each and every one of these cases, and they are all preventable with the right solution. Remember, we live in the 21st century. Shouldn’t the software that we use act like it?
- Oh crap, we filled up a datastore and all the machines on it stopped working.
This is all too common, and can happen for several reasons. Sometimes nobody realizes that a certain storage volume is filling up; other times it is a single machine with 27 snapshots. In both cases we can set up a policy to prevent this from happening. In the first case, CloudForms can check storage utilization and take action if it is over 90%, or better yet, when the volume is within two weeks of being full based on trends. In the second, we can set up a policy that disallows more than three snapshots per machine. We all love to take snapshots, but there is a real cost to them, and there is no need to let them get out of hand.
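To make the two checks concrete, here is a minimal sketch in plain Python. It is not CloudForms policy syntax or its API; the `Datastore` and `VM` classes, the function names, and the default limits are all hypothetical stand-ins for whatever your inventory actually exposes.

```python
from dataclasses import dataclass


@dataclass
class Datastore:
    # Hypothetical inventory record, not a CloudForms object.
    name: str
    capacity_gb: float
    used_gb: float


@dataclass
class VM:
    # Hypothetical inventory record, not a CloudForms object.
    name: str
    snapshot_count: int


def datastore_over_threshold(ds: Datastore, threshold: float = 0.90) -> bool:
    """True when used space exceeds the given fraction of capacity."""
    return ds.used_gb / ds.capacity_gb > threshold


def snapshot_allowed(vm: VM, max_snapshots: int = 3) -> bool:
    """True when one more snapshot would still keep the VM within the limit."""
    return vm.snapshot_count < max_snapshots


if __name__ == "__main__":
    prod = Datastore("prod-san-01", capacity_gb=1000, used_gb=950)
    hoarder = VM("legacy-app", snapshot_count=27)
    print(datastore_over_threshold(prod))  # 95% used, so the check fires
    print(snapshot_allowed(hoarder))       # already far past the limit
```

The point is less the code than the shape: both problems reduce to a cheap condition that can be evaluated automatically, long before anyone is paged.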
- I just got 1,000+ e-mails telling me that my host is down.
The only thing worse than no e-mail alert is too many alerts. In CloudForms it is easy not only to set up alerts, but also to define how often they should be acted upon, e.g. check every hour, but only notify once per day.
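The "check every hour, notify once per day" idea is simple enough to sketch. The following is an illustration of the throttling concept only, assuming a hypothetical `ThrottledAlert` class; it says nothing about how CloudForms implements alerts internally.

```python
from datetime import datetime, timedelta
from typing import Optional


class ThrottledAlert:
    """Evaluate a condition as often as you like, but notify at most
    once per notify_interval. Hypothetical helper for illustration."""

    def __init__(self, notify_interval: timedelta = timedelta(days=1)):
        self.notify_interval = notify_interval
        self.last_notified: Optional[datetime] = None

    def evaluate(self, condition_met: bool, now: datetime) -> bool:
        """Return True if a notification should be sent for this check."""
        if not condition_met:
            return False
        recently_notified = (
            self.last_notified is not None
            and now - self.last_notified < self.notify_interval
        )
        if recently_notified:
            return False
        self.last_notified = now
        return True


if __name__ == "__main__":
    alert = ThrottledAlert()
    start = datetime(2024, 1, 1)
    # The host stays down for 48 hourly checks in a row.
    sent = sum(
        alert.evaluate(True, start + timedelta(hours=h)) for h in range(48)
    )
    print(f"checks: 48, e-mails sent: {sent}")
```

Two days of a host being down becomes two e-mails instead of forty-eight (or a thousand).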
- Virtual machines cannot migrate because the attached VM tools updater CD-ROM image did not unmount correctly. This breaks Disaster Recovery (DR) and can cause unbalanced loads and gnashing of teeth when attempting to put the server into maintenance mode.
A common shell script I have seen written over and over is one that periodically unmounts the virtual CD-ROM drives. These scripts usually work, but they run as root and are both scary and indiscriminate. With CloudForms we can set up a simple policy that unmounts drives once a day, but only after sanity checking that it is the correct ISO we want to unmount. No longer do we need to fear the wrath of the DBA who had his CD-ROM drive yanked during a midnight database upgrade!
- I have to manually ensure that 800 VMs pass an incredibly detailed and painful compliance check (STIGs, PCI, FIPS, etc.) by next week!
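The sanity check is the part the indiscriminate scripts skip, so here is a rough sketch of it in Python. The allow-list of ISO file names and the `should_unmount` function are assumptions made up for this example; substitute whatever your tools installer actually mounts.

```python
from typing import Optional

# Hypothetical allow-list: the only images we are willing to eject
# automatically. Adjust to match your environment.
TOOLS_ISO_NAMES = frozenset({"linux.iso", "windows.iso"})


def should_unmount(mounted_iso: Optional[str],
                   allowed: frozenset = TOOLS_ISO_NAMES) -> bool:
    """Return True only when the mounted image is a known tools ISO.

    Anything else (no image, or an image we do not recognize) is left
    alone, so the default is to do nothing.
    """
    if not mounted_iso:
        return False
    # Compare on the file name only, ignoring directories and case.
    filename = mounted_iso.replace("\\", "/").rsplit("/", 1)[-1].lower()
    return filename in allowed


if __name__ == "__main__":
    for image in ("/vmimages/tools-isoimages/linux.iso",
                  "[san01] iso/oracle-db-upgrade.iso",
                  None):
        print(image, "->", "unmount" if should_unmount(image) else "leave alone")
```

Failing closed is the design choice that matters here: an unrecognized image stays mounted, and the DBA's midnight upgrade survives.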
I have literally lost weeks of my life to this, and if you have not had the pleasure, count yourself lucky. When the “friendly” auditors show up with a stack of three-ring binders and a mandate to check everything once a year, you might as well clear your calendar for the next few weeks. In addition, since these checks are usually a prerequisite to operate, expect many of these meetings to involve layers of upper management you didn’t even know existed, and this is definitely not the best time to become acquainted.
The good news: if you’re not already using OpenSCAP, you owe yourself a look. But security compliance is about much more than just the VM. CloudForms lets you put in checks for hosts and providers as well. Not only that, but if someone attempts to manually bring something online that is not compliant, CloudForms can shut it right back down. That’s the type of peace of mind that allows for sleep-filled nights.
- Someone logged into a production app server through the virtual console as root and broke it. Now you have to physically hunt down and interrogate all the potential culprits.
Before you pull out your foam bat and roam the halls to apply some “sense” to the person who did this, it is good to know exactly who it was and what they did. With CloudForms you can see a timeline for each machine showing who logged into which console, and you can perform a drift analysis to see what may have changed. With this knowledge you can not only work to fix the problem, but also “educate” the responsible party.
- The developers insist that all VMs must have 8 vCPUs and 64GB of RAM.
The best way to fight flagrant waste of resources is with data. CloudForms provides the concept of “Right-Sizing”, where it watches VMs operate and determines the ideal resource allocation for each. With this information in hand, CloudForms can either automatically adjust the allocations or spit out a report showing what these types of decisions are costing.
- Someone keeps creating 32-bit VMs with more than 4GB of RAM!
As we know, there is no “good” way for a 32-bit VM to use that much memory, so it is essentially just waste. A simple CloudForms check for OS type = 32-bit and RAM > 4GB can make for a very interesting report to run or, better yet, a policy to put in place and forget about.
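To give a feel for what "data" means here, the sketch below shows one naive way to turn observed usage into a suggested allocation: take the peak and add some headroom. This is purely an illustration. How CloudForms actually computes its right-sizing recommendations is not described in this post, and the function name and the 25% headroom are assumptions.

```python
import math
from typing import Sequence


def recommend_allocation(samples: Sequence[float], headroom: float = 0.25) -> int:
    """Suggest an allocation from observed usage samples.

    Naive rule for illustration only: peak observed usage plus a
    fractional headroom, rounded up to a whole unit (GB, vCPUs, ...).
    """
    if not samples:
        raise ValueError("need at least one usage sample")
    return math.ceil(max(samples) * (1 + headroom))


if __name__ == "__main__":
    allocated_gb = 64
    observed_ram_gb = [5.0, 6.5, 8.0, 7.2]  # made-up monitoring samples
    suggested = recommend_allocation(observed_ram_gb)
    print(f"allocated: {allocated_gb} GB, suggested: {suggested} GB")
```

Even a crude number like this changes the conversation from "we need 64GB" to "you have never used more than 8".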
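The condition really is as simple as it sounds. Here it is as a small Python filter over a made-up inventory; the `VM` record and its field names are hypothetical and not a CloudForms data model.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class VM:
    # Hypothetical inventory record for illustration.
    name: str
    os_bits: int
    ram_gb: float


def oversized_32bit(vms: Iterable[VM], limit_gb: float = 4) -> List[str]:
    """Names of 32-bit VMs that were given more RAM than the limit."""
    return [vm.name for vm in vms if vm.os_bits == 32 and vm.ram_gb > limit_gb]


if __name__ == "__main__":
    inventory = [
        VM("build-01", os_bits=64, ram_gb=16),
        VM("legacy-xp", os_bits=32, ram_gb=8),
        VM("kiosk-07", os_bits=32, ram_gb=2),
    ]
    print(oversized_32bit(inventory))
```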
- I have to buy hardware for next year, but my capacity-planning formula involves a spreadsheet and a dart board.
Long-term planning in IT is hard, and especially so with the advent of dynamic workloads in the cloud. Once CloudForms is running, it automatically collects performance data and performs trend line analysis. It extrapolates this data to give the date when remaining capacity will hit zero, e.g. in 23 days you will be out of storage on your production SAN. If that doesn’t get your attention, nothing will. In addition, it can perform simulations to see what your environment would look like if you add additional resources, e.g. what are my trend lines and capacity if I add another 100 VMs of this type and size?
- For some reason two hosts were swapping VMs back and forth, and I only found out when people complained about performance.
As an administrator, there is no worse way to find out that something is wrong than being told by a user or the consumer of that resource. Large-scale issues such as this can be hard to see from the logs, since each individual entry looks like normal output. With a timeline overview of the entire environment, issues like this become apparent and the root cause can be tracked down.
- I spend most of my day pushing buttons, spinning up VMs, and manually grouping them into virtual folders.
We have all heard of DevOps, but the name makes it sound like something new or a radical concept. In reality it is nothing more than a continuation of the concept of automating everything! The difference now is that, with tools like a Cloud Management Platform, this can be done not per server but per environment. We should not just care about one layer of the stack, but about how all the layers can be orchestrated together.
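For anyone still attached to the spreadsheet, trend line extrapolation can be sketched in a few lines of Python: fit a straight line to daily usage and see when it crosses capacity. This is a generic least-squares fit with made-up numbers, assuming one sample per day; the function name is hypothetical and this is not a description of the CloudForms implementation.

```python
from typing import Optional, Sequence


def days_until_full(daily_used_gb: Sequence[float],
                    capacity_gb: float) -> Optional[float]:
    """Estimate days from the latest sample until usage reaches capacity.

    Fits used = intercept + slope * day by ordinary least squares.
    Returns None when usage is flat or shrinking (no projected fill date).
    """
    n = len(daily_used_gb)
    if n < 2:
        raise ValueError("need at least two daily samples")
    days = range(n)
    mean_x = sum(days) / n
    mean_y = sum(daily_used_gb) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, daily_used_gb))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity_gb - intercept) / slope  # day index where the line hits capacity
    return max(0.0, full_day - (n - 1))           # measured from the latest sample


if __name__ == "__main__":
    used = [700, 710, 720, 730, 740]  # made-up daily samples, growing 10 GB/day
    print(days_until_full(used, capacity_gb=970))
```

With these sample numbers the line grows 10 GB per day from 740 GB toward a 970 GB ceiling, so the projection lands 23 days out. No dart board required.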
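Once the events are viewed together rather than one log line at a time, the pattern is easy to spot mechanically too. The sketch below counts how often each VM moved between the same pair of hosts within some window. The `(vm, source, destination)` event tuples, the `find_ping_pong` name, and the threshold of four moves are assumptions for illustration; they are not a CloudForms event format.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

Migration = Tuple[str, str, str]  # (vm, source_host, dest_host), hypothetical format


def find_ping_pong(migrations: Iterable[Migration],
                   min_moves: int = 4) -> Dict[Tuple[str, FrozenSet[str]], int]:
    """Return VMs that moved between the same two hosts at least min_moves
    times. Direction is ignored, so A->B and B->A count toward one pair."""
    counts = Counter(
        (vm, frozenset((src, dst))) for vm, src, dst in migrations
    )
    return {key: n for key, n in counts.items() if n >= min_moves}


if __name__ == "__main__":
    window = [
        ("web01", "hostA", "hostB"),
        ("web01", "hostB", "hostA"),
        ("db01", "hostA", "hostC"),
        ("web01", "hostA", "hostB"),
        ("web01", "hostB", "hostA"),
    ]
    for (vm, hosts), moves in find_ping_pong(window).items():
        print(f"{vm} bounced between {sorted(hosts)} {moves} times")
```

Each migration on its own is perfectly normal, which is exactly why it never shows up as an error.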