Virtualization vs Private Cloud

I originally wrote this in answer to a question posted on the LinkedIn CloudStack Users Group, but my verbosity pushed the reply way over the allowable length of the LinkedIn comment field. Plus, with all the hype around cloud these days, there need to be more real-world examples of where cloud fits and where it doesn't.

The question was this (with minor edits for spelling and grammar):

Private cloud vs virtualization

I searched the net for the difference between private cloud and virtualization. I found some articles, but none of them could illustrate this difference with a real example; they gave just the concepts: self-service, ubiquitous access, resource pooling, …

My question is: what is the added value of a private cloud, especially with CloudStack, for an enterprise that already has a virtualized datacenter, especially with VMware?

Virtualization is a component of cloud, whether public or private. To better illustrate the difference, let's consider a couple of use cases and examples:

Example 1.
An IT department, continually installing and reinstalling new servers, implements a virtualization solution so they can provision infrastructure faster and consolidate servers. They virtualize their servers using their hypervisor of choice along with management tools. They upload ISO files into their management software so they can install new OSes into new virtual machines. They have to be on the local network in order to manage the virtual machines or orchestration software. And if they are charging back capacity to their internal customers' budgets (Marketing, Sales, Engineering, etc.), they're probably just splitting the cost between each group, or maybe evaluating how many virtual machines they stand up for each group.

Is this cloud? Not really. This is really just server consolidation, data center automation, etc. Their solution doesn't really meet all five characteristics of cloud computing:

  1. On-demand self-service (they still have to provision virtual machines for their internal customers).
  2. Broad network access (not really; this is only for internal customers on the local network)
  3. Resource pooling (this is where virtualization fits, so yes)
  4. Rapid elasticity (they still have to provision VMs, and they don't necessarily scale fast)
  5. Measured service (they're charging back to their users based on traditional budgeting, not based on actual usage)

Example 2.
A company has a headquarters office with a central IT staff that supports company-wide and departmental applications. They also have several branch offices with local IT staffs that focus on break/fix repair of local desktops and network services. The branch offices may occasionally set up a local server and install some software at a manager's request, but they usually ask central IT to provide supported servers or applications at HQ. Central IT is looking to provide better support for their branch offices without having to hire more staff, provide faster turnaround time when provisioning services for supported apps, and even allow quick, easy servers on demand to their branch offices for local, unsupported applications.

So they install their hypervisor of choice, deploy storage in their preferred manner, and add some management software. However, in addition to providing ISO files for VM installation, they also prepare some disk images with pre-installed, supported OSes. Additionally, the management software allows multiple users with different access levels to perform tasks such as launching virtual machines.

Now, the Marketing department in a branch office can try out some new analytics software by logging into a portal, provisioning a new server, installing the trial software, and using it for a few days. If they don't like it, they turn it off and delete the VM. Engineering may deploy multiple VMs to set up a production application, but also spin up a few additional VMs to use as development and staging environments. They no longer have to put in capital requests for servers, nor do they have to search around some old supply closets and pull out a dusty desktop system to use as a local staging server.

Is this a private cloud? Yes!

This company is still using virtualization, but now they've added a level of self-service that lets branch offices (whether it's local IT or someone else) consume services without tying up the limited resources of central IT (on-demand self-service). They can access this service from their branch office, possibly using a VPN connection over the Internet or an SSL/TLS web-based portal (broad network access). They can spin up additional capacity quickly and turn it off just as fast (rapid elasticity). And because of all this, central IT can now meter actual usage by each department on a monthly or even hourly basis and charge them accordingly (measured service).

In Example 1 above, the IT department might deploy VMware ESXi and use its built-in management tools, or perhaps deploy Red Hat with KVM and virt-manager. Those are tools available to the server administrators to manage the virtual machines themselves. In Example 2, CloudStack would sit on top of those hypervisors and management tools to provide everything beyond basic virtualization: user self-service portals, prebuilt disk images, usage metering, access from branch offices, etc.
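As a concrete (hypothetical) illustration of what that self-service layer looks like under the hood, here's a minimal Python sketch of the kind of API call a portal or departmental script makes against CloudStack to deploy a new VM from a prebuilt template. The deployVirtualMachine command and the HMAC-SHA1 request signing are CloudStack's actual API conventions, but the endpoint URL, keys, and IDs below are placeholders:

```python
# Minimal sketch: deploy a VM through CloudStack's self-service API.
# Endpoint, keys, and IDs are placeholders, not real values.
import base64
import hashlib
import hmac
import urllib.parse
import urllib.request

API_URL = "http://cloudstack.example.com:8080/client/api"  # hypothetical
API_KEY = "your-api-key"        # per-user credentials from the portal
SECRET_KEY = "your-secret-key"

def sign(params):
    # CloudStack signs the sorted, lowercased query string with HMAC-SHA1,
    # then base64-encodes the digest.
    query = "&".join(
        f"{k}={urllib.parse.quote(str(v), safe='')}"
        for k, v in sorted(params.items())
    )
    digest = hmac.new(SECRET_KEY.encode(), query.lower().encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

params = {
    "command": "deployVirtualMachine",
    "serviceofferingid": "small-instance",  # hypothetical IDs; in practice you'd
    "templateid": "centos-template",        # look these up with listServiceOfferings,
    "zoneid": "branch-zone",                # listTemplates, and listZones first
    "response": "json",
    "apikey": API_KEY,
}
params["signature"] = sign(params)

with urllib.request.urlopen(API_URL + "?" + urllib.parse.urlencode(params)) as resp:
    print(resp.read().decode())
```

The point isn't the plumbing; it's that a branch-office user with nothing but credentials and a network connection can get a running server from a prebuilt image, and the request is attributable to them for metering.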

That's the difference between virtualization and private clouds.

Bandwidth, Transfer, and Arithmetic

Every once in a while, I need to explain bandwidth throughput and data transfer to someone. Many web hosts use transfer (GB per month) to measure server bandwidth usage. Other hosting companies, especially those offering colocation services, will tend to offer “95th percentile”, which is a measure of throughput (Mbits per second).

Transfer is the total amount of data sent (either incoming, outgoing, or both) in a given month. Throughput is the amount you're sending through your network pipe on a regular basis, usually by taking a sample every five minutes and averaging it over the course of the month.

Or to put it another way, transfer is when the water company bills you at the end of the month for how much you used (43,000 gallons of water between June 1st and June 30th). Throughput would be more like averaging out how much your pipe is pushing through (1 gallon per minute).

Ultimately, both measurements describe an amount of data sent over time, so we can roughly estimate how much transfer (GB per month) a given committed connection speed can yield.

Assumption:
Our hypothetical server is pushing data at a constant rate of 1 Mbit per second, with no bursting (i.e., we're not using 95th percentile measurement).

Starting with the following information:

8 bits equal 1 byte
2,592,000 seconds equal 1 month (30 days)

Step 1. Convert Megabits to Megabytes

Divide 1 Mbit by 8 to get Megabytes
1/8 MByte per second = 0.125 MB per second

Step 2. Change seconds to a fractional month (1 month = 30 days * 24 hours * 60 minutes * 60 seconds = 2,592,000 seconds)

We now have:
1 Mbit per second =
0.125 MB per 1/2,592,000 month
This is just restating the measurements.

Step 3. To change the values to a full month, multiply *both* sides by 2,592,000 (we want to change the values, rather than the measurements, so we need to operate on both sides of the equation).

(0.125 MB * 2,592,000) per (1/2,592,000 * 2,592,000) month =
324,000 MB per 1 month

Step 4. Change MB to GB by dividing the left side by 1024:
324,000 MB / 1024 ≈ 316 GB

We now have 316GB per month of data transferred if someone sustains a constant throughput of 1Mbit/sec.

Rounding down to a nice 300GB per month, we can figure that a 10Mbit connection speed for your server will yield a maximum of about 3000GB (~3TB) of transfer per month (300GB per month for each 1Mbit/sec, times 10). A 100Mbit uplink can yield up to 30,000GB (~30TB) of transfer if sustained at full speed.
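For anyone who'd rather let code do the unit juggling, here's the same conversion as a short Python sketch, using the same 30-day month and 1024 MB per GB as the steps above:

```python
# Convert a sustained link speed (Mbit/sec) into monthly transfer (GB),
# assuming a 30-day month and 1024 MB per GB, as in the steps above.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000

def monthly_transfer_gb(mbits_per_sec):
    mbytes_per_sec = mbits_per_sec / 8                 # 8 bits per byte
    return mbytes_per_sec * SECONDS_PER_MONTH / 1024   # MB -> GB

print(monthly_transfer_gb(1))    # ~316 GB
print(monthly_transfer_gb(10))   # ~3,164 GB (~3 TB after rounding down)
print(monthly_transfer_gb(100))  # ~31,640 GB (~30 TB)
```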

Bursting patterns and 95th percentile measurements (where the service provider cuts out the top 5% of bursting traffic and averages only the remaining 95%) will cause this to fluctuate a bit. After all, if you have a 100Mbit port speed, you can burst up to 100Mbits of throughput for a period of time, then come back down to normal. So you may send a 300MB file in about 24 seconds if you're sending at a full 100 Mbits/sec (100Mbits / 8 bits = 12.5MB/sec; 300MB / 12.5MB/sec = 24 seconds). But you won't likely sustain that 100Mbits after the file is sent. You'll drop back down to a slower throughput until the next large amount of data needs to be sent.

Additionally, that same 300MB file will take about 240 seconds to send at 10Mbits/sec, and with a 1Gbit uplink, you could potentially send it in as little as 3 seconds (not accounting for router hops, inter-datacenter links, or server disk and processor speeds).
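The file-transfer arithmetic from the last two paragraphs, in the same sketch form:

```python
def transfer_seconds(size_mb, link_mbits):
    # Seconds to push size_mb megabytes through a link running flat-out at
    # link_mbits Mbit/sec, ignoring routers, datacenter hops, and disk speed.
    return size_mb / (link_mbits / 8)

print(transfer_seconds(300, 10))    # 240.0 seconds
print(transfer_seconds(300, 100))   # 24.0 seconds
print(transfer_seconds(300, 1000))  # 2.4 seconds (~3 in round numbers)
```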

Repurposing Pallets

Years ago, I discovered the benefits of reusing pallets when I managed a [now-defunct] I Sold It franchise (I Sold It was a chain of eBay drop-off stores). We had a large item to ship that required a pallet for freight shipping, and I didn't want to spend $100 on pallets for a single use ($20 per pallet, quantities of 5 or more). A short search around the rear of our building yielded a couple of pallets, one of which was usable.

When I realized how many pallets are discarded every day around the Atlanta area, I initially thought it might be a good side business to gather up discarded pallets and sell them at a discount. Unfortunately, the economics of dumpster diving for pallets didn't add up: a business owner is not likely to pay regular price for a used pallet, and selling on any large scale would involve a large commitment of time and resources – hours of driving and gasoline to find and gather them. Though I had space to store them, gathering them up until I had enough to sell wasn't really an option. So my fantasy of a pallet recycling business dissolved as soon as I thought of it.


[Adirondack Chair: Photo blatantly copied from Shelton's Flickr page, but used in a fair-use reporting context. If you click on the image to view all his photos, maybe he won't send me a DMCA takedown notice.]

I had a pleasant surprise at Indie Craft Experience this past weekend. While browsing the usual selection of vendors selling knitted hats, recycled felt skirts, and other crafty stuff, I came across a guy sitting in a wooden chair that looked suspiciously like it was made from pallets.

I have a tendency to stop at any booth featuring wooden furniture (especially at art fairs, where woodworkers display beautiful, amazing handmade pieces). So I didn't hesitate to stop and look at this interesting piece of furniture.

Shelton Davis (the guy sitting in the chair) was there representing Repurposed Goods, a project that turns discarded pallets into new creations. The chair was made completely from recycled pallets, and he also had a bird feeder on display. He was selling complete DIY building plans for the chair and bird feeder, as well as a DIY “Ikea-style” kit for the bird feeder that one could assemble immediately.

All the plans were printed on recycled paper, and he also sells the plans as downloadable PDFs through his Etsy store. It turns out Shelton is an industrial designer, so not only is he able to create plans for pallet repurposing, he is able to convey the instructions in a simple and clear document.

I have his bird feeder instructions, and plan to purchase his Adirondack chair plans from his Etsy store once I have some time to devote to a bigger project.

I think I have found a reason to start stacking up pallets again.

A Lesson in Storage Costs

As the [former] Customer Service Manager at A Small Orange, I frequently had comments, questions, and even complaints brought to my attention.

One common question is “why do you offer so little disk space compared to your competitors?” Or to phrase it another way, “why are your shared hosting plans so stingy?”

This is an understandable question. From the customer’s viewpoint, they can (as of 2010) buy a 1TB disk drive from Newegg.com anywhere from $59 to $89. “Drives are cheap!”, they cry. “Add more drives! I’ve got 3TB in my desktop PC, why can’t you do it on my server?”

Servers don’t quite operate this way.

For one thing, the hard drive in a server is expected to be reading and writing data almost constantly, 24×7, instead of spinning up occasionally for downloads, torrents, and gaming. That constant reading and writing shortens the usable life of the drive. When a server’s hard drive dies, it’s not just one customer’s data that is lost…it can be several hundred customers’ data. So a server needs some failover, or redundancy. For a low-cost shared server, that might mean two drives in a mirrored (RAID-1) configuration so that if one drive fails, the other keeps going with a copy of the data until the first one can be replaced. A higher-end server might even have four or six drives, with some combination of mirroring or data protection (RAID 0+1 or RAID 5, for example) so that one or more drives can fail without losing valuable customer data.

With this redundancy alone, we’ve gone from doubling the cost (two $90 drives for $180) to sextupling the cost (six drives at $534).

Now consider the performance of the drive. The cheapest $59 drive on Newegg would be a “green” drive. It is a low-power, low-speed (5400RPM) drive with a very low cost per gigabyte. I have a couple of these drives in my home storage box and they run just fine. But I don’t access my data at home the way a web server will. There is an $89 drive on Newegg that is a little more capable – mostly faster (7200RPM). It’s great for a desktop PC or home storage unit, and maybe even a low-traffic server.

But under constant usage, a drive needs to serve data fast and move on to the next request: the slower the hard drive, the fewer browser hits those websites can handle. The standard SATA drive has a 7200RPM rotation speed, and that’s pretty good for desktop usage and low-end servers. For higher performance, SCSI and the newer SAS drives are still the best. A 300GB SAS drive currently runs about $279, and it’s *fast*: these drives spin at 10,000-15,000 RPM, and the SCSI protocol they use offers superior performance in server environments.

So instead of $180 for 1TB of storage (two SATA drives, mirrored), we now have higher-performance drives giving us 300GB of mirrored storage for $558. That’s 300GB of fail-safe, high-performance server storage for $558 versus a single $89 1TB drive in your PC – roughly three times the cost of the mirrored SATA pair, but about twice the speed.

For even better performance, using a hardware RAID controller instead of software RAID is important. This hardware is at least $200, so now you get 300GB for $758 or more.
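To see the gap in one number, here's the cost-per-GB arithmetic as a quick sketch, using the 2010 prices quoted above:

```python
# Cost per GB, using the 2010 prices from the paragraphs above.
consumer_per_gb = 89 / 1000  # single 1TB desktop SATA drive: ~$0.09/GB
server_per_gb = 758 / 300    # mirrored 300GB SAS pair + RAID card: ~$2.53/GB

print(f"consumer: ${consumer_per_gb:.2f}/GB, server: ${server_per_gb:.2f}/GB")
print(f"server storage costs ~{server_per_gb / consumer_per_gb:.0f}x more per GB")
```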

Apples and Oranges indeed!

This should provide some perspective on the differences between consumer and business server storage. The next logical question is, how do other hosting companies provide so much more storage, like 50GB or more per customer?

First, many hosting companies use control panel software (such as cPanel) on their servers. This software runs on a single server with a limited number of hard drives (usually 2, 4, or 6) and turns it into a profit machine. You have a set number of customers per server, and when you run out of space, you buy a new server with a limited number of drives and start filling it up with customers.

Using A Small Orange’s Medium hosting plan as an example, a single 1TB drive would allow 600+ customers (1000GB divided by 1.5GB of space per customer). Of course, A Small Orange is not using single desktop hard drives, but mirrored high-performance drives. The drives are actually 146GB SAS drives, so about 97 customers could share that space if they all had the Medium plan.
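The per-server math is simple enough to sketch out:

```python
# Back-of-the-envelope customers-per-server math from the paragraph above.
def customers_per_server(usable_gb, quota_gb=1.5):  # Medium plan: 1.5GB each
    return int(usable_gb // quota_gb)

print(customers_per_server(1000))  # 666 on a hypothetical single 1TB drive
print(customers_per_server(146))   # 97 on mirrored 146GB SAS drives
```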

The more customers a hosting company can stuff onto a single server, the more money it makes. So many hosting companies use a technique called overselling. This means the hosting company will offer more space per customer (say, 10GB or 50GB, or even unlimited storage) knowing that most customers will never use it all. If every customer on that server filled up their 10GB or 50GB of space, the server’s drives would fill up very quickly and cause problems. To prevent this from happening, there are clauses in the Terms and Conditions that define other limits, usually the number of inodes, which prevents customers from filling up the disk with a large number of small files. Additionally, “unlimited storage” plans usually carry “fair usage” clauses that allow the hosting company to suspend service if a customer is using more than a fair share of the overall disk space.

With the price of Storage Area Networks and Network Attached Storage devices becoming more accessible to smaller operations, some hosting companies are moving away from fixed, local disk storage. By implementing high-performance network storage, a given server can increase its storage capacity on demand to meet the growing needs of its customers. As this becomes more common, larger storage quotas (*without* overselling) will eventually become commonplace.

Peripheral Vision

When driving in Atlanta, it is safe to assume that the person operating the vehicle in the lane next to you is visually handicapped and has absolutely no peripheral vision. Furthermore, it is also safe to assume that this person will completely avoid compensating for this handicap…such as using side-view mirrors or looking left/right before changing lanes.