Indraneel Chowdhury

Cloud Computing - Large Scale Computing for Everyone

Fri, Nov 14, 2008

At the beginning of 2008, the New York Times ingested 405,000 very large TIFF images, 3.3 million articles in SGML and 405,000 XML files mapping articles to rectangular regions in the TIFFs, using Amazon Web Services, Hadoop and some custom code. This data was converted to a more web-friendly 810,000 PNG images and 405,000 JavaScript files containing JSON in less than 36 hours.

Why is this a big deal?

NASA has been computing on a far greater scale than this for a long time. NASA’s weather simulation software is computationally far more complex than indexing documents.
So why is this a big deal? It’s a big deal because it was neither NASA nor CERN, nor even Google. It was a business that did this without buying a single machine. They rented computing power on the fly. They rented slices of a cloud.

What is cloud computing?

A few days ago a journalist said, “There is clear consensus that there is no consensus on what cloud computing is.” I like to think of cloud computing as the commercialization of computing resources such as CPU cycles, storage and memory, sold the way public utilities sell electricity, water or natural gas. At the very core of the cloud is virtualization, a technique in which software is used to completely simulate or emulate hardware.

Types of clouds

I see two distinct categories of clouds that vendors are selling today:

  • Infrastructure as a Service - IaaS vendors sell raw compute power: CPU cycles, memory, bandwidth and so on. IaaS clouds are complex, but with the complexity comes flexibility. Most cloud vendors allow root access to an instance, so specialized knowledge is necessary to handle that flexibility.
  • Platform as a Service - PaaS refers to clouds that provide frameworks and infrastructure on which users can build applications. PaaS clouds are built on IaaS clouds. Most PaaS clouds are very restrictive: they generally allow users to build applications only on a particular set of technologies. For example, Google App Engine allows users to build applications using Python only. Portability is an inherent issue with PaaS clouds because of the lack of standards in this domain. So if you have built an application using Google App Engine and BigTable, you probably won’t be able to port the data to any other cloud without spending a huge amount of time and money.

Inside the cloud

At a very high level clouds are made up of the following layers:

  1. At the very bottom is the hardware layer. Many cloud vendors build their clouds out of off-the-shelf server-class hardware. For example, Joyent uses Dell servers with quad-core Intel processors for their cloud. Plumbing refers to the networking elements in the cloud: all the fast routers, switches and load balancers, connected by fiber optic cabling. Clusters of ordinary server-class machines make up the skeleton of the cloud.
  2. Storage services refer to the storage provided by the cloud. Most cloud vendors offer SAN or NAS storage. Provisioning is generally on the fly, and users can ask for a virtually unlimited amount of storage.
  3. As mentioned earlier, virtualization is at the very core of the cloud. Virtualization has made creating a software machine as a clone of an existing one super fast. Think of the cluster (mentioned earlier) as one mega machine with one host OS managing all its resources. Creating virtual machines with pre-defined CPU and memory is fast and easy. Many vendors, such as Amazon Web Services, use Xen virtualization.
  4. Platform services are a bunch of pre-installed and packaged goodies that a user of the cloud gets whenever an instance of the cloud is brought up. The LAMP stack supported by AWS and Joyent is an example of platform services.
  5. The web services layer is the one that enables users to templatize an instance, bring up a new instance from a template, take backups, restore from a backup and so on, instantly, as and when needed. No matter what the vendor says, if it takes more than 10-15 minutes to bring up an instance, then it is not a cloud.
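To make the web services layer a little more concrete, here is a toy, in-memory sketch of the operations listed above: templatizing, launching from a template, backing up and restoring. The `ToyCloud` class and all of its method names are hypothetical and invented for illustration; no real vendor's API looks exactly like this.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict


@dataclass(frozen=True)
class Template:
    """A saved machine image plus the resources each new instance gets."""
    name: str
    cpu_units: int
    memory_gb: float


@dataclass
class Instance:
    instance_id: int
    template: Template
    data: Dict[str, str] = field(default_factory=dict)


class ToyCloud:
    """Hypothetical control plane: templates in, running instances out."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.instances: Dict[int, Instance] = {}
        self.backups: Dict[int, Dict[str, str]] = {}

    def launch(self, template: Template) -> Instance:
        """Bring up a new instance from a template."""
        inst = Instance(next(self._ids), template)
        self.instances[inst.instance_id] = inst
        return inst

    def backup(self, instance_id: int) -> None:
        """Snapshot the instance's data (a copy, so later writes don't leak in)."""
        self.backups[instance_id] = dict(self.instances[instance_id].data)

    def restore(self, instance_id: int) -> None:
        """Roll the instance's data back to its last snapshot."""
        self.instances[instance_id].data = dict(self.backups[instance_id])


cloud = ToyCloud()
lamp = Template("lamp-stack", cpu_units=4, memory_gb=7.5)
web1 = cloud.launch(lamp)
web1.data["index.html"] = "v1"
cloud.backup(web1.instance_id)
web1.data["index.html"] = "broken"
cloud.restore(web1.instance_id)
print(web1.data["index.html"])  # prints "v1"
```

In a real cloud each of these calls would be a web service request that kicks off work in the virtualization layer; the point here is only the shape of the interface.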

Just in time deployment using the cloud

Deployment of products is messy business. Not so long ago, fledgling organizations had to first calculate the amount of computing resources needed for a launch, translate that into hardware requirements, call up the hardware vendors or the hosting company, wait until they provisioned the hardware, and then install and configure the software. Provisioning, installation and configuration took several days. It was a lose-lose scenario for everyone. Product success meant another cycle of calls and provisioning, while the users suffered from software made unresponsive by heavy load. Product failure meant huge losses due to unused hardware.
Not any more, with the advent of the cloud. Now product launches can happen at the click of a button, with just enough computing resources sitting behind a virtual IP. The utilization of the resources is closely monitored. New, templatized instances of the cloud are instantiated whenever the monitored utilization reaches its threshold. The users of the cloud pay only for what they use at any instant of time. The users of the product never find it unresponsive, since computing resources are always adequate, just in time to meet the users’ needs.
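The threshold rule described above can be sketched in a few lines. This is a minimal simulation, not any vendor's actual autoscaler: the 75% threshold, the capacity of 100 requests per instance and the function names are all assumptions picked for illustration.

```python
from typing import List


def scale(instance_count: int, utilization: float, threshold: float = 0.75) -> int:
    """Add one instance whenever utilization reaches the (assumed) threshold."""
    if utilization >= threshold:
        return instance_count + 1
    return instance_count


def simulate(load_per_tick: List[float], capacity_per_instance: float = 100.0) -> List[int]:
    """Replay a load trace and record the instance count after each tick."""
    instances = 1
    history = []
    for load in load_per_tick:
        # Utilization is the load spread across everything currently running.
        utilization = load / (instances * capacity_per_instance)
        instances = scale(instances, utilization)
        history.append(instances)
    return history


# Load ramps up after launch, then levels off.
print(simulate([50, 80, 160, 240, 240]))  # prints [1, 2, 3, 4, 4]
```

A production setup would also scale back down when utilization drops, which is where the pay-for-what-you-use savings come from; that half is left out here to keep the sketch short.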

Large scale computing for all

Until now, the ability to do large scale computing was within the reach of only an elite club of businesses. Google, Amazon and Yahoo were among the very few in the club. Few businesses had the means to lay their hands on infrastructure of that scale. Cloud computing has changed that. Today, a ‘large’ Amazon EC2 instance with 4 EC2 compute units (roughly equivalent to the capacity of four 1.0-1.2 GHz 2007 Opteron or Xeon processors) and 7.5 GB of memory costs as little as $288 per month. Users can choose from quite a few operating systems and scale up and down on the fly. Application development platforms like JBoss Enterprise Application Platform and Ruby on Rails come built in. Clouds have opened the doors of large scale computing to virtually everyone.
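To see what that monthly price means for a short burst of work, here is some back-of-the-envelope arithmetic. The only figure taken from this post is $288 per month; the 30-day month, and the 100-instance, 36-hour job in the usage line, are assumptions made up for illustration.

```python
HOURS_PER_MONTH = 24 * 30   # assumption: a 30-day month, running around the clock
MONTHLY_COST = 288.0        # dollars, the 'large' instance figure quoted above

# Implied hourly rate for one instance.
hourly_rate = MONTHLY_COST / HOURS_PER_MONTH


def usage_cost(instances: int, hours: float, rate: float = hourly_rate) -> float:
    """Cost in dollars of renting `instances` machines for `hours` hours each."""
    return instances * hours * rate


print(hourly_rate)  # prints 0.4

# Hypothetical burst: 100 large instances for 36 hours.
print(round(usage_cost(100, 36), 2))
```

The same hourly rate applies whether you rent one machine for a month or a hundred machines for a day and a half, which is exactly what makes short, large jobs affordable.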

1 Comment For This Post

  1. ecommerce web hosting Says:

    Interesting article. Thanks for information.