Saturday, October 10, 2009

Cloud'ed Thoughts

These were some of the questions posed to me during the cloud computing panel discussion at CSI Annual Convention 2009.

Each one of you has a different view (PaaS, services, testing, startup, management) in the domain. A 5-minute warmer on your take on cloud computing based on your current work will be great. This will set the stage nicely for the discussion.
There are many “definitions” of cloud computing, but for me “Cloud Computing is the fifth generation of computing after Mainframe, Personal Computer, Client-Server and the Web.” It's not often that we get a whole new platform and delivery model on which to build businesses. What's more, it's a new business model as well: using 1,000 servers for 1 hour costs the same as using 1 server for 1,000 hours. There are no upfront costs; it is completely pay as you go!
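The "1,000 servers for 1 hour vs. 1 server for 1,000 hours" point follows from billing per server-hour. A minimal sketch, assuming a single hypothetical flat hourly rate (real prices vary by provider, region and instance type):

```python
# Hypothetical flat price per server-hour, in USD (an assumption for illustration).
HOURLY_RATE_USD = 0.10


def compute_cost(servers: int, hours: float, rate: float = HOURLY_RATE_USD) -> float:
    """Pay-as-you-go cost: you pay only for the server-hours you actually use."""
    return servers * hours * rate


burst = compute_cost(servers=1000, hours=1)   # 1,000 servers for 1 hour
steady = compute_cost(servers=1, hours=1000)  # 1 server for 1,000 hours
print(f"burst: ${burst:.2f}, steady: ${steady:.2f}")
```

Because cost depends only on the product servers × hours, finishing a job 1,000 times faster does not cost more, which is what makes bursty, parallel workloads attractive on a pay-as-you-go model.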
How has cloud computing suddenly crept up on us and become technologically and economically viable? There are three reasons:
  1. The use of commodity hardware, together with more complex software to manage redundancy on such hardware. Perfect examples of such software are virtualisation, MapReduce, the Google File System, Amazon's Dynamo, etc.
  2. Economies of scale. In a medium-sized data center, storage costs $2.20/GB/month, while in a large data center it costs $0.40/GB/month. That is a cost saving of roughly 5.5 times, which cloud computing vendors have been able to pass on to their customers. In general, cloud infrastructure players can achieve a 5 to 7 times reduction in cost.
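The scale advantage in point 2 is just the ratio of the two quoted storage prices. A small sketch of that arithmetic; the 10,000 GB data volume is a hypothetical figure chosen only to show the effect on a monthly bill:

```python
# Storage prices quoted in the text, in USD per GB per month.
MEDIUM_DC_PRICE = 2.20
LARGE_DC_PRICE = 0.40


def scale_advantage(medium: float, large: float) -> float:
    """How many times cheaper the large data center is per GB-month."""
    return medium / large


ratio = scale_advantage(MEDIUM_DC_PRICE, LARGE_DC_PRICE)
print(f"Large data center is {ratio:.1f}x cheaper per GB-month")  # 5.5x

monthly_gb = 10_000  # hypothetical amount of stored data
print(f"medium: ${monthly_gb * MEDIUM_DC_PRICE:,.0f}/month, "
      f"large: ${monthly_gb * LARGE_DC_PRICE:,.0f}/month")
# medium: $22,000/month, large: $4,000/month
```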
  3. The third, and in my view the most important, reason: many organizations needed to scale but lacked the ability to do so. As the world became data intensive, players realized that unless scalable computing, scalable storage and scalable software were available, their business models wouldn't scale. Consider analytics as an example. Some years back, mid-sized companies could mine data in their own data centers, but with data doubling every year they have been unable to keep up, so they have decided to scale out to the cloud. Amazon and Google realized this from their own needs very early, and here we are, eating their dog food!
Developers with new ideas for innovative internet services no longer need large capital investments in hardware to deploy their service. They can potentially go from 1 customer to 100k customers in a matter of days. Over-provisioning and under-provisioning are no longer a concern if your product is hosted on a cloud computing platform. This lets small companies focus on their core competency rather than worrying about infrastructure, and enables a much quicker go-to-market strategy.
Another advantage is that clouds are available in various forms:
  • Amazon EC2 is as good as a physical machine and you can control the entire software stack.
  • Google AppEngine and salesforce.com are platforms that are highly restrictive but good for quick development, and they let the platform itself handle the scaling complexity.
  • Microsoft Azure is at an intermediate point between the above two.
So depending on your needs, you can choose the right cloud!
As I said earlier, it's a new development environment and there is a lot of scope for innovation, which is what my company “Clogeny” is focusing on.
Cloud computing is not just about “compute” – it is also storage, content distribution and a new way of visualizing and using unlimited storage. How has storage progressed from multi-million dollar arrays and tapes to S3 and Azure and Google Apps?
I remember that when I started writing filesystems, I needed to check for an error indicating that the filesystem was full. It struck me that I have no need for such error checking when using cloud storage. So yes, it's actually possible to have potentially infinite storage.
Storage: Storage arrays have grown in capacity and complexity over the years to satisfy the ever-increasing demand for size and speed. But cloud storage is pretty solid as well. Amazon, Microsoft and most other cloud vendors keep 3 copies of data, with at least 1 copy kept at a separate geographical location. When you factor this into the costs, cloud storage is pretty cheap. Having said that, cloud storage is not going to replace local storage; fast and expensive arrays will still be needed for IOPS- and latency-hungry applications. But the market for such arrays may taper off.
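The "3 copies, at least 1 in another location" policy can be expressed as a simple replica-placement rule. The sketch below is a simplified illustration, not any vendor's actual algorithm: the `Node` type, node names and region labels are hypothetical, and real systems also weigh racks, power domains, free space and load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    region: str


def place_replicas(nodes: list[Node], home_region: str, copies: int = 3) -> list[Node]:
    """Choose `copies` distinct nodes, guaranteeing one outside `home_region`."""
    local = [n for n in nodes if n.region == home_region]
    remote = [n for n in nodes if n.region != home_region]
    if not remote:
        raise ValueError("need at least one node outside the home region")
    if len(nodes) < copies:
        raise ValueError("not enough nodes for the requested number of copies")
    # One copy goes to another region so a regional outage cannot lose the data.
    chosen = [remote[0]]
    # Remaining copies prefer the home region (lower latency), then other regions.
    for node in local + remote[1:]:
        if len(chosen) == copies:
            break
        chosen.append(node)
    return chosen


nodes = [Node("a1", "us-east"), Node("a2", "us-east"),
         Node("a3", "us-east"), Node("b1", "eu-west")]
replicas = place_replicas(nodes, "us-east")
print([n.name for n in replicas])  # ['b1', 'a1', 'a2']
```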
Content Distribution: A content delivery network (CDN) is a system of nodes in multiple locations that co-operate to satisfy requests for content efficiently. These nodes move the content around so that the node nearest to the user serves each request. Cloud providers offer content distribution services, which improves reach and performance because requests can be served from the nearest available server anywhere in the world. This makes distribution extremely scalable and cost efficient. The fun part is that the integration between the cloud and the CDN is seamless and can be done through simple APIs.
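The idea of "the node nearest to the user serves the request" can be sketched with plain geographic distance. This is a simplification under stated assumptions: the edge locations and coordinates below are hypothetical, and real CDNs typically route using DNS or anycast together with measured latency and node load, not just great-circle distance.

```python
import math

# Hypothetical edge locations: name -> (latitude, longitude) in degrees.
EDGE_NODES = {
    "mumbai": (19.08, 72.88),
    "london": (51.51, -0.13),
    "virginia": (38.95, -77.45),
    "singapore": (1.35, 103.82),
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(user_location: tuple[float, float], nodes=EDGE_NODES) -> str:
    """Return the name of the edge node geographically closest to the user."""
    return min(nodes, key=lambda name: haversine_km(user_location, nodes[name]))


print(nearest_edge((18.52, 73.86)))  # a user in Pune -> mumbai
print(nearest_edge((48.86, 2.35)))   # a user in Paris -> london
```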
Visualizing storage: Storage models for the cloud have changed compared to the POSIX model and relational databases that we are used to. The POSIX model has given way to a more scalable flat key-value store in which a “bucket-name, object-name” tuple points to a piece of data. There is no concept of the folders and files that we are used to, although a folder-file hierarchy can be emulated for ease of use. Amazon provides SimpleDB, a non-traditional database that is easier to scale, but your data organization and modeling will need to change when migrating to SimpleDB. MapReduce is a framework for operating on very large data sets in highly parallel environments, and it can work on structured or unstructured data.
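To make the flat "bucket-name, object-name" model concrete, here is a minimal in-memory sketch. `FlatObjectStore` and its methods are hypothetical names, not a real cloud API; it shows how a folder view can be emulated purely by treating `/` in object names as a delimiter and listing by prefix (S3's listing call accepts a prefix and delimiter in a similar spirit).

```python
class FlatObjectStore:
    """In-memory sketch of a flat bucket/object key-value store."""

    def __init__(self) -> None:
        # (bucket, object name) -> bytes; no directories exist anywhere.
        self._objects: dict[tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, data: bytes) -> None:
        self._objects[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        return self._objects[(bucket, key)]

    def list_keys(self, bucket: str, prefix: str = "", delimiter: str = "/"):
        """Emulate a folder listing: return (files, subfolders) under `prefix`."""
        files, folders = [], set()
        for b, key in self._objects:
            if b != bucket or not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter in rest:
                # Everything up to the next delimiter looks like a sub-folder.
                folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                files.append(key)
        return sorted(files), sorted(folders)


store = FlatObjectStore()
store.put("photos", "2009/october/panel.jpg", b"jpeg bytes")
store.put("photos", "2009/october/keynote.jpg", b"jpeg bytes")
store.put("photos", "2009/readme.txt", b"notes")
print(store.list_keys("photos", prefix="2009/"))
# (['2009/readme.txt'], ['2009/october/'])
```

The "folders" exist only in the listing logic; the store itself holds nothing but flat keys, which is what makes it easy to partition across many machines.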
Consider SmugMug, an online photo-sharing company, as an example: it estimates that it has saved $500,000 in storage expenditures and cut its disk storage array costs in half by using Amazon S3.

CC breaks the traditional models of scalability and infrastructure investment, especially for startups. A 1-person startup can easily compare with an IBM or Google on infrastructure availability if the revenue model is in place. What are the implications and an example of how?
Definitely. Startups need to focus only on their revenue model and on implementing their differentiators. Infrastructure, management and scaling are available on a pay-as-you-go basis, so ups and downs in traffic can be absorbed. For example, some sites get hit by very high traffic in their first few weeks and incur high infrastructure costs to service it. Then the load tapers off and the infrastructure lies unused. This is where the pay-as-you-go model works very well. So yes, cloud computing is a leveller that fosters many start-ups.
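The launch-spike scenario can be quantified with a back-of-the-envelope comparison. In this sketch the weekly demand curve and the hourly rate are hypothetical, and owned hardware is crudely modelled at the same hourly rate; the point is only how much capacity sits idle when you provision for the peak.

```python
# Hypothetical servers needed in each week after launch: a spike, then a taper.
weekly_demand = [40, 25, 10, 4, 4, 4, 4, 4]
HOURS_PER_WEEK = 24 * 7
RATE = 0.10  # hypothetical USD per server-hour


def fixed_cost(demand: list[int], rate: float = RATE) -> float:
    """Provision for the peak and keep that capacity for the whole period."""
    return max(demand) * len(demand) * HOURS_PER_WEEK * rate


def pay_as_you_go_cost(demand: list[int], rate: float = RATE) -> float:
    """Rent only the servers each week actually needs."""
    return sum(demand) * HOURS_PER_WEEK * rate


fixed = fixed_cost(weekly_demand)
payg = pay_as_you_go_cost(weekly_demand)
utilization = sum(weekly_demand) / (max(weekly_demand) * len(weekly_demand))
print(f"fixed: ${fixed:,.2f}  pay-as-you-go: ${payg:,.2f}  utilization: {utilization:.0%}")
# fixed: $5,376.00  pay-as-you-go: $1,596.00  utilization: 30%
```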
Many businesses also use cloud computing to scale out: their in-house data center is enough to handle a certain amount of load, and when load goes beyond that point they turn to the cloud. Such hybrid computing is sometimes more economically viable.
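The hybrid scale-out pattern boils down to a simple split: serve what the in-house data center can handle and send only the overflow to the cloud. A minimal sketch, where the hourly load figures and the capacity of 100 are hypothetical units (for example, servers' worth of work):

```python
def split_load(load: list[int], on_prem_capacity: int) -> tuple[list[int], list[int]]:
    """Serve up to `on_prem_capacity` in-house each hour; send any excess to the cloud."""
    in_house = [min(x, on_prem_capacity) for x in load]
    cloud = [max(0, x - on_prem_capacity) for x in load]
    return in_house, cloud


hourly_load = [60, 80, 120, 200, 150, 90]  # hypothetical load per hour
in_house, cloud = split_load(hourly_load, on_prem_capacity=100)
print(cloud)                             # [0, 0, 20, 100, 50, 0]
print(sum(cloud), "cloud server-hours")  # 170 cloud server-hours
```

Only the 170 overflow units are paid for in the cloud; the in-house data center does not have to be sized for the 200-unit peak.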
Xignite employs Amazon EC2 and S3 to deliver financial market data to enterprise applications, portals and websites for clients such as Forbes, Citi and Starbucks. This data needs to be delivered in real time and requires rapid scaling up and down.
What do you see when you gaze into the crystal ball?
Security is a concern for many customers, but consider that even the most paranoid customer, the US government, has started a cloud computing initiative called “Apps.gov”, which provides SaaS applications for federal use. Even if there are some issues, they are being surmounted as we speak. Cloud computing has now reached critical mass and the ecosystem will continue to grow.
In terms of technology, I believe that some application software will run on-premise and another piece will run on the cloud for scaling out. The on-premise client part can keep providing service during disconnected operation and, importantly, can help resolve latency issues. Most cloud computing applications will have built-in billing systems that are either standardized or implemented in software that both the vendor and the customer trust. I would love to see standards emerging in this space, since that will help accelerate acceptance.
“Over the long term, absent other barriers, economics always wins!” And the economics of cloud computing are too strong to be ignored.

A "Cloudy" day at CSI Annual Convention 2009


I had a very interesting opportunity to be one of the speakers on the cloud computing panel discussion at CSI Annual Convention 2009. As it turned out, the entire day was "cloudy", with most topics and discussions centered around cloud computing. Most people agreed that cloud is the next generation of computing, but there are still doubts as to which form of cloud computing will take off. The conclusion was that there IS a lot of hype, and once that dies down, the products and companies that solve real problems will survive. Those who try to monetize the medium instead of the product might end up failing. Here are some excerpts from the day.

The day started with a keynote address on "Cloud Computing - Challenges and Opportunities" by Girish Venkatachaliah from IBM. His take was that about 20% of IT will move to the cloud in the next few years, and that currently it's more hype than substance.

Dr. Srikanth Sunderrajan from Persistent gave a great talk on Google AppEngine, a Platform-as-a-Service offering. His company recently implemented a product on top of Google AppEngine. In his view, AppEngine lacks many features and is a strait-jacket environment with almost no flexibility. They had to write complex libraries to enable file-system-like storage and ended up using Amazon EC2 to work around AppEngine's shortcomings. He felt that Google needs to open up the platform and be more like Amazon's cloud offerings. One good thing about AppEngine is that development and deployment are fast and easy.

The panel discussion on cloud computing included Monish Darda from Websym, Karan Gujral from BMC, Gireendra Kasmalkar from SQS, Vikram Rajkondwar from Microsoft, Samir Bodas from ICERTIS and yours truly. The discussions covered PaaS, IaaS, SaaS, testing for the cloud, how startups can leverage the cloud, managing clouds and much more. Vikram's views, which stemmed from his experience working on Microsoft Azure, were extremely insightful.

Here are some of the take-away points from the discussion:
  • The cloud phenomenon has been seeded by economies of scale. Cloud infrastructure providers use commodity hardware and complex software to manage redundancy. The savings are passed on to the consumer, making the cloud a very cost-effective platform.
  • The evolution of virtualization technologies has enabled cloud data centers to increase efficiency. All parts of the stack will be virtualized as we progress.
  • Storage is an important aspect of the cloud. Cloud vendors maintain 3 copies of data, so in terms of the reliability-to-cost ratio, cloud storage is on par with or cheaper than local storage. And unlimited storage is available on a completely pay as you go model.
  • The cloud is a very interesting medium for testing and QE, since these phases come late in the SDLC and require investment in hardware and provisioning. Clouds make it possible to do functional and scale testing without upfront investment.
  • The most compelling use of cloud computing is when load and usage cannot be predicted. The cloud can be used to augment the local data center, scaling out when load exceeds certain levels. Such hybrid clouds will be the future of data centers. Another prime use case is periodic load: in an on-premise data center this leads to low utilization and hence lower ROI. Clouds can be provisioned as needed, improving the ROI for such companies.
  • Today even a 1-person startup can compete with Google and IBM in terms of infrastructure. If a good revenue model is in place, startups can use the pay as you go model to their advantage. Companies like SmugMug and ElephantDrive have done just this to keep up with their phenomenal growth. Without clouds, their growth would have been stymied, as they would not have had the capability to scale out.
  • The data center management companies will need to upgrade their products to manage the clouds. They will have to look at provisioning, job scheduling and profiling for the cloud along with the on-premise data center.
  • Everyone agreed that on-premise data centers will never be replaced by the cloud. They will be augmented. A lot of web hosting will move to the cloud though.
The conclusion was that companies and consumers should look through the hype and identify solutions that actually solve their problems. Not every piece of software becomes a better solution just because it is provided as Software-as-a-Service. If you find your sweet spot in the cloud, you are poised for phenomenal growth.