Monday, January 25, 2010

Zero Copy - Mapping buffer into pages for disk IO

I needed to implement zero-copy IO for a block device driver. It turned out that a lot of IO in the driver went through driver-allocated buffers, and each IO previously involved page allocations and copying of data between pages and buffers. This naturally ate up a lot of CPU and needed improvement.

While implementing this I did not find good example code in the Linux kernel, so I ended up wasting some time on investigation. Some issues to consider:

1) Don't assume that all memory was kmalloc'ed. Use is_vmalloc_addr() to check what type of memory you are dealing with.
2) On some architectures, even kmalloc allocations can cross page boundaries, so each page of the buffer must be mapped separately.

void buffer_disk_io(struct my_req * req)
{
        int is_vmalloc;
        int count;
        unsigned int len;
        struct bio * bio;
        struct page * pg;
        unsigned int offset;
        unsigned int sector;
        void *addr;

        count = (req->num_sectors-1) / (PAGE_SIZE / SECTOR_SIZE) + 1;

        /* The buffer may not start on a page boundary, in which case
         * the data can spill over into one extra page */
        offset = offset_in_page(req->buffer);
        if (offset && (offset + (req->num_sectors << 9)) > PAGE_SIZE)
                count++;

        bio = bio_alloc(GFP_NOIO, count);
        if (bio == NULL) {
                req->status.syserr = -ENOMEM;
                req->status.code = DS_ERR_UNKNOWN;
                return;
        }

        bio->bi_bdev = req->path->bdev;
        bio->bi_sector = req->start_sector;

        bio->bi_private = req;
        bio->bi_end_io = __end_io_indirect;


        /* Check if memory is vmalloc'ed or kmalloc'ed */
        is_vmalloc = is_vmalloc_addr(req->buffer);

        sector = 0;
        while (sector < req->num_sectors) {
                addr = (req->buffer + (sector << 9));

                if (is_vmalloc) {
                        pg = vmalloc_to_page(addr);
                } else {
                        pg = virt_to_page(addr);
                        get_page(pg);
                }

                offset = offset_in_page(addr);

                /* The offset may not be page aligned, and the remaining
                 * data may or may not cross a page boundary */
                if ((req->num_sectors - sector) >= (PAGE_SIZE / SECTOR_SIZE))
                        len = PAGE_SIZE - offset;
                else if (offset + ((req->num_sectors - sector) << 9) < PAGE_SIZE)
                        len = (req->num_sectors - sector) << 9;
                else
                        len = PAGE_SIZE - offset;

                if (!bio_add_page(bio, pg, len, offset))
                        goto failed;

                sector += (len >> 9);
        }

        /* set command */
        if (req->cmd == IO_READ) {
                bio->bi_rw = READ;
        } else if(req->cmd == IO_WRITE) {
                bio->bi_rw = WRITE;
        }
        bio->bi_rw |= (1UL << BIO_RW_SYNC);  /* flag assumed: the shift
                                              * target was lost above */

        generic_make_request(bio);

        return;


failed:
        if(bio != NULL)
                __end_io_indirect(bio, 0, -ENOMEM);

        return;
}
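
The completion callback __end_io_indirect is not shown above. For reference, here is a minimal sketch of what it could look like, assuming the same three-argument bi_end_io signature that the driver uses; DS_OK and finish_my_req() are hypothetical names invented for the illustration:

static int __end_io_indirect(struct bio *bio, unsigned int bytes_done, int error)
{
        struct my_req *req = bio->bi_private;
        struct bio_vec *bvec;
        int i;

        /* On success, wait until the whole bio has completed; on error
         * (including the direct call from the failure path above)
         * finish the request immediately */
        if (!error && bio->bi_size)
                return 1;

        /* Drop the references taken with get_page(); the vmalloc'ed
         * pages were not pinned, so leave those alone */
        if (!is_vmalloc_addr(req->buffer))
                bio_for_each_segment(bvec, bio, i)
                        put_page(bvec->bv_page);

        req->status.syserr = error;
        req->status.code = error ? DS_ERR_UNKNOWN : DS_OK;  /* DS_OK assumed */

        bio_put(bio);
        finish_my_req(req);  /* hypothetical hand-off back to the caller */
        return 0;
}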

Monday, January 18, 2010

Performance engineering with OProfile

I have been working on a Linux kernel replication product and wanted to profile the kernel module to evaluate its performance. I used OProfile to extract the profiling information.

1. Install the kernel-debuginfo packages, or compile the kernel yourself, since we need the uncompressed vmlinux image

2. Since I was working inside VMware, I needed to load the oprofile module in timer-interrupt mode
               modprobe oprofile timer=1
This took a little time to figure out. If this step is skipped, no samples show up in the logs, yet no errors are reported either.

3. Provide oprofile with the path to your vmlinux file
               opcontrol --vmlinux=/usr/lib/debug/lib/<kernel-version>/vmlinux

4. Reset oprofile's existing data if needed with
               opcontrol --reset

5. Run whatever operations you want to profile
               opcontrol --start; <run the workload to profile>; opcontrol --stop

6. Dump the logs that were generated
               opcontrol --dump

7. View the logs
              opreport -c --demangle=smart --image-path=<path-to-module-dir> --merge tgid | less

The logs can be viewed in various formats and many other kinds of information can be extracted; see the opreport documentation for the available options.
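
Putting the steps together, a typical session looks like this (the vmlinux path and the module image path are placeholders for your own setup):

               modprobe oprofile timer=1
               opcontrol --vmlinux=/usr/lib/debug/lib/<kernel-version>/vmlinux
               opcontrol --reset
               opcontrol --start
               <run the workload to profile>
               opcontrol --stop
               opcontrol --dump
               opreport -c --demangle=smart --image-path=<path-to-module-dir> --merge tgid | less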

Here is a snippet of the output. It tells us that the biggest bottleneck within our own module (foo.ko) is lock_conflict(), so we can look at the logic in and around this function to improve performance.

samples  %        samples  %        image name  app name  symbol name
-------------------------------------------------------------------------------
40503    41.7815  19369    19.9559  vmlinux     vmlinux   prepare_to_copy
  40503  100.000  19369    100.000  vmlinux     vmlinux   prepare_to_copy [self]
-------------------------------------------------------------------------------
19951    20.5808   6399     6.5929  foo.ko      foo       lock_conflict
  19951  100.000   6399    100.000  foo.ko      foo       lock_conflict [self]
-------------------------------------------------------------------------------
 7056     7.2787  25345    26.1130  pcnet32     pcnet32   (no symbols)
   7056  100.000  25345    100.000  pcnet32     pcnet32   (no symbols) [self]
-------------------------------------------------------------------------------
 6358     6.5587   1824     1.8793  foo.ko      foo       locks_overlapped
   6358  100.000   1824    100.000  foo.ko      foo       locks_overlapped [self]
-------------------------------------------------------------------------------
 6072     6.2637  12217    12.5872  foo.ko      foo       fsnlock_release_1
   6072  100.000  12217    100.000  foo.ko      foo       fsnlock_release_1 [self]
-------------------------------------------------------------------------------
 2844     2.9338   4098     4.2222  foo.ko      foo       init_lockmgr
   2844  100.000   4098    100.000  foo.ko      foo       init_lockmgr [self]
-------------------------------------------------------------------------------
  792     0.8170    831     0.8562  foo.ko      foo       block_write_umap
    792  100.000    831    100.000  foo.ko      foo       block_write_umap [self]
-------------------------------------------------------------------------------
  733     0.7561    626     0.6450  foo.ko      foo       fsnlock_release
    733  100.000    626    100.000  foo.ko      foo       fsnlock_release [self]
-------------------------------------------------------------------------------
  671     0.6922    129     0.1329  dt          dt        fill_buffer
    671  100.000    129    100.000  dt          dt        fill_buffer [self]
-------------------------------------------------------------------------------
  428     0.4415    173     0.1782  vmlinux     vmlinux   __make_request
    428  100.000    173    100.000  vmlinux     vmlinux   __make_request [self]
-------------------------------------------------------------------------------
  401     0.4137    412     0.4245  foo.ko      foo       ReadConfigurationFromCfg
    401  100.000    412    100.000  foo.ko      foo       ReadConfigurationFromCfg [self]

Saturday, January 16, 2010

Connected?

I found the following from Howard Mann an interesting read:
There are tens of thousands of businesses making many millions a year in profits that still haven’t ever heard of twitter, blogs or facebook. Are they all wrong? Have they missed out or is the joke really on us? They do business through personal relationships, by delivering great customer service and it’s working for them. They’re more successful than most of those businesses who spend hours pontificating about how others lose out by missing social media and the latest wave. And yet they’re doing business. Great business. Not writing about it. Doing it.
I’m continually amazed by the number of people on Twitter and on blogs, and the growth of people (and brands) on facebook. But I’m also amazed by how so many of us are spending our time. The echo chamber we’re building is getting larger and louder.

More megaphones don’t equal a better dialogue. We’ve become slaves to our mobile devices and the glow of our screens. It used to be much more simple and, somewhere, simple turned into slow.
We walk the streets with our heads down staring into 3-inch screens while the world whisks by doing the same. And yet we’re convinced we are more connected to each other than ever before. Multi-tasking has become a badge of honor. I want to know why.
I don’t have all the answers to these questions but I find myself thinking about them more and more. In between tweets, blog posts and facebook updates.
I tend to agree with him, especially with the first paragraph. Twitter, Facebook and the rest of social media are certainly redefining virtual relationships and helping to create personal ones. So while social media is a very important medium for reaching more people, we mustn't forget that reaching people is only half the job done.

Saturday, October 10, 2009

Cloud'ed Thoughts

These were some of the questions posed to me during the cloud computing panel discussion at CSI Annual Convention 2009, along with my responses.

Each one of you has a different view (PaaS, services, testing, startup, management) in the domain. A 5-minute warmer on your take on cloud computing based on your current work will be great. This will set the stage nicely for the discussion.
There are many “definitions” of cloud computing, but for me “Cloud Computing is the fifth generation of computing after Mainframe, Personal Computer, Client-Server and the Web.” It's not often that we get a whole new platform and delivery model to create businesses on. And what's more, it's a new business model as well – using 1000 servers for 1 hour costs the same as using 1 server for 1000 hours – no upfront costs, completely pay as you go!
How has cloud computing suddenly crept up on us and become technologically and economically viable? For three reasons:
  1. Use of commodity hardware, plus the increased software complexity needed to manage redundancy on such hardware. Perfect examples of such software are virtualization, MapReduce, the Google File System, Amazon's Dynamo, etc.
  2. Economies of scale. Storage in a medium-sized data center costs around $2.20/GB/month, while in a large data center it costs $0.40/GB/month. That is a cost saving of roughly 5.5x, which cloud computing vendors have been able to pass on to their customers. In general, cloud infrastructure players can realize a 5 to 7 times decrease in cost.
  3. The third and, in my view, the most important reason: many organizations had the need to scale but not the ability to scale. As the world became data intensive, players realized that unless scalable computing, scalable storage and scalable software were available, their business models would not scale. Consider analytics as an example. Some years back it was possible for mid-sized companies to mine the data in their own data centers, but with data doubling every year they have been unable to keep up, and have decided to scale out to the cloud. Amazon and Google realized this from their own needs very early, and here we are, eating their dog food!
Developers with new ideas for innovative internet services no longer require large capital investments in hardware to deploy them. They can potentially go from 1 customer to 100,000 customers in a matter of days. Over-provisioning and under-provisioning cease to be factors when your product is hosted on a cloud computing platform. This lets small companies focus on their core competency rather than worry about infrastructure, enabling a much quicker go-to-market strategy.
Another advantage is that clouds are available in various forms:
  • Amazon EC2 is as good as a physical machine and you can control the entire software stack.
  • Google AppEngine and salesforce.com are platforms which are highly restrictive but good for quick development, and they allow the scaling complexity to be handled by the platform itself.
  • Microsoft Azure is at an intermediate point between the above two.
So depending on your needs, you can choose the right cloud!
As I said earlier, it's a new development environment and there is a lot of scope for innovation, which is what my company “Clogeny” is focusing on.
Cloud computing is not just about “compute” – it is also storage, content distribution and a new way of visualizing and using unlimited storage. How has storage progressed from multi-million dollar arrays and tapes to S3 and Azure and Google Apps?
I remember that when I started writing filesystems I needed to check for an error indicating that the filesystem was full. It just struck me that I have no need for such error checking when using cloud storage. So yes, it's actually possible to have potentially infinite storage.
Storage: Storage arrays have grown in capacity and complexity over the years to satisfy the ever-increasing demand for size and speed. But cloud storage is pretty solid as well. Amazon, Microsoft and most other cloud vendors keep 3 copies of data, and at least 1 copy is kept at a separate geographical location. When you factor this into the costs, cloud storage is pretty cheap. Having said that, cloud storage is not going to replace local storage; fast and expensive arrays will still be needed for IOPS- and latency-hungry applications. But the market for such arrays may taper off.
Content Distribution: A content delivery network is a system of nodes in multiple locations which co-operate to satisfy requests for content efficiently. The nodes move content around so that the node nearest to the user serves each request. All the cloud providers offer content distribution services, improving reach and performance since requests can be served from the nearest available server anywhere in the world. This makes distribution extremely scalable and cost efficient. The fun part is that the integration between the cloud and the CDN is seamless and can be done through simple APIs.
Visualizing storage: Storage models for the cloud have undergone a change compared to the POSIX model and relational databases that we are used to. The POSIX model has given way to a more scalable, flat key-value store in which a “bucket-name, object-name” tuple points to a piece of data. There is no concept of the folders and files that we are used to, although a folder-file hierarchy can be emulated for ease of use: an object named photos/2009/trip.jpg, for example, looks like a nested path but is just one opaque key to the store. Amazon provides SimpleDB, a non-traditional database which is again easier to scale, but your data organization and modeling will need to change when migrating to it. MapReduce is a framework for operating on very large data sets in highly parallel environments, and it can work on structured or unstructured data.
Consider SmugMug as an example: this online photo-sharing company estimates that it has saved $500,000 in storage expenditure and cut its disk storage array costs in half by using Amazon S3.

CC breaks the traditional models of scalability and infrastructure investment, especially for startups. A 1-person startup can easily compare with an IBM or Google on infrastructure availability if the revenue model is in place. What are the implications and an example of how?
Definitely. Startups need only focus on their revenue model and on implementing their differentiators. Infrastructure, management and scaling are inherently available in a pay as you go manner, so ups and downs in traffic can be sustained. For example, some sites get hit by very high traffic in their first few weeks and need expensive infrastructure to service it, but then the load tapers off and the infrastructure lies unused. This is where the pay as you go model works very well. So yes, cloud computing is a leveller, fostering many start-ups.
Also, many businesses use cloud computing for scale-out: their in-house data center is enough to handle a certain amount of load, but when load goes beyond that point they use the cloud. Such hybrid computing is sometimes more economically viable.
Xignite employs Amazon EC2 and S3 to deliver financial market data to enterprise applications, portals and websites for clients such as Forbes, Citi and Starbucks. This data needs to be delivered in real time and requires rapid scale-up and scale-down.
What do you see when you gaze into the crystal ball?
Security is a concern for many customers, but consider that the most paranoid customer of all – the US government – has started a cloud computing initiative called “Apps.gov”, which provides SaaS applications for federal use. Even where issues remain, they are being surmounted as we speak. Cloud computing has now reached critical mass and the ecosystem will continue to grow.
In terms of technology, I believe some application software will run on-premise with another piece running on the cloud for scaling out. The client part can provide service during disconnected operation and, importantly, can help resolve latency issues. Most cloud computing applications will have built-in billing systems that will either follow a standard or be software that both the vendor and the customer trust. I would love to see some standards emerge in this space, since that will help accelerate acceptance.
“Over the long term, absent other barriers, economics always wins!” – and the economics of cloud computing are too strong to be ignored.

A "Cloudy" day at CSI Annual Convention 2009


I had a very interesting opportunity to be one of the speakers on the panel discussion on cloud computing at CSI Annual Convention 2009. As it turned out, the entire day was "cloudy", with most topics and discussions centered around cloud computing. Most people agreed that the cloud is the next generation of computing, but there are still doubts as to which form of cloud computing will take off. The conclusion was that there IS a lot of hype, and when it has died down, the products and companies that solve real problems will survive. People who try to monetize the medium instead of the product might end up failing. Here are some excerpts from the day.

The day started with a keynote address on "Cloud Computing - Challenges and Opportunities" by Girish Venkatachaliah from IBM. His take was that about 20% of IT will move to the cloud in the next few years, and that currently it's more hype than substance.

Dr. Srikanth Sunderrajan from Persistent gave a great talk on Google AppEngine, a Platform-as-a-Service offering. His company recently implemented a product on top of Google AppEngine. His take was that AppEngine lacks many features and is a strait-jacketed environment with almost no flexibility. They had to write complex libraries to enable filesystem-like storage and ended up using Amazon EC2 to work around AppEngine's shortcomings. In his view, Google needs to open up the platform and be more like Amazon's cloud offerings. One good thing about AppEngine is that development and deployment are fast and easy.

The panel discussion on cloud computing included Monish Darda from Websym, Karan Gujral from BMC, Gireendra Kasmalkar from SQS, Vikram Rajkondwar from Microsoft, Samir Bodas from ICERTIS and yours truly. The discussion covered PaaS, IaaS, SaaS, testing for the cloud, how startups can leverage the cloud, managing clouds and much more. Vikram's views, which stemmed from his experience working on Microsoft Azure, were extremely insightful.

Here are some of the take-away points from the discussion:
  • The cloud phenomenon has been seeded due to the economies of scale. The cloud infrastructure providers use commodity hardware and use complex software to manage redundancy. The savings are passed on to the consumer making the cloud a very cost effective platform.
  • Evolution of virtualization technologies has enabled cloud data centers to increase efficiency. All parts of the stack will be virtualized as we progress.
  • Storage is an important aspect of the cloud. Cloud vendors maintain 3 copies of the data, so in terms of reliability-to-cost ratio, cloud storage is on par with or cheaper than local storage. And unlimited storage is available on a completely pay as you go model.
  • The cloud is a very interesting medium for testing and QE, since these phases come late in the SDLC and require investment in hardware and provisioning. Clouds make it possible to do functional and scale testing without upfront investment.
  • The most compelling use of cloud computing is when load and usage cannot be predicted. The cloud can be used to augment a local data center, scaling out when load exceeds certain levels; such hybrid clouds will be the future of data centers. Another prime use case is periodic load: in on-premise data centers this leads to low utilization and hence lower ROI, whereas clouds can be provisioned as needed, improving the ROI for such companies.
  • Today even a 1-person startup can compete with Google and IBM in terms of infrastructure. If a good revenue model is in place, startups can use the pay as you go model to their advantage. Companies like SmugMug and ElephantDrive have done just this to keep up with their phenomenal growth. Without clouds, their growth would have been stymied, as they would not have had scale-out capability.
  • The data center management companies will need to upgrade their products to manage clouds. They will have to look at provisioning, job scheduling and profiling for the cloud along with the on-premise data center.
  • Everyone agreed that on-premise data centers will never be replaced by the cloud. They will be augmented. A lot of web hosting will move to the cloud though.
The conclusion was that companies and consumers should look through the hype and identify solutions that actually solve their problems. Not every little piece of software becomes a better solution when provided as Software-as-a-Service. If you find your sweet spot in the cloud, you are poised for phenomenal growth.

Thursday, September 24, 2009

Talk on Lustre at FOSS.in/2008

FOSS.IN is one of the world’s largest Free and Open Source Software (FOSS) events, held annually in India. The event is highly focused on FOSS development and contribution. Over the years it has attracted thousands of participants, and the speaker roster reads like a “Who’s Who” of FOSS contributors from across the world.

Last year I had the privilege of giving a talk on "Lustre: A Scalable Clustered Filesystem" at this event. This is one of the few events with a very techie agenda, and I had some interesting discussions with the delegates. The breakout sessions, where hackers sit together and actually code up a feature, are really cool. Not many events have people actually coding!

Here is my presentation. It describes the architecture of Lustre - a distributed, clustered filesystem which runs on 7 of the top 10 supercomputers. It goes on to describe some of the cutting-edge features that are being planned for future Lustre releases.



Tuesday, September 22, 2009

Inspirations from TiECon Delhi 2009

It isn't often that you get a chance to rub shoulders with industry leaders and successful entrepreneurs. TiECon gives you a chance to connect and interact with founders of successful companies, venture capitalists and budding entrepreneurs. It was great to be in the presence of people with amazing clarity of thought and expression.

As I was contemplating the best way to structure this post, I remembered attending panel discussions where certain quotes and thoughts just resonated with me. You may have a gut feeling about certain things, but when they are put into perfect, concise words it becomes easy to put them into action. So the best way to express what I saw and learnt is in the form of quotes that I gathered personally or in discussions.

  • "Entrepreneurship is a difficult word to define, an entrepreneur has a difficult path to choose and a difficult path to tread. Many succeed and many fail. An entrepreneur who trips and falls down once; if he is a true entrepreneur will pick himself up and walk the same path or a different path with greater determination. An entrepreneur chases a dream, pursues an idea, and seeks a goal. So I think there is much to be said about entrepreneurship and any organization which promotes entrepreneurship rather than simple businesses." Mr. P. Chidambaram, Home Minister, Ministry of Home Affairs, Government of India
  • "Good entrepreneurs react differently to tough times" Deep Kalra, Founder & CEO, MakeMyTrip.com
  • "Thinking how soon I can 'breakeven' is a big fallacy. We need to think how will we scale up" Achal Ghai, Managing Director, Avigo Capital
  • "When the brand is in experimental stage, even spam works, especially in India" Manish Vij, Co-Founder and Business Head, Quasar Media
  • "80-90% of requests for venture capital get rejected due to lack of a marketing plan" Achal Ghai, Managing Director, Avigo Capital
  • "Only when you have done enough 'pilots' and have customers who can be brand ambassadors should you think of doing advertising" Manish Vij, Co-Founder and Business Head, Quasar Media
  • "For your core team you need like minded people and people who can work with equity" Dhruv Shringi, CEO & Co Founder, Yatra.com
  • "A part of your skill as an entrepreneur is to be a good salesman" Yashish Dahiya, Co Founder & CEO, Policy Bazaar.com
  • "Its better to be No.1 in a niche market than No.20 in a large market"
  • "You must be able to state your core value proposition in a single sentence"
  • "Be non-conforming"
  • "Leadership is about action not position"