Monday, January 25, 2010

Zero Copy - Mapping buffer into pages for disk IO

I needed to implement zero-copy for a block device driver. Much of the IO in the driver went through buffers, and each IO previously involved allocating pages and copying data between the pages and the buffer. This naturally ate up a lot of CPU and needed improvement.

While implementing this I did not find good example code in the Linux kernel, so I ended up spending some time investigating. Some issues to consider:

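1) Don't assume that all memory was kmalloc'ed. Use is_vmalloc_addr() to check what type of memory it is, because vmalloc'ed addresses must be translated with vmalloc_to_page() rather than virt_to_page().
2) On some architectures, even kmalloc allocations can cross page boundaries, so the buffer has to be added to the bio one page-sized piece at a time.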

void buffer_disk_io(struct my_req * req)
{
        int is_vmalloc;
        int count;
        unsigned int len;
        struct bio * bio;
        struct page * pg;
        unsigned int offset;
        unsigned int sector;
        void *addr;

        count = (req->num_sectors-1) / (PAGE_SIZE / SECTOR_SIZE) + 1;

        /* The buffer may not start on a page boundary, in which case it
         * can spill into one extra page and needs one more bio vector */
        offset = offset_in_page(req->buffer);
        if (offset && (offset + (req->num_sectors << 9)) > PAGE_SIZE)
                count++;

        bio = bio_alloc(GFP_NOIO, count);
        if (bio == NULL) {
                req->status.syserr = -ENOMEM;
                req->status.code = DS_ERR_UNKNOWN;
                goto failed;
        }

        bio->bi_bdev = req->path->bdev;
        bio->bi_sector = req->start_sector;

        bio->bi_private = req;
        bio->bi_end_io = __end_io_indirect;

        /* Check if memory is vmalloc'ed or kmalloc'ed */
        is_vmalloc = is_vmalloc_addr(req->buffer);

        sector = 0;
        while (sector < req->num_sectors) {
                addr = (req->buffer + (sector << 9));

                if (is_vmalloc) {
                        pg = vmalloc_to_page(addr);
                } else {
                        pg = virt_to_page(addr);
                }

                offset = offset_in_page(addr);

                /* Consider the case when the offset is not on a page boundary;
                 * the remaining data may or may not cross into the next page */
                if ((req->num_sectors - sector) >= (PAGE_SIZE / DS_SECTOR_SIZE))
                        len = PAGE_SIZE - offset;
                else if (offset + ((req->num_sectors - sector) << 9) < PAGE_SIZE)
                        len = (req->num_sectors - sector) << 9;
                else
                        len = PAGE_SIZE - offset;

                if (!bio_add_page(bio, pg, len, offset))
                        goto failed;

                sector += (len >> 9);
        }

        /* set command */
        if (req->cmd == IO_READ) {
                bio->bi_rw = READ;
        } else if (req->cmd == IO_WRITE) {
                bio->bi_rw = WRITE;
        }
        bio->bi_rw |= (1UL<



failed:
        if (bio != NULL)
                __end_io_indirect(bio, 0, -ENOMEM);
}
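
The page-splitting arithmetic in the loop above can be tried outside the kernel. The following is only a user-space sketch under stated assumptions: the names SKETCH_PAGE_SIZE, struct segment, pages_spanned() and split_by_page() are made up for illustration and are not part of the driver or of any kernel API, and a 4096-byte page is assumed. It computes how many pages a buffer touches and the (offset, len) pair for each piece.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel code uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* One piece of the buffer that lies entirely inside a single page. */
struct segment {
        size_t offset;  /* offset of the piece within its page */
        size_t len;     /* number of bytes of the piece */
};

/* Number of pages touched by the byte range [addr, addr + nbytes). */
static size_t pages_spanned(uintptr_t addr, size_t nbytes)
{
        size_t offset = addr % SKETCH_PAGE_SIZE;

        if (nbytes == 0)
                return 0;
        return (offset + nbytes + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Split [addr, addr + nbytes) into per-page pieces.
 * Returns the number of pieces written to segs, or 0 if max_segs
 * is too small (or nbytes is 0).
 */
static size_t split_by_page(uintptr_t addr, size_t nbytes,
                            struct segment *segs, size_t max_segs)
{
        size_t n = 0;

        assert(segs != NULL);

        while (nbytes > 0) {
                size_t offset = addr % SKETCH_PAGE_SIZE;
                size_t room = SKETCH_PAGE_SIZE - offset;  /* bytes left in this page */
                size_t len = nbytes < room ? nbytes : room;

                if (n == max_segs)
                        return 0;

                segs[n].offset = offset;
                segs[n].len = len;
                n++;

                addr += len;
                nbytes -= len;
        }
        return n;
}
```

In the driver, each piece would correspond to one bio_add_page(bio, pg, len, offset) call, and pages_spanned() plays the role of the count passed to bio_alloc(). For example, a 4096-byte buffer starting 512 bytes into a page yields two pieces: 3584 bytes at offset 512, then 512 bytes at offset 0.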


Monday, January 18, 2010

Performance engineering with OProfile

I have been working on a Linux kernel replication product and wanted to profile its kernel module to evaluate performance. I used OProfile to extract the profiling information.

1. Install kernel-debuginfo packages or compile a kernel since we need vmlinux

2. Since I was working inside VMware, I needed to load the oprofile module in timer-interrupt mode
               modprobe oprofile timer=1
This took a little while to figure out. If this step is skipped, no samples are collected, yet no errors are reported either.

3. Provide oprofile with the path to your vmlinux file
               opcontrol --vmlinux=/usr/lib/debug/lib//vmlinux

4. Reset the data from oprofile, if needed, with
               opcontrol --reset

5. Run whatever operations you want to profile
               opcontrol --start; ; opcontrol --stop

6. Dump the logs that were generated
               opcontrol --dump

7. View the logs
              opreport -c --demangle=smart --image-path= --merge tgid | less

The logs can be viewed in various formats and many types of information can be extracted. You can see the command for that here.

Here is a snippet of the output. Apart from the kernel's own prepare_to_copy, the biggest consumer is lock_conflict() in our module (foo.ko), with about 20% of the samples, so we can look at the logic in and around this function to improve performance.

samples  %         samples  %         image name  app name  symbol name
40503    41.7815   19369    19.9559   vmlinux     vmlinux   prepare_to_copy
  40503  100.000   19369    100.000   vmlinux     vmlinux   prepare_to_copy [self]
19951    20.5808   6399     6.5929    foo.ko      foo       lock_conflict
  19951  100.000   6399     100.000   foo.ko      foo       lock_conflict [self]
7056     7.2787    25345    26.1130   pcnet32     pcnet32   (no symbols)
  7056   100.000   25345    100.000   pcnet32     pcnet32   (no symbols) [self]
6358     6.5587    1824     1.8793    foo.ko      foo       locks_overlapped
  6358   100.000   1824     100.000   foo.ko      foo       locks_overlapped [self]
6072     6.2637    12217    12.5872   foo.ko      foo       fsnlock_release_1
  6072   100.000   12217    100.000   foo.ko      foo       fsnlock_release_1 [self]
2844     2.9338    4098     4.2222    foo.ko      foo       init_lockmgr
  2844   100.000   4098     100.000   foo.ko      foo       init_lockmgr [self]
792      0.8170    831      0.8562    foo.ko      foo       block_write_umap
  792    100.000   831      100.000   foo.ko      foo       block_write_umap [self]
733      0.7561    626      0.6450    foo.ko      foo       fsnlock_release
  733    100.000   626      100.000   foo.ko      foo       fsnlock_release [self]
671      0.6922    129      0.1329    dt          dt        fill_buffer
  671    100.000   129      100.000   dt          dt        fill_buffer [self]
428      0.4415    173      0.1782    vmlinux     vmlinux   __make_request
  428    100.000   173      100.000   vmlinux     vmlinux   __make_request [self]
401      0.4137    412      0.4245    foo.ko      foo       ReadConfigurationFromCfg
  401    100.000   412      100.000   foo.ko      foo       ReadConfigurationFromCfg [self]

Saturday, January 16, 2010


I found the following an interesting read from Howard Mann:
There are tens of thousands of businesses making many millions a year in profits that still haven’t ever heard of twitter, blogs or facebook. Are they all wrong? Have they missed out or is the joke really on us? They do business through personal relationships, by delivering great customer service and it’s working for them. They’re more successful than most of those businesses who spend hours pontificating about how others lose out by missing social media and the latest wave. And yet they’re doing business. Great business. Not writing about it. Doing it.
I’m continually amazed by the number of people on Twitter and on blogs, and the growth of people (and brands) on facebook. But I’m also amazed by how so many of us are spending our time. The echo chamber we’re building is getting larger and louder.

More megaphones don’t equal a better dialogue. We’ve become slaves to our mobile devices and the glow of our screens. It used to be much more simple and, somewhere, simple turned into slow.
We walk the streets with our heads down staring into 3-inch screens while the world whisks by doing the same. And yet we’re convinced we are more connected to each other than ever before. Multi-tasking has become a badge of honor. I want to know why.
I don’t have all the answers to these questions but I find myself thinking about them more and more. In between tweets, blog posts and facebook updates.
I tend to agree with him, especially with the first paragraph. Twitter, Facebook and the rest of social media are certainly redefining virtual relationships and helping to create personal ones. So while social media is a very important medium for reaching more people, we mustn't forget that reaching people is only half the job.