Wednesday, October 3, 2007

Making e2fsck parallel?

Parallelizing any fsck checker is risky: things can go wrong in some setups, with unusual hardware or with particular RAID configurations. XFS has added a prefetch mechanism to xfs_repair, which gives good results in most use cases, but "parallel fsck" is different from "readahead in fsck".

I have a few ideas (scattered, but I am jotting them down lest I forget them) about having proper parallelization in at least pass1 (and probably pass2) of e2fsck. Pass1 of e2fsck reads inodes and dumps information into certain data structures, such as inode_used_map, block_used_map, the directory bitmap, the dir block list, etc. Val Henson recently posted a promising patch which adds readahead threads to pass1 of e2fsck, where the inodes are still processed serially. This does not yield any performance benefit in the single-disk case (though she noted excellent benefits on multi-disk RAID arrays).

So, coming to the point: I think we can have multiple threads processing the inodes and dumping information into the pass1 data structures. Contention can be greatly reduced (IMHO made negligible) by giving each thread its own cache for each data structure. Each thread then dumps its cached information into the global shared data structure when its cache is full or after a certain time period. This should also speed up e2fsck on single-disk machines, as both cores (provided you have a dual-core CPU) can work in parallel.
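To make the per-thread cache idea concrete, here is a rough sketch. e2fsck itself is written in C; this is Python purely for illustration, and every name in it (SharedBitmap, ThreadCache, parallel_scan) is made up for this post, not taken from e2fsprogs. A plain set stands in for something like inode_used_map, and the sketch only flushes when the cache is full, not on a timer.

```python
import threading


class SharedBitmap:
    """Hypothetical stand-in for a global pass1 structure such as inode_used_map."""

    def __init__(self):
        self.bits = set()
        self.lock = threading.Lock()
        self.flushes = 0  # number of times the lock was taken

    def merge(self, batch):
        # The lock is taken once per batch, not once per inode.
        with self.lock:
            self.bits.update(batch)
            self.flushes += 1


class ThreadCache:
    """Private per-thread buffer that is flushed into the shared structure."""

    def __init__(self, shared, capacity):
        self.shared = shared
        self.capacity = capacity
        self.pending = set()

    def mark(self, number):
        self.pending.add(number)
        if len(self.pending) >= self.capacity:
            self.flush()

    def flush(self):
        if self.pending:
            self.shared.merge(self.pending)
            self.pending = set()


def scan_inodes(inodes, shared, capacity):
    cache = ThreadCache(shared, capacity)
    for ino in inodes:
        cache.mark(ino)  # a real pass1 would read and check the inode here
    cache.flush()        # push whatever is left at the end


def parallel_scan(num_inodes, num_threads=2, capacity=64):
    shared = SharedBitmap()
    threads = []
    for t in range(num_threads):
        # Interleave inode numbers across threads (an arbitrary split).
        chunk = range(1 + t, num_inodes + 1, num_threads)
        th = threading.Thread(target=scan_inodes, args=(chunk, shared, capacity))
        threads.append(th)
        th.start()
    for th in threads:
        th.join()
    return shared
```

Calling parallel_scan(1000, num_threads=2, capacity=64) marks all 1000 inodes while taking the shared lock only a handful of times per thread, which is the whole point: the bigger the cache, the less the threads fight over the global structure.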

Another important addition would be to merge read/write requests coming in from different threads. This would especially help when reading directory blocks, which tend to be scattered around.
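A minimal sketch of what merging could look like, again in Python just for illustration. It assumes each request is a (start_block, block_count) pair, which is my own simplification and not how e2fsck represents I/O. The function name merge_requests and the max_gap knob are hypothetical.

```python
def merge_requests(requests, max_gap=0):
    """Coalesce (start_block, block_count) requests into fewer, larger ones.

    Requests that overlap, touch, or are separated by at most max_gap
    blocks are combined into a single contiguous request.
    """
    merged = []  # list of [start, end) ranges, kept sorted by start
    for start, count in sorted(requests):
        end = start + count
        if merged and start <= merged[-1][1] + max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]
```

With max_gap=0 only adjacent or overlapping requests are merged. A small positive max_gap trades a few wasted blocks for fewer seeks, which may or may not pay off depending on the disk.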

But the fun of all this lies in the implementation. The design seems easy enough logically, but the implementation would be mired in intricacies. Still, it seems very interesting to implement.

Thursday, May 17, 2007

Lustre on UML, it works!!!!

I have always hated being dependent on my test machine and have been looking for ways to dump it. VMware is agonizingly slow and requires a LOT of memory, which I cannot afford. Also, what if I want to run 5 instances of a kernel? So the next best option is UML (User Mode Linux). Setting it up for basic filesystem programming was easy enough, but setting it up for Lustre (www.lustre.org) was a good challenge. Lustre requires networking support and external modules (so gcc versions must match), etc., so some extra stuff needs to be configured.

After many futile attempts at getting this done, I finally managed to run Lustre in UML. These are the approximate steps to do it. Not for the faint of heart :)

But once it is set up, you can work on it just like a test machine and set up a proper cluster with different MDS, OSTs and clients.

1) Patch your Linux sources (2.6.18, let's say) with the Lustre patches and compile them for UML.

make defconfig ARCH=um
make menuconfig ARCH=um
(in menuconfig, select hostfs, loopback support, all ext3 options, and universal tun support for networking)
make ARCH=um

2) Presuming you are doing this on FC6, get an FC6 root image from the internet or from me. Boot from the root image with:
$ ./linux ubda=FC6_root_image eth0=tuntap,,,192.168.1.254 mem=256M

On your host system, pass the eth0= string shown above to UML and run tunctl -u root (this sets the owner of the tun device; the tun.ko module must be present).

In UML, run ifconfig eth0 192.168.1.253 up to bring up eth0. Then add routing info with route add default gw 192.168.1.254, and your network is up.

3) Compile the Lustre sources with --with-linux set to your UML folder.

4) In UML, mount the host folder that contains Lustre, like this:
mount -t hostfs none -o /mnt/store/my_lustre_path /mnt/host_fs

5) Go to the Lustre sources and insmod the modules by hand, or run llmount.sh, and you are done.
Make sure that the host and the rootfs use the same gcc version.

Note that there will be a few problems along the way, but nothing that Google can't solve. Then dump your test machine. :)

Here are some potential problems you may face:
1) UML does not compile on your system. Google and see if you can find a patch from blaisorblade. If not, hack away; that's what I did. I deleted some headers and moved some defines, whatever needed to be done. Make sure that you don't try to do it on fc6, since fc6 has some utrace patches which stop the UML kernel from booting.
2) The GCC versions of the host and the rootfs do not match, and hence the modules do not get inserted. I tried using a different GCC version, but that caused problems.
3) Lustre networking: you may forget to set up the routing table, so you may want to automate it.