Thursday, May 17, 2007

Lustre on UML, it works!!!!

I have always hated being dependent on my test machine and have been looking for ways to dump it. VMware is agonizingly slow and requires a LOT of memory, which I cannot afford. Also, what if I want to run 5 instances of a kernel? So the next best option is UML (User Mode Linux). Setting it up for basic filesystem programming was easy enough, but setting it up for Lustre (www.lustre.org) was a good challenge. Lustre requires networking support and builds external modules, so the gcc versions should match, and some extra stuff needs to be configured.

After many futile attempts at getting this done, I finally managed to run Lustre in UML. These are the approximate steps to do it. Not for the faint of heart :)

But once it is set up, you can work on it just like a test machine and set up a proper cluster with different MDS, OSTs and clients.

1) Patch your Linux sources (2.6.18, let's say) with the Lustre patches
and compile them for UML.

make defconfig ARCH=um
make menuconfig ARCH=um
(in menuconfig, select hostfs, loopback support, all ext3 options, and universal tun support for networking)
make ARCH=um
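
Before moving on, it can help to confirm that the options picked in menuconfig actually landed in the generated .config. The following is a minimal sketch, not part of the original steps. The function name check_uml_config is made up for illustration, and the config symbol names are my assumption of how the menu entries above map to kernel symbols, so adjust the list to whatever your .config uses.

```shell
#!/bin/sh
# Sketch: report which of the options selected in menuconfig are not
# enabled in a kernel .config file.
# ASSUMPTION: the symbol names below are a guess at how "hostfs",
# "loopback support", "ext3" and "universal tun support" map to
# config symbols; edit the list if your tree names them differently.
check_uml_config() {
    config_file="$1"
    missing=""
    for opt in CONFIG_HOSTFS CONFIG_BLK_DEV_LOOP CONFIG_EXT3_FS CONFIG_TUN; do
        # An enabled option appears as OPT=y (built in) or OPT=m (module).
        if ! grep -Eq "^${opt}=[ym]\$" "$config_file"; then
            missing="${missing}${missing:+ }${opt}"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing: $missing"
        return 1
    fi
    echo "ok"
}
```

Run it from the kernel source folder as check_uml_config .config after the menuconfig step.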

2) Presuming you are doing this on FC6, get an FC6 root image from the internet or from me. Boot from the root image with:
$ ./linux ubda=FC6_root_image eth0=tuntap,,,192.168.1.254 mem=256M

On your host system, eth0 should be set to the string above, and you should run tunctl -u root (this sets the user that owns the tun device; the tun.ko module must be present).

In UML, run ifconfig eth0 192.168.1.253 up and eth0 will come up. Then add the routing info with route add default gw 192.168.1.254 and your network is up.

3) Compile the Lustre sources with --with-linux set to your UML kernel folder.

4) In UML, mount the host folder that holds Lustre, like this:
mount -t hostfs none -o /mnt/store/my_lustre_path /mnt/host_fs

5) Go to the Lustre sources and insmod the modules by hand, or run llmount.sh, and you are done.
Make sure that the host and the rootfs use the same gcc version.
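
The gcc check can be scripted. This is a minimal sketch, assuming a 2.6-era /proc/version line that contains the text "gcc version X.Y.Z" and a compiler whose gcc -dumpversion prints the same X.Y.Z form. The function name check_gcc_match is hypothetical.

```shell
#!/bin/sh
# Sketch: compare the gcc version recorded in a /proc/version style
# line with a compiler version string (e.g. from `gcc -dumpversion`).
# ASSUMPTION: the kernel line contains "gcc version X.Y.Z", as
# 2.6-era kernels print it; other formats yield "unknown".
check_gcc_match() {
    kernel_line="$1"
    compiler_version="$2"
    kernel_gcc=$(printf '%s\n' "$kernel_line" |
        sed -n 's/.*gcc version \([0-9][0-9.]*\).*/\1/p')
    if [ -z "$kernel_gcc" ]; then
        echo "unknown"
        return 2
    fi
    if [ "$kernel_gcc" = "$compiler_version" ]; then
        echo "match $kernel_gcc"
    else
        echo "mismatch kernel=$kernel_gcc compiler=$compiler_version"
        return 1
    fi
}
```

Inside UML, something like check_gcc_match "$(cat /proc/version)" "$(gcc -dumpversion)" compares the running UML kernel with the compiler in the rootfs; running gcc -dumpversion on the host as well covers the other side.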

Note that there will be a few problems along the way, but nothing that
Google can't solve. Then dump your test machine. :)

Here are some potential problems you may face:
1) UML does not compile on your system. Google it and see if you can find a patch from blaisorblade. If not, hack away; that's what I did. I deleted some headers and moved some defines, whatever needed to be done. Make sure that you don't try to do it on FC6, since FC6 has some utrace patches which stop the UML kernel from booting.
2) The GCC versions of the host and the rootfs do not match, and hence the modules do not get inserted. I tried using different GCC versions, but that caused problems.
3) Lustre networking - You may forget to set up the routing table, so you may want to automate it.
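
One way to automate the routing setup is a small function run inside UML at boot. This is only a sketch using the addresses from step 2; the function name uml_net_up and the DRY_RUN switch are made up for illustration, and where you call it from (for example a boot script in the root image) is up to you.

```shell
#!/bin/sh
# Sketch: bring up eth0 inside UML and add the default route.
# Defaults are the addresses used in step 2; pass others as arguments.
# Set DRY_RUN to any non-empty value to print the commands instead of
# running them (hypothetical switch, handy for checking the script).
uml_net_up() {
    guest_ip="${1:-192.168.1.253}"
    gateway="${2:-192.168.1.254}"
    # With DRY_RUN set, each command is prefixed with echo.
    run=${DRY_RUN:+echo}
    $run ifconfig eth0 "$guest_ip" up &&
        $run route add default gw "$gateway"
}
```

Calling uml_net_up with no arguments reproduces the two commands from step 2, and DRY_RUN=1 shows what would be run without touching the interface.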