OpenMosix Dojo (version 1.0. Copyright of Mulyadi Santosa)

Qemu and OpenMosix: The Internal Power of Virtualization

Trying your first adventure in the clustering arena? Then maybe you began gathering old PCs from your garage, borrowing PCs from your friends, or even sneaking into your neighbour's home trying to "pick" their PC? :-) All just because "oh boy, I just have one PC... and I want to play with openMosix for a while, but I have no more PCs...".

Or maybe you are a brave spirit trying to "conquer" openMosix, so you install openMosix on 4 PCs and then "booommmm", you get nasty segfaults. Then someone suggests that you download and try a new version of the openMosix patch... and now it's time for another leg and hand sport, moving around between PCs to update the kernel. Well, LTSP might help, but maybe it's not a good idea.

So, it's time to gather your strength. If you know the Chi practice in kung fu, now we do the same for your lonely PC... :-)

If you have ever heard of tools like VMWare, Bochs, Xen, Plex86, User Mode Linux or the rest of the gang, then it is time to meet Qemu (http://fabrice.bellard.free.fr/qemu/).

Grab the source tarball at http://fabrice.bellard.free.fr/qemu/qemu-0.5.5.tar.gz; this is the latest version at the time of writing (0.5.5). Unpack the tarball (using tar -xzvf). Now, before you do the actual "make", apply the following patch:

---------------------CUT Start of the Patch-----------------------
--- ./before-diff/sdl.c 2004-05-18 10:33:05.000000000 +0700
+++ ./sdl.c 2004-05-18 10:40:55.000000000 +0700
@@ -130,6 +130,7 @@
 static void sdl_process_key(SDL_KeyboardEvent *ev)
 {
     int keycode, v;
+    static int modif;
 
     /* XXX: not portable, but avoids complicated mappings */
     keycode = ev->keysym.scancode;
@@ -150,6 +151,78 @@
     } else {
         keycode = 0;
     }
+    /* Adjust shift-key states when leaving window */
+
+    if (ev->keysym.scancode == 0) {
+        if ((modif ^ ev->keysym.mod) & KMOD_LSHIFT)
+            kbd_put_keycode(0x2a | (modif & KMOD_LSHIFT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RSHIFT)
+            kbd_put_keycode(0x36 | (modif & KMOD_RSHIFT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_LCTRL)
+            kbd_put_keycode(0x1d | (modif & KMOD_LCTRL ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RCTRL) {
+            kbd_put_keycode(0xe0 );
+            kbd_put_keycode(0x1d | (modif & KMOD_RCTRL ? 0x80 : 0));
+        }
+        if ((modif ^ ev->keysym.mod) & KMOD_LALT)
+            kbd_put_keycode(0x38 | (modif & KMOD_LALT ? 0x80 : 0));
+        if ((modif ^ ev->keysym.mod) & KMOD_RALT) {
+            kbd_put_keycode(0xe0 );
+            kbd_put_keycode(0x38 | (modif & KMOD_RALT ? 0x80 : 0));
+        }
+        modif = ev->keysym.mod;
+    }
+
+    /* remember shift-key state */
+
+    switch (keycode) {
+    case 0x2a: /* Left Shift */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_LSHIFT;
+        else
+            modif |= KMOD_LSHIFT;
+        break;
+    case 0x36: /* Right Shift */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_RSHIFT;
+        else
+            modif |= KMOD_RSHIFT;
+        break;
+    case 0x1d: /* Left CTRL */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_LCTRL;
+        else
+            modif |= KMOD_LCTRL;
+        break;
+    case 0x1de0: /* Right CTRL */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_RCTRL;
+        else
+            modif |= KMOD_RCTRL;
+        break;
+    case 0x38: /* Left ALT */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_LALT;
+        else
+            modif |= KMOD_LALT;
+        break;
+    case 0x38e0: /* Right ALT */
+        if (ev->type == SDL_KEYUP)
+            modif &= ~KMOD_RALT;
+        else
+            modif |= KMOD_RALT;
+        break;
+    case 0x45: /* Num Lock */
+        kbd_put_keycode(0x45);
+        kbd_put_keycode(0xc5);
+        return;
+    case 0x3a: /* Caps Lock */
+        kbd_put_keycode(0x3a);
+        kbd_put_keycode(0xba);
+        return;
+
+    }
+
 
     /* now send the key code */
     while (keycode != 0) {
---------------CUT End of Patch------------------------------

Basically, this is a patch that fixes a keyboard problem in the SDL graphical output. It was made against SDL-1.2.5-3 on Red Hat 9, so feel free to adjust it for your distro/setup. Did I mention SDL?
Yes, you need to install the SDL and SDL devel packages if you want graphical output (it is heavily recommended... at least from my point of view).

Now, do the usual mantra. I assume that you will install into /usr/local/qemu:

    # ./configure --prefix=/usr/local/qemu/
    # make && make install

Now we are ready to build the disk image. You can think of the disk image as a virtual hard drive for Qemu. I assume you want to create the disk image inside /mnt/qemu:

    dd of=/mnt/qemu/myimage bs=1M seek=700 count=0

The above command is an example that creates an empty 700 MB image. You can pick another size by changing the "seek" and "bs" parameters; see "man dd" for the complete reference.

Export the directory Qemu should use for its temporary files in the QEMU_TMPDIR environment variable (here the tmpfs directory /mytmpfs, which we mount in a later step):

    export QEMU_TMPDIR=/mytmpfs

After that, pick your Linux CD or ISO image and run the following command (from now on, please adjust the actual paths to the qemu and qemu-fast binaries yourself):

    # qemu -hda /mnt/qemu/myimage -cdrom /mnt/cdrom -boot d -m 64

This is relatively easy to understand: it tells qemu to boot from the CD-ROM and also attach the disk image, so you can start the installation. A couple of weeks ago I installed Debian 3.0 (woody) inside the disk image because I think it is relatively stable and compact. You can pick another distro of your flavour... just remember to leave enough room, because so far I don't know how to resize the disk image :-)

Just install Linux as usual and don't forget to set up a swap partition. When you finish installing Linux, the disk image should contain the root partition and the swap partition. The things you need to include are gcc/glibc, shells (of course, who can live without them ;-) ), automake/autoconf, tar, gzip/gunzip.

After finishing the Linux installation, quit Qemu first; now we move on to the openMosix kernel compilation.
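By the way, if you want to play with the dd trick from the disk image step before touching /mnt/qemu, here is a minimal sketch. The helper name make_sparse_image is my own invention (it is not part of Qemu), it assumes GNU dd, and the demo only writes a tiny 8 MB image into a temporary directory:

```shell
#!/bin/sh
# make_sparse_image PATH SIZE_MB
# Hypothetical helper: creates an empty image of SIZE_MB megabytes with the
# same dd trick as above. count=0 copies no data; seek just moves the end
# of the file, so the image is sparse and takes almost no real disk space.
make_sparse_image() {
    path=$1
    size_mb=$2
    dd of="$path" bs=1M seek="$size_mb" count=0 2>/dev/null
}

# Demo: an 8 MB image in a throwaway directory (assumed location)
demo_dir=$(mktemp -d)
make_sparse_image "$demo_dir/demo.img" 8

# Apparent size in bytes, should be 8 * 1024 * 1024
wc -c < "$demo_dir/demo.img"
```

For the real thing you would call it as make_sparse_image /mnt/qemu/myimage 700, which is the same as the dd line shown earlier.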
Apply the patch below to your openMosix-patched kernel to make it compatible with qemu-fast:

----------------CUT Start of patch------------------------
diff -Naur ./linux/arch/i386/vmlinux.lds ./linux-qemu/arch/i386/vmlinux.lds
--- ./linux/arch/i386/vmlinux.lds 2002-02-26 02:37:53.000000000 +0700
+++ ./linux-qemu/arch/i386/vmlinux.lds 2004-05-17 17:15:37.000000000 +0700
@@ -6,7 +6,7 @@
 ENTRY(_start)
 SECTIONS
 {
-  . = 0xC0000000 + 0x100000;
+  . = 0x90000000 + 0x100000;
   _text = .; /* Text and read-only data */
   .text : {
     *(.text)
diff -Naur ./linux/include/asm-i386/page.h ./linux-qemu/include/asm-i386/page.h
--- ./linux/include/asm-i386/page.h 2004-05-14 12:26:48.000000000 +0700
+++ ./linux-qemu/include/asm-i386/page.h 2004-05-17 17:14:50.000000000 +0700
@@ -78,7 +78,7 @@
  * and CONFIG_HIGHMEM64G options in the kernel configuration.
  */
 
-#define __PAGE_OFFSET (0xC0000000)
+#define __PAGE_OFFSET (0x90000000)
 
 /*
  * This much address space is reserved for vmalloc() and iomap()
--------------------------CUT end of patch---------------------------

This patch changes the kernel page offset (both in the linker script and in page.h) from 0xC0000000 to 0x90000000, so the kernel becomes compatible with qemu-fast.

Why do we need qemu-fast? Why not use plain Qemu? The answer (copied from the Qemu documentation):

"qemu-fast uses the host Memory Management Unit (MMU) to simulate the x86 MMU. It is fast but has limitations because the whole 4 GB address space cannot be used and some memory mapped peripherials cannot be emulated accurately yet"

In other words, qemu-fast doesn't simulate the MMU in software; it uses the host's MMU instead... should be faster, right? Yes, the guest cannot use the whole 4 GB address space, but who wants 4 GB just for a simulation?
:-) It should be fine for the general case, AFAIK.

In the kernel configuration, remember to build the NE2000 drivers into the kernel (not as modules). You can find them under "Network device support" --> "Ethernet (10 or 100Mbit)":

    CONFIG_NE2000=y
    CONFIG_NE2K_PCI=y

I am not sure which one is actually needed for Qemu, but adding both won't hurt :-) Enable any other options you think you will need.

Do the usual kernel compilation, and move the finished bzImage (I prefer bzImage; it is up to you to pick the final type of kernel), vmlinux and System.map to a directory. If you built modules, we will move them into the disk image later. Let's assume you move them to /boot/qemu.

Oh, BTW, it is also a good idea to give Qemu a tmpfs-mounted directory for its needs. Here I create a 1 gigabyte tmpfs:

    mount -t tmpfs -o size=1G tmpfs /mytmpfs/

Now we need to test-drive the kernel. Put the following command in a shell script:

    /usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/myimage -hdb /dev/hda -kernel /boot/qemu/bzImage -append "root=/dev/hda1 ide3=noprobe ide4=noprobe ide5=noprobe"

The above script assumes that you created the root filesystem on the first partition of the disk image; that's why the root parameter is "/dev/hda1". And what is "-hdb /dev/hda"? Well, we need to copy several files from the host system, so we make the host's disk visible inside Qemu as the second drive :-) If your layout is different, again, feel free to modify the parameters.

You get the login prompt? Congratulations! Now log in and make sure you see the following report from "dmesg":

    NE*000 ethercard probe at 0x300: 52 54 00 12 34 56
    eth0: NE2000 found at 0x300, using IRQ 9.

These lines indicate that the kernel successfully detected the emulated NE2000 card. So far I have had no problem with the "fake" NE2000, so the only trick is... just make sure you include NE2000 support.

We are half-way there. Now we move the kernel modules... how? By mounting the fake "/dev/hdb" inside Qemu.
E.g.:

    mount -t ext2 /dev/hdb1 /mnt/host

There... you can access the host filesystem. Now copy the /lib/modules directory straight into the disk image. After that, "halt" the guest system and restart qemu using the above script. You should find that "the missing modules" are now loaded successfully.

openMosix needs userland tools, right? Same as above: transfer the userland tools tarball (use version 0.3.5) into the guest and compile it there. This way we make sure that it is compiled against the correct gcc/glibc. Oh wait, you need the oM kernel headers? Mount the host filesystem, create a soft link at /usr/src/linux-openmosix pointing to the openMosix kernel source, and the compilation will go smoothly.

I will skip the oM-specific settings; just refer to the HOWTO for how to set up /etc/inittab, the node map, etc. Also remember to set up an IP address for eth0 (on Debian, you can have it brought up at start-up using /etc/network/interfaces).

Again, shut down the guest system. Now we move on to setting up the second node.

"What, doing the above steps again? You gotta be kidding, right? I need a faster way!" OK, relax :-) That's why we will create a COW (Copy-On-Write) image. What is it? You can think of it as a way of sharing the original disk image between Qemu instances: each instance keeps its own modifications in its own COW file, and the original image stays safe.

Let's create two COWs (yes, COW, but not the cows which produce milk, OK? :-))) ):

    # qemu-mkcow -f /mnt/qemu/myimage /mnt/qemu/mycow1.cow
    # qemu-mkcow -f /mnt/qemu/myimage /mnt/qemu/mycow2.cow

Not so difficult, right?

After that, create a script for enabling the TUN/TAP device. "Wait wait... TUN/TAP, why do I need it?" Well, TUN/TAP is a virtual device that acts as the network link between the guest system and its host.
So, if you don't turn it on, there is no "network connection" between the guest and its host. Here is an example of the script:

    #!/bin/sh
    sudo /sbin/ifconfig $1 192.168.1.11 netmask 255.255.255.0

Modify the above script for each guest's TUN/TAP device, and remember not to assign an IP address that is already used by another TUN/TAP device or by a guest.

I suggest putting the TUN/TAP devices and the interfaces inside the guests on a subnet separate from the host's own network. I use this trick so I won't mess a lot with the host's routing table. For this "Dojo" I use the following topology:

                        host (10.1.1.1)
                       /               \
                      /                 \
                     /                   \
    1st TUN/TAP (192.168.1.11)      2nd TUN/TAP (192.168.1.12)
                |                               |
                |                               |
    1st guest (192.168.1.21)        2nd guest (192.168.1.22)

Got a brighter picture from the above diagram? I hope so... :-)

So, back to the TUN script. You should write two scripts.

For the 1st TUN/TAP (name it /mnt/qemu/qemu-ifup):

    #!/bin/sh
    sudo /sbin/ifconfig $1 192.168.1.11 netmask 255.255.255.0

For the 2nd TUN/TAP (name it /mnt/qemu/qemu-ifup2):

    #!/bin/sh
    sudo /sbin/ifconfig $1 192.168.1.12 netmask 255.255.255.0

eth0 inside 1st guest: 192.168.1.21
eth0 inside 2nd guest: 192.168.1.22

I use the above IP numbering so I can quickly remind myself of the topology (x.x.x.x1 is for the 1st group, x.x.x.x2 for the 2nd group). You don't have to follow my idea :-)

Because you already have two COWs, modify the qemu start script so it becomes:

(for the 1st guest)

    /usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/mycow1.cow -macaddr 52:54:00:12:34:56 -kernel /boot/qemu/bzImage -n ./qemu-ifup -append "root=/dev/hda1 ide3=noprobe ide4=noprobe ide5=noprobe"

(for the 2nd guest)

    /usr/local/qemu/bin/qemu-fast -hda /mnt/qemu/mycow2.cow -macaddr 52:54:00:12:34:60 -kernel /boot/qemu/bzImage -n ./qemu-ifup2 -append "root=/dev/hda1 ide3=noprobe ide4=noprobe ide5=noprobe"

"I notice the -macaddr switch... why do we need it?" The answer: if you don't set it explicitly, both guest systems get the same MAC address, and that confuses the ARP resolution mechanism of TCP/IP.
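If you plan to run more than two guests, typing MAC addresses by hand gets boring quickly. Below is a minimal sketch of one way to derive them; guest_mac is a hypothetical helper of mine, and the numbering scheme (counting up from the :56 used for the 1st guest) is only an assumption, since any set of distinct addresses will do:

```shell
#!/bin/sh
# guest_mac N
# Hypothetical helper: prints a MAC address for guest number N (1, 2, 3...).
# The 52:54:00:12:34 prefix comes from the start scripts above; only the
# last octet changes, starting at 0x56 (decimal 86) for guest 1.
guest_mac() {
    printf '52:54:00:12:34:%02x\n' "$((86 + $1 - 1))"
}

# Show the addresses for the two guests of this Dojo
for n in 1 2; do
    echo "guest $n: $(guest_mac "$n")"
done
```

You could then pass the result to qemu-fast, e.g. -macaddr "$(guest_mac 2)", instead of a hard-coded value.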
So, we need to set a distinct MAC address for each guest. Still confused about what I am talking about? Go to the RFCs on ARP and TCP/IP and read about the IP-to-MAC address resolution mechanism.

Now fire up both qemu instances and watch them load the openMosix kernel until you get the login prompt.

Back on the host system, we now need to set up a bridge connecting these 2 guests. "Oh boy, another pain is coming... when will it stop?" :))) Remember, a dojo is a place to practice, not a place for instant skill like Neo getting his kung fu inside the Matrix :-) The motto "no pain, no gain" applies here :-)))

OK, back to the bridge. You can think of a bridge as "a hub connecting any target network interfaces". Copy the following script to set up a bridge between tun0 and tun1 (let's name it start-bridge.sh):

    #!/bin/bash
    /sbin/modprobe bridge
    /sbin/route del -net 192.168.1.0 netmask 255.255.255.0
    /sbin/route del -net 192.168.1.0 netmask 255.255.255.0
    /usr/sbin/brctl addbr br0
    /usr/sbin/brctl addif br0 tun0
    /usr/sbin/brctl addif br0 tun1
    /sbin/ifconfig br0 192.168.1.13 netmask 255.255.255.0
    /sbin/ifconfig tun0 0.0.0.0
    /sbin/ifconfig tun1 0.0.0.0

Basically, the default kernel on Red Hat 9 (the one I use for this experiment) comes with bridging capability as a module (bridge.o). If you don't find it, recompile the kernel and make sure you include this:

    CONFIG_BRIDGE=m (as module) --> preferred
    or
    CONFIG_BRIDGE=y (as native kernel part)

You can find it under "Networking options". It is named "802.1d Ethernet Bridging".

Why do we need to do "route del"? Well, remember that we previously brought up the TUN/TAP devices? On (recent) Linux, "ifconfig" automatically sets up a route for each new IP address assigned to a device. Each of the two TUN/TAP devices got its own route to 192.168.1.0/24, which is why the same "route del" line appears twice. We clean them up because we don't need them!

The next lines set up the bridge itself. You need to install the bridge-utils RPM (RH 9 includes these tools).
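Before running start-bridge.sh, a quick check can tell you whether the tools it calls are there at all. This is just a sketch: have_tool is a hypothetical helper name, and it only looks in your PATH, so on systems where /sbin and /usr/sbin are not in a normal user's PATH, run it as root:

```shell
#!/bin/sh
# have_tool NAME
# Hypothetical helper: succeeds if NAME can be found by the shell.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on every tool the bridge script above relies on
for tool in modprobe route brctl ifconfig; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If brctl shows up as missing, bridge-utils is the package to look for.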
If your distro doesn't include bridge-utils, go to http://www.math.leidenuniv.nl/~buytenh/bridge and grab the tarball there. Actually, what I am going to explain is a short version of the Bridge Mini HOWTO; you can find more about bridging on www.tldp.org by searching for "bridge". Many distributions also include this document.

Let's analyze the commands:

    /usr/sbin/brctl addbr br0

--> here we create a new bridge interface named "br0"

    /usr/sbin/brctl addif br0 tun0
    /usr/sbin/brctl addif br0 tun1

--> here we "bond" tun0 and tun1 so they are attached "inside" the bridge

    /sbin/ifconfig br0 192.168.1.13 netmask 255.255.255.0

--> as you know, this assigns an IP address and netmask to the bridge. You need to make sure that the bridge is on the same subnet as the guests.

    /sbin/ifconfig tun0 0.0.0.0
    /sbin/ifconfig tun1 0.0.0.0

--> easy: just assign the 0.0.0.0 IP to the TUNs (but do not bring them down; I repeat, DO NOT turn the TUNs down) :-)

The topology becomes:

                        host (10.1.1.1/24)
                       /                  \
                      /                    \
                     /                      \
          T H E   B R I D G E   (192.168.1.13/24)
                |                               |
                |                               |
    1st TUN/TAP (0.0.0.0)           2nd TUN/TAP (0.0.0.0)
                |                               |
                |                               |
    1st guest (192.168.1.21/24)     2nd guest (192.168.1.22/24)

Now try to ping from guest 1 to guest 2 and vice versa... success? Now start openMosix (just copy the openMosix start/stop script from the openMosix userland tarball) and confirm that "mosmon" sees all the nodes!

Let me state something before we go further. Something inside Qemu screws up openMosix's auto-detection of the system's speed, so make sure you include this line in your openMosix startup script:

    mosctl setspeed 15000

Feel free to adjust the number, but make sure you set the same number on all guest systems; if not, you will get weird load balancing... believe me :-) It took me 2 days just to find out about this "speed" thing, when I saw that openMosix didn't load-balance my program :-)))

After that, try migration between the guests... this wouldn't be an openMosix cluster if it couldn't migrate processes, right?
:-) Just compile a simple C program like the one below:

    int main(void)
    {
        int a = 0, b = 0;

        /* empty nested loops: pure CPU work, so the process is worth migrating */
        for (a = 0; a <= 1000000; a++)
            for (b = 0; b <= 1000000; b++) {
            }
        return 0;
    }

Suppose you name it "silly.c"; then compile it as "silly" (without optimization, or the compiler may throw the empty loops away) and run silly in the background (add "&"). 2 instances of "silly" are sufficient for a start.

Success? Then congratulations... you have exercised your Chi to the highest level :-) Have fun with your new virtual cluster!

References:
- Qemu user documentation and technical documentation
- openMosix HOWTO
- Ethernet Bridge mini HOWTO
- Documentation/filesystems/tmpfs.txt inside the kernel source directory