I found myself having to test some OpenMPI code recently and needed a small cluster for the task. Performance wasn't really an issue for these tests since only the correctness of the code mattered. Generally, in situations where I need to test something quickly in an isolated environment I use VirtualBox or a Solaris zone, so I thought, why not here too? Moreover, it would be a good occasion for me to try project Crossbow for the first time. The idea, then, was to create a simple OpenMPI cluster using virtual machines for the nodes, interconnected with a Crossbow virtual network.
Roughly, the steps to create a very basic cluster are as follows:

1. Create the virtual machines and install the OS on each node.
2. Build the Crossbow virtual network and attach the VMs to it.
3. Install OpenMPI on every node.
4. Run a small test job across the cluster.
For reference, here are the hardware specs and software versions used:
Host machine:
For the nodes I used OpenSolaris 2009.06. It is not necessarily the first choice for a production cluster, but there were a few things I wanted to try that required OpenSolaris. One caveat of using OSOL is that it needs a lot of memory to perform well: at a bare minimum 784 MiB for a ZFS installation. I therefore decided to run the maximum number of machines my workstation could support, which turned out to be 7. I know that 7 machines is more than required for a simple test, but while I was at it, why not probe the limits of these technologies? Speaking of limits, it was possible to start an additional VM, but that left very little memory for the host and the host system became unresponsive.
Here are the settings for the virtual machines (see the remarks at the end for more info):
Installation and setup were pretty basic, and since I'm a bit lazy about it I did not bother setting the machines up to boot without a graphical login. At that point I also needed additional libs from Blastwave on each machine, mainly the gcc4 runtime libs, which I installed quickly. For those of you new to the Solaris world, Blastwave is, among other things, an open-source software repository for Solaris and OpenSolaris. Instructions on how to use it can be found here.
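As a rough illustration, pulling the runtime from Blastwave with pkg-get looks something like this (the exact package name is an assumption on my part; consult the Blastwave catalog for the real one):

# Install the gcc4 runtime libs with Blastwave's pkg-get
# (package name is illustrative)
/opt/csw/bin/pkg-get -i gcc4corert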
I guess this would be a good place for a screenshot. Note the system memory usage at the bottom.
While this is a lot more VMs than my workstation should reasonably handle, the whole thing is rather fluid when the VMs are mostly idle. The most obvious effect comes from the lack of memory left for the ZFS cache, combined with the overtaxing of the host's hard drives; consequently, application load times and storage accesses are much slower.
Finally, the machines were configured so that the master, named mpi-0, could ssh to all the other machines. I used password-less ssh keys to achieve that, roughly as sketched below.
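A minimal sketch of that setup, assuming the nodes are reachable as mpi-1 through mpi-6 (the hostnames used here):

# On the master (mpi-0): generate a passphrase-less key once
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
# Append the public key to each node's authorized_keys
for n in 1 2 3 4 5 6; do
  cat ~/.ssh/id_rsa.pub | ssh mpi-$n 'mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys'
done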
Next comes the virtual network. I kept it simple: every machine is connected to a virtual link via its own dedicated virtual interface. In this scenario the link acts as the network switch.
A Crossbow virtual interface, or vnic, must be created over an existing link. That link can either be a physical one, generally corresponding to a network interface on the host machine, or it can be purely virtual. The latter is called an etherstub and can be created with the dladm command like this:
dladm create-etherstub vlink0
Next, the seven virtual NICs:
dladm create-vnic -l vlink0 vnic0
dladm create-vnic -l vlink0 vnic1
...
dladm create-vnic -l vlink0 vnic6
Now that the virtual network is created, the vnics can be managed like any other interface using ifconfig. Before doing anything else, though, it's a good idea to plumb them:
ifconfig vnic0 plumb
ifconfig vnic1 plumb
...
ifconfig vnic6 plumb
The VirtualBox manual specifies that a virtual interface must have the same MAC address as the one assigned in the VM configuration. Since I run a script to recreate the network each time it is used, I must set the MAC address explicitly. It's also a good idea to set the MTU, since the vnics default to 9000:
ifconfig vnic0 ether 2:8:20:83:bd:3b mtu 1500
...
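Putting it all together, such a setup script might look like the following sketch. All MAC addresses except vnic0's are placeholders; each one must match the MAC assigned to the corresponding VM in VirtualBox:

#!/bin/sh
# Sketch of a network setup script; run as root on the host.
# MAC addresses other than vnic0's are illustrative placeholders.
dladm create-etherstub vlink0
i=0
for mac in 2:8:20:83:bd:3b 2:8:20:0:0:1 2:8:20:0:0:2 \
           2:8:20:0:0:3 2:8:20:0:0:4 2:8:20:0:0:5 2:8:20:0:0:6
do
  dladm create-vnic -l vlink0 vnic$i
  ifconfig vnic$i plumb
  ifconfig vnic$i ether $mac mtu 1500
  i=`expr $i + 1`
done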
In VirtualBox, each machine must now be configured to use a unique bridged interface with the corresponding MAC address. For this to work the interfaces must already exist; furthermore, a virtual machine will refuse to start if its interface is not available afterwards.
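This can also be scripted with VBoxManage instead of clicking through the GUI. A sketch, assuming a VM named mpi-0 (note that VBoxManage writes the MAC without colons):

# Attach the VM's first adapter to its dedicated vnic in bridged mode
VBoxManage modifyvm "mpi-0" --nic1 bridged --bridgeadapter1 vnic0
# The MAC must match the one set on vnic0 (2:8:20:83:bd:3b)
VBoxManage modifyvm "mpi-0" --macaddress1 02082083bd3b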
Installing OpenMPI is rather simple, and there is really nothing different from installing it on a real cluster. For convenience I installed my pre-built binaries in the ~/local/mpi directory of each machine, since I will be testing a few different builds of OpenMPI.
Finally, to quickly test that jobs could be launched on all the hosts, I compiled and ran a small program I found here a while ago. Running it gives the following results, showing that all the nodes are reachable and have launched their jobs correctly. The hostfile only contains the list of hostnames to be used for the cluster; in this case every machine was configured with a static IP address and a corresponding entry in the /etc/hosts file.
mpi@mpi-0:~$ /export/home/mpi/local/mpi/bin/mpirun --mca btl tcp,self --hostfile hostfile /export/home/mpi/a.out
Process 0 on mpi-0 out of 7
Process 1 on mpi-1 out of 7
Process 2 on mpi-2 out of 7
Process 4 on mpi-4 out of 7
Process 3 on mpi-3 out of 7
Process 5 on mpi-5 out of 7
Process 6 on mpi-6 out of 7
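For reference, the hostfile and the /etc/hosts entries might look like this (only the hostnames come from this setup; the IP addresses shown are illustrative):

# hostfile: one hostname per line
mpi-0
mpi-1
mpi-2
mpi-3
mpi-4
mpi-5
mpi-6

# /etc/hosts entries on every machine (addresses are examples)
192.168.0.10    mpi-0
192.168.0.11    mpi-1
192.168.0.12    mpi-2
192.168.0.13    mpi-3
192.168.0.14    mpi-4
192.168.0.15    mpi-5
192.168.0.16    mpi-6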
That's it. Now I can do whatever testing I need with OpenMPI. One could also do the same thing using Solaris zones; I might post an entry about that some other time if I get around to it.
Additional observations: