Virtual cluster with VirtualBox, OpenMPI and project Crossbow

I found myself having to test some OpenMPI code recently, and needed a small cluster for the task. Performance wasn't really an issue for those tests since only the correctness of the code mattered. Generally, in situations where I need to test something quickly in an isolated environment I use VirtualBox or a Solaris zone, so I thought, why not here as well. Moreover, it would be a good occasion for me to try project Crossbow for the first time. So the idea was to create a simple OpenMPI cluster using virtual machines for the nodes, interconnected with a Crossbow virtual network.

Roughly, the steps to create a very basic cluster are as follows:

  • Create the virtual machines
  • Setup remote login for all the nodes
  • Setup the virtual network using Crossbow
  • Install and test OpenMPI

For reference, here are the hardware specs and software versions used:

Host machine:

  • 2.2 GHz quad-core AMD processor with AMD-V and nested paging support
  • 8 GiB of DDR2-800 RAM
  • OS: OpenSolaris development version snv_117

Software:

  • VirtualBox 3.0.2
  • OpenMPI 1.3.3
  • OpenSolaris 2009.06 (For the VMs)
  • Sun Studio 12 Update 1

The virtual machines

For the nodes I used OpenSolaris 2009.06, not necessarily the first choice for a production cluster, but there were a few things I wanted to try that required OpenSolaris. One requirement of OSOL is that it needs a lot of memory to perform well: at a bare minimum 784 MiB for a ZFS installation. I therefore decided to run the maximum number of machines my workstation could support, which turned out to be 7. I know that 7 machines is more than required for a simple test, but while I was at it I wanted to push the limits of these technologies. Speaking of limits, it was possible to start an additional VM, but that left very little memory for the host and the host system became unresponsive.

Here are the settings for the virtual machines (see the remarks at the end for more info):

  • OpenSolaris 2009.06
  • 784 MiB of memory
  • 1 CPU
  • AMD-V and nested paging activated
  • 16 GiB dynamically expanding HDD on a SATA controller

Installation and setup was pretty basic, and since I'm a bit lazy about it I did not bother setting them up to boot without a graphical login. At that point I also needed additional libraries from Blastwave on each machine, mainly the GCC 4 runtime libs, which I installed quickly. For those of you new to the Solaris world, Blastwave is, among other things, an open-source software repository for Solaris and OpenSolaris. Instructions on how to use it can be found here.
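For the record, once Blastwave's pkg-get is set up, pulling in a package is a one-liner; something like the following (the exact package name for the gcc4 runtime libs may differ, check the Blastwave catalog):

/opt/csw/bin/pkg-get -i gcc4corert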

I guess this would be a good place for a screenshot. Note the system memory usage at the bottom.


VirtualCluster-7vm.png
Seven virtual machines with desktop. (Click for full resolution.)

While this is far more VMs than my workstation should be handling, the whole thing stays rather fluid as long as the VMs are mostly idle. The most obvious effect comes from the lack of memory left for the ZFS cache, combined with the overtaxing of the host's hard drives; consequently, application load times and storage accesses are much slower.

Finally, the machines were configured so that the master, named mpi-0, could ssh to all the other machines. I used password-less ssh keys to achieve that, along the lines sketched below.
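For reference, the key setup boils down to something like this on mpi-0, repeating the second command for mpi-2 through mpi-6 (assuming a user named mpi on every node):

ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub | ssh mpi@mpi-1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
...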

Crossbow virtual networking

Next comes the virtual network. I kept it simple: every machine is connected to a virtual link via its own dedicated virtual interface. In this scenario the link acts as the network switch.

A Crossbow virtual interface, or vnic, must be created over an existing link. That link can either be a physical one, generally corresponding to a network interface on the host machine, or it can be purely virtual. The latter is called an etherstub and can be created with the dladm command like this:

dladm create-etherstub vlink0

Next, the seven virtual NICs:

dladm create-vnic -l vlink0 vnic0
dladm create-vnic -l vlink0 vnic1
...
dladm create-vnic -l vlink0 vnic6

Now that the virtual network is created, the vnics can be managed like any other interface using ifconfig. Before doing anything else, though, it's a good idea to plumb them.

ifconfig vnic0 plumb
ifconfig vnic1 plumb
...
ifconfig vnic6 plumb

The VirtualBox manual specifies that a virtual interface must have the same MAC address as the one assigned in the VM configuration. Since I run a script to create the network each time it is used, I must specify the MAC addresses myself. It's also a good idea to set the MTU, since vnics default to 9000.

ifconfig vnic0 ether 2:8:20:83:bd:3b mtu 1500
...

In VirtualBox, each machine must now be configured to use its own bridged interface with the corresponding MAC address. For this to work the interfaces must already exist; furthermore, the virtual machines will refuse to start if they are not available later on.


VirtualCluster-nicconfig.png
NIC configuration in VirtualBox. (Click for full resolution.)
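For those who prefer to script it, the same NIC configuration can be applied with VBoxManage; roughly like this for the first node (the VM name is just an example, and the MAC matches the one given to vnic0 above):

VBoxManage modifyvm mpi-0 --nic1 bridged --bridgeadapter1 vnic0 --macaddress1 02082083BD3B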

Installation and testing of OpenMPI

Installation of OpenMPI is rather simple, and there is really nothing different from installing it on a real cluster. For convenience I installed my pre-built binaries in the ~/local/mpi directory of each machine, since I will be testing a few different builds of OpenMPI.
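For reference, producing such a build is the standard configure/make routine; roughly something like this, here assuming the Sun Studio compilers and the per-machine prefix:

./configure CC=cc CXX=CC F77=f77 FC=f90 --prefix=$HOME/local/mpi
make all install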

Finally, to quickly test that jobs could be launched on all the hosts, I compiled and ran a small program I found here a while ago. The hostfile simply contains the list of hostnames to be used for the cluster; in this case every machine was configured with a static IP address and a corresponding entry in the /etc/hosts file.
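A minimal MPI test program along those lines looks something like this (a sketch, not necessarily the exact program, but equivalent for the purpose):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, namelen;
    char name[MPI_MAX_PROCESSOR_NAME];

    /* Start the MPI runtime and find out who and how many we are. */
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &namelen);

    printf("Process %d on %s out of %d\n", rank, name, size);

    MPI_Finalize();
    return 0;
}

Compiled with ~/local/mpi/bin/mpicc hello.c, this produces the a.out used below. Running it gives the following results, showing that all the nodes are reachable and have launched their jobs correctly.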

mpi@mpi-0:~$ /export/home/mpi/local/mpi/bin/mpirun --mca btl tcp,self --hostfile hostfile /export/home/mpi/a.out
Process 0 on mpi-0 out of 7
Process 1 on mpi-1 out of 7
Process 2 on mpi-2 out of 7
Process 4 on mpi-4 out of 7
Process 3 on mpi-3 out of 7
Process 5 on mpi-5 out of 7
Process 6 on mpi-6 out of 7

This is it. Now I can do whatever testing I need with OpenMPI. One could also do the same thing using Solaris zones; I might post an entry about it some other time if I get to do it.

Additional observations:

  • While the virtual network does not persist after a reboot, dladm still reports an error upon creation saying that the interfaces exist, even though they do not. The workaround is to destroy all the vnics before recreating the network (see the commands after this list).
  • Nested paging is essential for the whole thing to work at a usable speed. This is expected, but the performance drop when deactivating it is astonishing.
  • Likewise, activating the SMP capability of VirtualBox on the VMs greatly reduces overall performance and increases idle CPU usage considerably. As stated in the manual, in SMP mode one should not assign more CPUs than are available.
  • Once running, stability is rock solid. The only problem I encountered was when starting or stopping all the VMs at the same time: hard drive access latency gets so high that it appears to prevent some machines from booting correctly. Note that this is most probably not a bug.
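For reference, the cleanup before recreating the network is simply the reverse of the creation steps:

dladm delete-vnic vnic0
...
dladm delete-vnic vnic6
dladm delete-etherstub vlink0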

Comments

tres cool!

This is super cool. This has me going for some other fun playing. Seeing if you can do clustered filesystems and some other neat stuff as well!

-luke

Awesome!

Thanks for this info. This answered a bunch of questions I had.
