Here I'll show how I set up my parallel computing environment for R.
# Introduction

As far as I know, virtualizing Linux with OpenVZ does not lose much computational efficiency. Moreover, it provides a simple way to copy a whole computation environment from one machine to another. The consistency of the environment reduces the difficulty of setting up MPI between machines, so I decided to build my R cluster on Proxmox, an easy-to-use open-source virtualization platform.
# Environment

OS:

- Host OS: Proxmox (version 2.1-1/fb0f63a)
- Container OS: Ubuntu Precise (version 12.04)

Software (installed in the container OS): R, MPICH2, Rmpi, snow, and RStudio Server (installation is covered in the sections below).
# Set up Proxmox Cluster

## Install Proxmox

Please see Proxmox Installation.

## Set up Proxmox Cluster

Please see Create a Proxmox VE Cluster.
# Create a container

An easy way is to create the container from the Proxmox management console.

Please download a container template from http://wiki.openvz.org/Download/template/precreated and put it under `/var/lib/vz/template/cache/`.

Here I wrote a small shell script (modified from here) to create a container with 6 CPUs, 2 GB of memory, and a 32 GB disk:
``` sh create_vz.sh
#! /bin/bash
VZUID="$1"
VZHOSTNAME="$2"
VZIP="$3"
VZTEMPLATE="$4"
if [[ $1 != "" && $2 != "" && $3 != "" && $4 != "" ]]; then
    /usr/bin/pvectl create $VZUID $VZTEMPLATE --cpus 6 --disk 32 --hostname $VZHOSTNAME --memory 2048 --swap 2048 --nameserver 8.8.8.8 --password initpasswd --pool Rslaves --netif ifname=eth0,mac=$(./macgen.py),host_ifname=veth103.0,host_mac=<host_mac>,bridge=vmbr0
    /usr/bin/pvectl set $VZUID --ip_address $VZIP
else
    /bin/echo ""
    /bin/echo "./create_vz.sh <UID> <HOSTNAME> <IP> <TEMPLATE>"
    /bin/echo ""
    /bin/echo ""
    /usr/bin/pvectl list
fi
```
Note that the initial root password is `initpasswd`, and you need to fill in `<host_mac>` yourself (run `ifconfig` on the host machine, or check the settings of a container created in the management console).

The MAC address is generated by the following Python script:
``` py macgen.py
#! /usr/bin/python
# Filename: macgen.py
# Usage: generate MAC addresses for virtualized systems created by Xen, OpenVZ, Vserver, etc.
import random

# The first three octets are the vendor prefix
mac = [ 0x00, 0x24, 0x81,
        random.randint(0x00, 0x7f),
        random.randint(0x00, 0xff),
        random.randint(0x00, 0xff) ]
print ':'.join(map(lambda x: "%02x" % x, mac))
```
After creating _create_vz.sh_ and _macgen.py_ and putting them in the same directory, run `./create_vz.sh 200 Rmaster 192.168.0.100 /var/lib/vz/template/cache/ubuntu-12.04-x86_64.tar.gz`:
```
Creating container private area (/var/lib/vz/template/cache/ubuntu-12.04-x86_64.tar.gz)
Performing postcreate actions
CT configuration saved to /etc/pve/openvz/200.conf
Container private area was created
CT configuration saved to /etc/pve/openvz/200.conf
```
# Set up the prototype container

I will set up everything in one container and then copy it to the other machines in the Proxmox cluster.

Note that the following commands are executed in the container with root privileges.
## Initialize
Log in to the container via ssh (`ssh root@192.168.0.100`) with the initial root password.
``` sh
locale-gen --lang en_US en_US.UTF-8
apt-get update
apt-get upgrade -y
apt-get install build-essential -y
```
## Install R

``` sh
apt-get install r-base -y
```
## Set /etc/hosts

Here I'll set up a cluster with three machines:

``` text /etc/hosts
192.168.0.100 Rmaster
192.168.0.101 Rslave1
192.168.0.102 Rslave2
```
## Set SSH

Enable ssh public key authentication:

``` sh
adduser ruser
su ruser
ssh-keygen -t dsa -N "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
exit
```
### Test

``` sh
sudo -u ruser ssh ruser@localhost
```

We should be logged in directly, without being prompted for a password.
## Install MPICH2

``` sh
apt-get install mpich2 -y
```
### Check Install Result

Run `mpich2version`:
```
MPICH2 Version: 1.4.1
MPICH2 Release date: Wed Aug 24 14:40:04 CDT 2011
MPICH2 Device: ch3:nemesis
MPICH2 configure: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --libexecdir=${prefix}/lib/mpich2 --srcdir=. --disable-maintainer-mode --disable-dependency-tracking --disable-silent-rules --enable-shared --prefix=/usr --enable-fc --disable-rpath --sysconfdir=/etc/mpich2 --includedir=/usr/include/mpich2 --docdir=/usr/share/doc/mpich2 --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr
MPICH2 CC: gcc -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -Wall -O2
MPICH2 CXX: c++ -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Wformat-security -Werror=format-security -Wall -O2
MPICH2 F77: gfortran -g -O2 -O2
MPICH2 FC: gfortran -O2
```
## Install Rmpi with MPICH2

Edit `/etc/R/Makeconf`: comment out the original settings of `CC` and `SHLIB_LD`, and add `mpicc` instead:

``` text /etc/R/Makeconf
#CC = ...
CC = mpicc
...
#SHLIB_LD = ...
SHLIB_LD = mpicc
```
Open an R console and execute the following command to install Rmpi against MPICH2:

``` r install Rmpi
install.packages('Rmpi',
                 configure.args = "--with-Rmpi-type=MPICH2 --with-Rmpi-include=/usr/lib/mpich2/include --with-Rmpi-libpath=/usr/lib/mpich2/lib/ --with-mpi=/usr/include/mpich2/")
```
After the installation, restore `/etc/R/Makeconf`:

``` text /etc/R/Makeconf
CC = ...
#CC = mpicc
...
SHLIB_LD = ...
#SHLIB_LD = mpicc
```
Modify line 17 of `/usr/local/lib/R/site-library/Rmpi/Rslaves.sh` so that the slave log is written to `/tmp`:

``` text /usr/local/lib/R/site-library/Rmpi/Rslaves.sh
...
$R_HOME/bin/R --no-init-file --slave --no-save < $1 > /tmp/$hn.$2.$$.log 2>&1
...
```

Note that without this modification of `/usr/local/lib/R/site-library/Rmpi/Rslaves.sh`, `mpi.spawn.Rslaves` will fail with a `Permission denied` error.
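As a quick sanity check (my own suggestion, not part of the original recipe), you can verify that Rmpi compiled and loads against MPICH2 before going further:

``` r
# Minimal sketch: confirm that Rmpi loads and MPI initializes.
# Run inside a plain R session in the container; the exact output may vary.
library(Rmpi)
mpi.universe.size()   # should return an integer without error
mpi.quit()
```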
## Install snow

``` r install snow
install.packages("snow")
```
# Spread the Prototype

In this section, commands are executed on the host OS (Proxmox) unless stated otherwise.
## Shut down the prototype container

Before deploying the prototype container to the other machines, I suggest shutting it down:

``` sh
# inside the container
init 0
```
## Back up a snapshot of the container

Run `vzdump --dumpdir . 200`:
```
INFO: starting new backup job: vzdump 200 --dumpdir .
INFO: Starting Backup of VM 200 (openvz)
INFO: CTID 200 exist unmounted down
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: creating archive './vzdump-openvz-200-2012_08_16-13_53_23.tar'
INFO: Total bytes written: 960901120 (917MiB, 716MiB/s)
INFO: archive file size: 916MB
INFO: delete old backup './vzdump-openvz-200-2012_08_16-13_33_40.tar'
INFO: Finished Backup of VM 200 (00:00:02)
INFO: Backup job finished successfully
```
Proxmox dumps the environment of the prototype container into a single file of about 1 GB. Note that 200 is the UID of the prototype.
## Copy the prototype

I spread the prototype with the following shell script. Please remember to modify `<host_mac>` as before.

``` sh spread_vz.sh
#! /bin/bash
VZUID="$1"
VZHOSTNAME="$2"
VZIP="$3"
VZDUMP="$4"
if [[ $1 != "" && $2 != "" && $3 != "" && $4 != "" ]]; then
    vzrestore $VZDUMP $VZUID
    pvectl set $VZUID --ip_address $VZIP --netif ifname=eth0,mac=$(./macgen.py),host_ifname=veth$VZUID.0,host_mac=<host_mac>,bridge=vmbr0 --hostname $VZHOSTNAME
else
    /bin/echo ""
    /bin/echo "./spread_vz.sh <UID> <HOSTNAME> <IP> <VZDUMP>"
    /bin/echo ""
    /bin/echo ""
    /usr/bin/pvectl list
fi
```
Run `./spread_vz.sh 201 Rslave1 192.168.0.101 vzdump-openvz-200-2012_08_16-13_53_23.tar`:

```
extracting archive '/root/dump/vzdump-openvz-200-2012_08_16-13_53_23.tar'
Total bytes read: 960901120 (917MiB, 470MiB/s)
restore configuration to '/etc/pve/nodes/<cluster1>/openvz/201.conf'
CT configuration saved to /etc/pve/openvz/201.conf
```
Run `./spread_vz.sh 202 Rslave2 192.168.0.102 vzdump-openvz-200-2012_08_16-13_53_23.tar` for the second slave.
## Deploy the slaves

Just open the Proxmox management console and migrate these new containers to the other machines in the Proxmox cluster.
# Validate the Environment

- Start all containers.
- Log in to Rmaster as ruser.
- Try ssh to Rslave1 and Rslave2 as ruser. It should not require a password.
- Check the content of /etc/hosts on all machines.
- Create Rmpi.conf as

``` text Rmpi.conf
Rmaster
Rslave1
Rslave2
```

- Create Rmpi.test.R as

``` r Rmpi.test.R
library(Rmpi)
cl <- mpi.spawn.Rslaves(nslaves=6)
mpi.close.Rslaves()
mpi.quit()
```
- Execute `mpiexec -np 1 -f Rmpi.conf R --vanilla < Rmpi.test.R`:
```
R version 2.14.1 (2011-12-22)
Copyright (C) 2011 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Rmpi)
> cl <- mpi.spawn.Rslaves(nslaves=6)
	6 slaves are spawned successfully. 0 failed.
master (rank 0, comm 1) of size 7 is running on: Rmaster
slave1 (rank 1, comm 1) of size 7 is running on: Rslave1
slave2 (rank 2, comm 1) of size 7 is running on: Rslave2
slave3 (rank 3, comm 1) of size 7 is running on: Rmaster
slave4 (rank 4, comm 1) of size 7 is running on: Rslave1
slave5 (rank 5, comm 1) of size 7 is running on: Rslave2
slave6 (rank 6, comm 1) of size 7 is running on: Rmaster
> mpi.close.Rslaves()
[1] 1
> mpi.quit()
```
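Beyond spawning and closing slaves, you may also want to confirm that real work gets distributed. The following is a minimal sketch of my own (not part of the original test), run the same way as Rmpi.test.R above via `mpiexec -np 1 -f Rmpi.conf`; `mpi.remote.exec` and `mpi.parSapply` are standard Rmpi functions:

``` r
# Sketch only: ask each slave where it runs, then do a toy parallel computation.
library(Rmpi)
mpi.spawn.Rslaves(nslaves = 6)
mpi.remote.exec(Sys.info()[["nodename"]])        # hostname reported by every slave
squares <- mpi.parSapply(1:100, function(i) i^2) # work split across the slaves
print(sum(squares))
mpi.close.Rslaves()
mpi.quit()
```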
# Install RStudio Server

``` sh
apt-get install libssl0.9.8 libapparmor1 apparmor-utils -y
wget http://download2.rstudio.org/rstudio-server-0.96.330-amd64.deb
dpkg -i rstudio-server-0.96.330-amd64.deb
```

See Download RStudio Server for details.
# Set up Rmpi in RStudio Server

## Configuration

Create /etc/Rmpi.conf:

``` text /etc/Rmpi.conf
Rmaster
Rslave1
Rslave2
```
Create /usr/lib/rstudio-server/bin/rsession-mpiexec.sh:

``` sh /usr/lib/rstudio-server/bin/rsession-mpiexec.sh
#! /bin/bash
mpiexec -np 1 -f /etc/Rmpi.conf -errfile-pattern /tmp/mpiexec.error.log /usr/lib/rstudio-server/bin/rsession "$@"
```
Make the script executable:

``` sh
chmod u+x /usr/lib/rstudio-server/bin/rsession-mpiexec.sh
chmod g+x /usr/lib/rstudio-server/bin/rsession-mpiexec.sh
chmod o+x /usr/lib/rstudio-server/bin/rsession-mpiexec.sh
```
Create or modify /etc/rstudio/rserver.conf:

``` text /etc/rstudio/rserver.conf
rsession-path=/usr/lib/rstudio-server/bin/rsession-mpiexec.sh
#rsession-path=/usr/lib/rstudio-server/bin/rsession
```
Modify /etc/apparmor.d/rstudio-server:

``` text /etc/apparmor.d/rstudio-server
  #/usr/lib/rstudio-server/bin/rsession ux,
  /usr/lib/rstudio-server/bin/rsession-mpiexec.sh ux,
```
Restart RStudio Server (or restart Rmaster):

``` sh
rstudio-server restart
```
## Testing with Rmpi

Log in to the RStudio web interface and execute:

``` r
library(Rmpi)
cl <- mpi.spawn.Rslaves(nslaves=6)
mpi.close.Rslaves()
mpi.quit()
```
You should see that the slaves are spawned on different containers/machines.
## Testing with snow

Log in to the RStudio web interface and execute:

``` r
library(snow)
cl <- makeMPIcluster(count = 18)
unlist(clusterEvalQ(cl, system('hostname', intern=TRUE)))
stopCluster(cl)
```
You should see the hostnames of the slaves.
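Once the hostnames look right, the same cluster can be used for actual parallel work. Here is a minimal sketch of my own (not from the original post) using snow's `clusterApply` for a toy Monte Carlo estimate of pi; for serious use you would also want reproducible per-worker RNG streams (e.g. via `clusterSetupRNG`):

``` r
# Sketch only: distribute a toy Monte Carlo estimate of pi over the MPI cluster.
library(snow)
cl <- makeMPIcluster(count = 18)
n <- 1e5                                   # samples per worker
hits <- clusterApply(cl, rep(n, 18), function(n) {
  x <- runif(n); y <- runif(n)
  sum(x^2 + y^2 <= 1)                      # points inside the quarter circle
})
print(4 * sum(unlist(hits)) / (18 * n))    # rough estimate of pi
stopCluster(cl)
```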
# Troubleshooting

- Check the network environment:
    - Is /etc/hosts correct?
    - Can ruser ssh to the slaves and the master without a password prompt?
- Check the logs:
    - the system log (/var/log/syslog)
    - the mpiexec error log (/tmp/mpiexec.error.log), whose path is set in /usr/lib/rstudio-server/bin/rsession-mpiexec.sh
Good Luck!