This document describes how to install Grid Engine on machines with multiple network interfaces (multi-homed hosts). It was originally Solaris-specific; the GNU/Linux parts were added later. A special case, using the Solaris™ Operating Environment IP Multipathing (IPMP) technology for IP failover, is described in a separate HOWTO.
Suppose we have two Ethernet interfaces (say hme0 and hme1) on each machine. One interface (hme0) carries general traffic, NFS file sharing and so on, and the other (hme1) is dedicated to Grid Engine communications. We would like to set up Grid Engine so that it communicates only on the dedicated Grid Engine network. In this example, there is a Grid Engine master node (sun-master) and three exec hosts (sun-1, sun-2, sun-3).
On Solaris, /etc will contain a file called hostname.hme0 populated with the hostname (e.g. sun-1) and another file called hostname.hme1 populated with the Grid Engine interface name (e.g. sun-1-grid). On GNU/Linux, eth0 would typically be the interface corresponding to the canonical name returned by hostname(1), possibly configured via /etc/network/interfaces or /etc/sysconfig/network-scripts/. The /etc/hosts file should, of course, contain entries for the SGE interface as well as the standard interface:
    #
    # Grid Engine Network
    #
    192.168.7.2     sun-master-grid
    192.168.7.3     sun-1-grid
    192.168.7.11    sun-2-grid
    192.168.7.12    sun-3-grid
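Before installing, it can help to confirm that every host has both of its names in the hosts file. The following is a minimal sketch, not part of Grid Engine: the helper names has_host_entry and check_grid_hosts are hypothetical, and the "-grid" suffix is simply the naming convention used in this example.

```shell
#!/bin/sh
# Sketch only: has_host_entry and check_grid_hosts are hypothetical helpers,
# and the "-grid" suffix is this document's example naming convention.

# has_host_entry FILE NAME
# Succeed if NAME appears as a hostname or alias on a non-comment line of FILE.
has_host_entry() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# check_grid_hosts FILE HOST...
# Print each missing name and return non-zero if any name is missing.
check_grid_hosts() {
    hosts_file=$1
    shift
    status=0
    for h in "$@"; do
        for name in "$h" "$h-grid"; do
            if ! has_host_entry "$hosts_file" "$name"; then
                echo "missing: $name"
                status=1
            fi
        done
    done
    return $status
}
```

For the example cluster this could be run as: check_grid_hosts /etc/hosts sun-master sun-1 sun-2 sun-3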
When both networks are functioning correctly, install Grid Engine:
Install SGE on all hosts.
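Under $SGE_ROOT/$SGE_CELL/common, create a file named host_aliases (see host_aliases(5)) and populate it as follows. The first name on each line is the name Grid Engine will use for the host; the names that follow it are treated as aliases for that name.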
    # cat host_aliases
    sun-master-grid sun-master
    sun-1-grid sun-1
    sun-2-grid sun-2
    sun-3-grid sun-3
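With more than a handful of hosts, the host_aliases file can be generated rather than typed. This is a small sketch under the assumption that every grid interface name is the hostname plus "-grid", as in this example; make_host_aliases is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: make_host_aliases is a hypothetical helper. It assumes the
# grid interface name is always "<hostname>-grid", as in this example.

# make_host_aliases HOST...
# Print one host_aliases line per host, with the grid name first so that
# it is the name the standard hostname is mapped to.
make_host_aliases() {
    for h in "$@"; do
        printf '%s-grid %s\n' "$h" "$h"
    done
}
```

The output could then be redirected into place, for example: make_host_aliases sun-master sun-1 sun-2 sun-3 > $SGE_ROOT/$SGE_CELL/common/host_aliases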
Check that SGE can resolve all the hostnames correctly:
    # cd /gridware/sge/utilbin/solaris64
    # ./gethostbyname -aname sun-1
    sun-1-grid
    # ./gethostbyname -aname sun-1-grid
    sun-1-grid
    #
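The same check can be repeated for every host in a loop. The sketch below assumes the install path and architecture directory shown in the example (/gridware/sge and solaris64) and the "-grid" naming convention; check_sge_names and the RESOLVE variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Sketch only: check_sge_names and RESOLVE are hypothetical names.
# The default resolver path assumes the /gridware/sge install root and the
# solaris64 architecture directory used in this document's example.
RESOLVE=${RESOLVE:-"${SGE_ROOT:-/gridware/sge}/utilbin/solaris64/gethostbyname -aname"}

# check_sge_names HOST...
# For each host, resolve both the standard name and the "-grid" name and
# report any that do not come back as the "-grid" name.
check_sge_names() {
    status=0
    for h in "$@"; do
        for name in "$h" "$h-grid"; do
            # $RESOLVE is deliberately unquoted so it splits into command and option
            got=$($RESOLVE "$name") || got=
            if [ "$got" != "$h-grid" ]; then
                echo "$name resolves to '$got', expected $h-grid"
                status=1
            fi
        done
    done
    return $status
}
```

For the example cluster: check_sge_names sun-master sun-1 sun-2 sun-3. No output and a zero exit status mean every name maps to its grid interface name.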
Shut down the exec hosts:
    # qconf -ke `qconf -sel`
    sent shutdown notification to execd host ...
    #
Then shut down the master as well, using the same command with the -km option (qconf -km).
Alter the file $SGE_ROOT/$SGE_CELL/common/act_qmaster so that it contains the name of the master's Grid Engine interface (e.g. sun-master-grid) rather than its hostname.
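The edit to act_qmaster can be scripted so that the previous contents are kept. This is a minimal sketch; set_act_qmaster is a hypothetical helper name and the ".bak" backup suffix is an arbitrary choice.

```shell
#!/bin/sh
# Sketch only: set_act_qmaster is a hypothetical helper, and the ".bak"
# backup suffix is an arbitrary choice.

# set_act_qmaster COMMON_DIR NAME
# Back up COMMON_DIR/act_qmaster if it exists, then write NAME into it.
set_act_qmaster() {
    file=$1/act_qmaster
    if [ -f "$file" ]; then
        cp -p "$file" "$file.bak" || return 1
    fi
    printf '%s\n' "$2" > "$file"
}
```

In this example it would be called as: set_act_qmaster $SGE_ROOT/$SGE_CELL/common sun-master-grid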
Start up the master node now:
    # /etc/init.d/sgemaster
    starting sge_qmaster
    #
Now start up the SGE exec hosts:
    # pdsh ... /etc/init.d/sgeexecd start
    ...
    #
Snoop the network to check that the correct interfaces are being used:
    # qsub -q sun-1 test.sh
    # snoop -V -d hme1 sun-1-grid
    sun-master-grid -> sun-1-grid TCP D=46883 S=536 Syn Ack=2694354350 \
        Seq=2537161622 Len=0 Win=49640 Options=<mss 1460,nop,nop,sackOK>
    sun-1-grid -> sun-master-grid TCP D=536 S=46883 Ack=2537161623 \
        Seq=2694354350 Len=0 Win=49640
snoop is a Solaris utility; use tcpdump(1) more generally, or some other packet capture tool for your operating system.
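A rough tcpdump equivalent of the snoop check might look like the sketch below. The wrapper name capture_grid_traffic is hypothetical, and the interface name in the usage line (eth1) is an assumption standing in for whichever interface carries the Grid Engine network on your hosts.

```shell
#!/bin/sh
# Sketch only: capture_grid_traffic is a hypothetical wrapper. Running
# tcpdump normally requires root privileges.

# capture_grid_traffic IFACE HOST
# Show packets to or from HOST as seen on interface IFACE.
capture_grid_traffic() {
    tcpdump -i "$1" host "$2"
}
```

For example: capture_grid_traffic eth1 sun-1-grid. Seeing traffic between sun-master-grid and sun-1-grid on the dedicated interface after submitting a job indicates that the Grid Engine network is being used.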
Trademarks
Sun and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.