How to install the Grid Engine software on hosts with the Solaris™ Operating Environment IP Multipathing (IPMP) technology
 

This document describes how to install Grid Engine on machines with multiple network interfaces (multi-homed hosts). Particular attention is given to the Solaris Operating Environment IP Multipathing technology. The procedure presented here should work for other environments as well.
 

What is IP Multipathing?

 
IP Multipathing is a technology that allows grouping of TCP/IP interfaces for failover and load balancing purposes. If an interface within an IP Multipathing group fails, the interface is disabled and its IP address is relocated to another interface in the group. Outbound IP traffic is distributed across the interfaces of a group.
For further details on IP Multipathing, refer to the Solaris Operating Environment documentation, which can be found at:
http://docs.sun.com/app/docs/doc/806-6547 .
The IPMP features overview can be found at:
http://docs.sun.com/app/docs/doc/806-6547/6jffv7oma?a=view .
 
 
Issues between IPMP and Grid Engine
The only major issue is that error messages appear when the Grid Engine daemons are started on a machine whose main interface is part of an IPMP group. IPMP load balancing distributes outbound connections across the interfaces in the group, so IP packets can show up at the receiving end as coming from a different host than the one associated with the main interface.

For example, consider a machine with three interfaces named qfe0, qfe1, and qfe2, whose IP addresses are 10.1.1.1, 10.1.1.2, and 10.1.1.3 respectively. IPMP also needs an extra test address for each interface, but those are ignored in this example. Each of these addresses has a hostname associated with it. The hosts table looks like:

    10.1.1.1 sge
    10.1.1.2 sge-qfe1
    10.1.1.3 sge-qfe2
 
The machine's hostname is sge. When a connection is established from sge to another machine, it might go through sge, sge-qfe1, or sge-qfe2. Upon installation, Grid Engine recognizes only sge. When it receives a connection from sge-qfe2, it closes the connection because it does not come from one of the authorized (or known) nodes.

To solve this issue, use the host_aliases file (see the sge_h_aliases man page for details). This file tells Grid Engine that sge, sge-qfe1, and sge-qfe2 all belong to the same machine. The host_aliases file for this case would look like this:

 
    sge sge-qfe1 sge-qfe2

Note: If you make any changes to the $SGE_ROOT/$SGE_CELL/common/host_aliases file, all running Grid Engine daemons (sge_qmaster, sge_schedd, sge_execd and sge_commd) must be stopped and restarted. Log in as root on each of your Grid Engine hosts and enter:
 
    /etc/init.d/rcsge stop
    /etc/init.d/rcsge start
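
As an illustration of the host_aliases entry described above, the following is a minimal sketch of a helper that appends one alias line to a host_aliases file if the primary hostname is not already listed. The function name add_host_alias_line is hypothetical and not part of Grid Engine; the sketch assumes a POSIX shell and an awk that supports -v (for example nawk on Solaris). The daemons still have to be restarted afterwards, as noted above.

```shell
# Hypothetical helper (not part of Grid Engine): append
# "primary alias1 alias2 ..." to a host_aliases file unless a line
# whose first field is the primary hostname already exists.
add_host_alias_line() {
    file=$1        # path to the host_aliases file
    primary=$2     # the machine's main hostname, e.g. sge
    shift 2        # remaining arguments are the alias hostnames

    # Leave the file alone if the primary name is already the first
    # field of some line (plain string comparison, no regex).
    if [ -f "$file" ] &&
       awk -v p="$primary" '$1 == p { found = 1 } END { exit !found }' "$file"
    then
        return 0
    fi

    printf '%s\n' "$primary $*" >> "$file"
}
```

For the example machine, a call could look like: add_host_alias_line "$SGE_ROOT/$SGE_CELL/common/host_aliases" sge sge-qfe1 sge-qfe2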
 
 
How to install the Grid Engine master node with IPMP
 
There are at least two options:
 
A) Ignore the error messages during installation. The procedure is:
        1. Run inst_sge -m, ignoring the error messages during the start up of the daemons.
        2. Shut down the daemons with /etc/init.d/rcsge stop. Because of the networking errors, some daemons fail to shut down
            and must be killed with kill -9. To check which daemons failed to shut down, use: ps -e | grep sge_
        3. Install the host_aliases file in the $SGE_ROOT/$SGE_CELL/common directory.
        4. Restart the daemons with /etc/init.d/rcsge start.
             Note: This procedure is Operating System independent.
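
The steps of option A can be sketched as a small shell function. This is only an outline under stated assumptions: the names run and install_master_option_a are hypothetical, inst_sge is assumed to live in $SGE_ROOT, and the prepared host_aliases file is assumed to exist at a path chosen by the caller. Setting DRY_RUN=1 prints the commands instead of executing them, which is a reasonable first check before running anything as root.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the four steps of option A.
install_master_option_a() {
    aliases_src=$1   # prepared host_aliases file (path chosen by caller)
    : "${SGE_ROOT:?SGE_ROOT must be set}" "${SGE_CELL:?SGE_CELL must be set}"

    # Step 1: install the master; start-up errors are expected here.
    run "$SGE_ROOT/inst_sge" -m || true

    # Step 2: stop the daemons, then list any that are still running.
    # Survivors are only listed; kill them by hand with kill -9.
    run /etc/init.d/rcsge stop || true
    run sh -c 'ps -e | grep sge_' || true

    # Step 3: install the host_aliases file.
    run cp "$aliases_src" "$SGE_ROOT/$SGE_CELL/common/host_aliases"

    # Step 4: restart the daemons.
    run /etc/init.d/rcsge start
}
```

The sketch deliberately does not kill leftover daemons automatically, since the text asks the administrator to check which ones failed to shut down first.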
 
B) Temporarily disable IPMP on the interface associated with the machine's hostname. The procedure is:
        1. Identify the interface associated with the machine's hostname.
        2. Verify the interface has IPMP enabled with:
            ifconfig <<interface>> | grep groupname
        3. Take note of the group name.
        4. Disable IPMP with: ifconfig <<interface>> group ""
        5. Install the Grid Engine master node.
        6. Install the host_aliases file in the  $SGE_ROOT/$SGE_CELL/common directory.
        7. Restart all the Grid Engine daemons.
        8. Re-enable IPMP: ifconfig <<interface>> group <<IPMP group>>
            Note: This procedure is valid only for the Solaris™ 8 Operating Environment or newer.
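
Option B can be outlined in the same way. The sketch below is an assumption-laden illustration, not a tested installer: run, ipmp_group_from_ifconfig and install_master_option_b are hypothetical names, inst_sge is assumed to live in $SGE_ROOT, and the parser assumes that ifconfig prints the group on a line of the form "groupname <<IPMP group>>". With DRY_RUN=1 the commands are printed instead of executed.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read `ifconfig <interface>` output on stdin and print the IPMP group
# name (steps 2 and 3). Prints nothing if no groupname line is present.
ipmp_group_from_ifconfig() {
    awk '$1 == "groupname" { print $2; exit }'
}

# Hypothetical wrapper around steps 4 to 8 of option B.
install_master_option_b() {
    ifname=$1        # interface tied to the machine's hostname (step 1)
    group=$2         # IPMP group name noted in step 3
    aliases_src=$3   # prepared host_aliases file (path chosen by caller)
    : "${SGE_ROOT:?SGE_ROOT must be set}" "${SGE_CELL:?SGE_CELL must be set}"

    run ifconfig "$ifname" group ""                                  # step 4
    run "$SGE_ROOT/inst_sge" -m                                      # step 5
    run cp "$aliases_src" "$SGE_ROOT/$SGE_CELL/common/host_aliases"  # step 6
    run /etc/init.d/rcsge stop                                       # step 7
    run /etc/init.d/rcsge start
    run ifconfig "$ifname" group "$group"                            # step 8
}
```

A possible invocation for the example machine: group=$(ifconfig qfe0 | ipmp_group_from_ifconfig), followed by install_master_option_b qfe0 "$group" /tmp/host_aliases. The function does not stop on errors, so the interface is put back into its group even if an earlier step fails.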
 
 
How to install a Grid Engine execution host with IPMP
Once the host_aliases file is installed and the Grid Engine daemons are restarted, you can simply start the execution host installation without further problems.

 

How to enable administrative and submit hosts with IPMP

You can either follow the same procedure used for the execution host (that is, update host_aliases before installation; see the note on changes to the host_aliases file), or add all the hostnames associated with the administrative or submit host with:

    qconf -ah <<hostname>> <<alias 1>> <<alias 2>> ...
           (for the administrative host) or

    qconf -as <<hostname>> <<alias 1>> <<alias 2>> ...
           (for the submit host).
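
When a host has several names, the registration can be wrapped in a short loop. The sketch below is one possible way to do this and is not taken from the Grid Engine distribution: run and add_host_names are hypothetical names, and the loop issues one qconf call per hostname rather than a single call with all names. With DRY_RUN=1 the commands are printed instead of executed.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: register every given hostname as an
# administrative host (-ah) or a submit host (-as).
add_host_names() {
    flag=$1
    shift
    case "$flag" in
        -ah|-as) ;;
        *) echo "usage: add_host_names -ah|-as hostname [alias ...]" >&2
           return 1 ;;
    esac
    for name in "$@"; do
        run qconf "$flag" "$name"
    done
}
```

For the example machine this could be called as add_host_names -ah sge sge-qfe1 sge-qfe2 for an administrative host, or with -as for a submit host.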
 

Trademarks

Sun and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. Sun et Solaris sont des marques déposées ou enregistrées de Sun Microsystems, Inc. aux Etats-Unis et dans d'autres pays.