Using Solaris™ 9 Resource Manager software with Grid Engine software



What is Solaris 9 Resource Manager software?

Solaris 9 Resource Manager software is a new set of features designed to enhance resource control and accounting in the Solaris™ Operating Environment (Solaris OE). Detailed information about it can be found in the Solaris™ 9 Operating Environment documentation set.

Here are some related links:

FAQ: http://wwws.sun.com/software/solaris/faqs/resource_manager.html

Overview: http://wwws.sun.com/software/solaris/ds/ds-srm/

Solaris 9 Resource Manager manual: http://docs.sun.com/db/doc/806-4076/6jd6amqor?q=Solaris+9+resource+Manager&a=view

Sun BluePrints: http://www.sun.com/solutions/blueprints/0902/816-7753-10.pdf

Two papers on this subject were presented at Sun's SuperG 2002 conference.


What are the benefits?

It can be used to generate detailed accounting information for each job, to enforce limits on a job more reliably, and to prevent a job from using more CPUs than it requested.


How do I use it?

Before you start, set up the projects database and the extended accounting facility. Refer to the Solaris OE documentation for detailed setup instructions.
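As a rough sketch of that setup, the commands below add a project for Grid Engine jobs and switch on extended accounting for tasks and processes. The project name sge_jobs, the user name and the accounting file paths are only examples, and the helper names run and setup_accounting are made up here; check projadd(1M) and acctadm(1M) on your system before relying on them. The run helper prints each command instead of executing it when DRY_RUN=1, so the script can be reviewed first.

```sh
#!/bin/sh
# Sketch only: one-time setup of a project and extended accounting on a
# Solaris 9 execution host (run as root). The project name, user and
# file paths are examples, not requirements.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_accounting() {
    project=$1    # example project to bill jobs to, e.g. sge_jobs
    user=$2       # user allowed to join that project

    # Add the project to the local project database (/etc/project).
    run /usr/sbin/projadd -U "$user" -c "Grid Engine jobs" "$project"

    # Enable extended accounting for tasks and for processes.
    run /usr/sbin/acctadm -e extended -f /var/adm/exacct/task task
    run /usr/sbin/acctadm -e extended -f /var/adm/exacct/proc process
}
```

For example, setting DRY_RUN=1 and calling setup_accounting sge_jobs alice prints the three commands; without DRY_RUN the function has to run as root.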

The key is to associate each job with a Solaris 9 Resource Manager task ID (taskid). To do this, run newtask at job startup, either in the starter method or in the prolog.

The starter method is simpler, since it execs the end user's script. An example of a starter method is:

exec /usr/bin/newtask $SGE_STARTER_SHELL_PATH "$@"

A sample starter method is provided in Appendix E.
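The sketch below puts the same idea in a shell function (start_job, a name made up here) so that it can be tried outside Grid Engine. It assumes, as the example above does, that Grid Engine passes the job command line as the starter method's arguments; quoting "$@" keeps arguments that contain blanks intact, which $* does not. When newtask is missing, for example on a non-Solaris test machine, it falls back to running the job directly.

```sh
#!/bin/sh
# Sketch of a starter method body wrapped in a function.
# Assumption: SGE_STARTER_SHELL_PATH is set by Grid Engine and the job
# command line arrives as the positional parameters.
start_job() {
    if [ -x /usr/bin/newtask ]; then
        # Solaris 8 or later: run the job as the first process of a new task.
        exec /usr/bin/newtask "$SGE_STARTER_SHELL_PATH" "$@"
    fi
    # No newtask: warn on stderr and run the job unchanged.
    echo "Warning: /usr/bin/newtask is not available, skipping Solaris 9 Resource Manager setup." >&2
    exec "$SGE_STARTER_SHELL_PATH" "$@"
}
```

In an actual starter method the script would simply end with start_job "$@".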

The advantage of using the queue/host prolog is that it can run as root and thus perform privileged operations without the need for setuid helper programs. On the other hand, the prolog process does not exec the user script. To make the changes "stick", you have to change the taskid of the shepherd process instead of the prolog's own process:

newtask -c `ps -o ppid= -p $$`   (changes the taskid of the shepherd, which is the prolog's parent process)

Once the shepherd has the new taskid, the starter method, the user script and the epilog script inherit that taskid from the shepherd.
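Because ps pads the PID it prints with blanks, a slightly more defensive prolog might trim and check the value before calling newtask. The function name new_task_for and the DRY_RUN switch are made up for this sketch; with DRY_RUN=1 the newtask command is printed instead of executed.

```sh
#!/bin/sh
# Sketch: move the shepherd (the prolog's parent) into a new task.
# new_task_for and DRY_RUN are hypothetical names used only here.
new_task_for() {
    # ps -o ppid= pads the PID with blanks; strip them.
    pid=`echo "$1" | tr -d ' '`
    case "$pid" in
        ''|*[!0-9]*)
            echo "new_task_for: invalid PID '$1'" >&2
            return 1
            ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "/usr/bin/newtask -c $pid"
    else
        /usr/bin/newtask -c "$pid"
    fi
}

# In a prolog the call would be:
#   new_task_for "`/bin/ps -o ppid= -p $$`"
```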


Projects

Note: The Sun™ ONE Grid Engine, Enterprise Edition software has its own projects, but this document uses only the Solaris 9 Resource Manager concept of projects.

newtask can both create a new taskid and bill the job to a project. To do this, add the -p <<project name>> option to the newtask command line in the starter method or prolog script.

A resource pool can be associated with a project, so billing the job to a project also binds it to that project's resource pool.
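To see which project ID and pool a job would end up with, a prolog can read the project database directly. This sketch assumes the project(4) line format (name:projid:comment:user-list:group-list:attributes) with a project.pool=<pool> attribute naming the pool; it reads only a local file, so projects served through NIS or LDAP are not found. Comparing the whole first field avoids the partial matches a plain grep can produce (sge_jobs would also match sge_jobs2). It needs a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk, selected through the AWK variable); the function names are hypothetical.

```sh
#!/bin/sh
# Sketch: look up a project in a project(4)-format file, e.g. /etc/project.
# Line format assumed: name:projid:comment:user-list:group-list:attributes
# with ';'-separated attributes such as project.pool=single.

# Print the numeric project ID of project $2 from file $1.
project_id() {
    ${AWK:-awk} -F: -v p="$2" '$1 == p { print $2; exit }' "$1"
}

# Print the value of the project.pool attribute of project $2, if any.
project_pool() {
    ${AWK:-awk} -F: -v p="$2" '
        $1 == p {
            n = split($6, attrs, ";")
            for (i = 1; i <= n; i++) {
                if (attrs[i] ~ /^project\.pool=/) {
                    sub(/^project\.pool=/, "", attrs[i])
                    print attrs[i]
                }
            }
            exit
        }' "$1"
}
```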


Resource Pools

There are two ways of using resource pools with jobs: (1) associate a resource pool with a project and pass that project to newtask, or (2) use the poolbind command to bind the job explicitly to a resource pool. The poolbind command requires root privileges, so it should be run from the queue's prolog or called from the starter method through a setuid wrapper.

The prolog example in Appendix B uses the poolbind method. Resource pools can limit the number of CPUs a job can use, so a grid administrator can prevent a multithreaded program from using more CPUs than it requested. Note that if a parallel job is bound to a resource pool with fewer CPUs than the number given by the PARALLEL environment variable, the job will slow down severely.
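When pools of different sizes exist, the prolog has to choose one per job. A minimal sketch, assuming the "single" and "dual" pools of Appendix C and that the job's PARALLEL setting is visible in the prolog's environment; the function name and the mapping are only examples.

```sh
#!/bin/sh
# Sketch: pick a resource pool sized to the number of CPUs a job asked for.
# Pool names follow Appendix C; the mapping is an assumption to adapt.
pool_for_cpus() {
    case "$1" in
        1) echo single ;;
        2) echo dual ;;
        *) echo "pool_for_cpus: no pool for '$1' CPUs" >&2
           return 1 ;;
    esac
}

# In a prolog (as root) this might be used as:
#   pool=`pool_for_cpus "${PARALLEL:-1}"` &&
#       /usr/sbin/poolbind -p "$pool" -i taskid $PROLOG_JOB_TASKID
```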


Resource controls

At this point, resource controls add little new functionality to the Grid Engine software. However, you can use them to enforce CPU time limits and lightweight process (LWP) limits on a job by running the prctl command in the prolog or starter method to set the job's resource controls. The prolog in Appendix B illustrates the usage of prctl.
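A sketch of such a prolog fragment follows. It reuses the prctl options of the Appendix B prolog and adds task.max-cpu-time, assuming that control accepts the same signal action on your release (see prctl(1) and the resource controls chapter of the Solaris 9 documentation). The limit values are placeholders, the function names are hypothetical, and DRY_RUN=1 only prints the commands.

```sh
#!/bin/sh
# Sketch: set CPU time and LWP limits on a job's task from a prolog.
# prctl options mirror the Appendix B prolog; limit values are placeholders.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

set_task_limits() {
    taskid=$1      # the job's task ID, e.g. from: ps -o taskid= -p <shepherd pid>
    cpu_secs=$2    # CPU seconds allowed to the whole task
    max_lwps=$3    # lightweight processes allowed in the task
    run /usr/bin/prctl -n task.max-cpu-time -v "$cpu_secs" -e signal=9 -i task "$taskid"
    run /usr/bin/prctl -n task.max-lwps -v "$max_lwps" -e signal=9 -i task "$taskid"
}
```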


How do I get an exacct report?

There are no reporting tools for exacct at this time. Only the C API and a demo program that prints all the exacct records in a file are available. The C demo program is in the SUNWosdem package and is usually installed in /usr/demo/libexacct. You need to compile it before you use it.

I wrote a simple tool that scans the exacct file holding the process records and prints selected fields of the records for processes that used more than 0.5 seconds of CPU time. You can download this tool here.
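The selection step of such a tool can be sketched in shell, provided the records have first been turned into text. The input format below (pid, taskid, projid, user CPU seconds, system CPU seconds, command, one process per line, no blanks in the command) is hypothetical; reading the binary exacct file itself needs the libexacct C API, and this is not the tool mentioned above.

```sh
#!/bin/sh
# Sketch of the selection step only: keep process records whose total
# CPU time (user + system) exceeds a threshold (default 0.5 seconds).
# Hypothetical input on stdin, one process per line:
#   pid taskid projid user_secs sys_secs command
# Output: pid taskid command total_cpu_secs
busy_processes() {
    threshold=${1:-0.5}
    awk -v t="$threshold" '($4 + $5) > t + 0 { print $1, $2, $6, $4 + $5 }'
}
```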


Limitations

The setup presented here only works for BATCH queues.  

Interactive workloads have to be controlled through projects, because in.rlogind starts a new task for the login session, ignoring the task created by the starter method or prolog.

Parallel Environments are not covered in this HOWTO.


Appendix A

Minimal Prolog

The following is a minimal prolog script that assigns a taskid to a job. You can download it here.



=== Minimal prolog ===
#!/bin/sh
# (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
# Copyright © 2002 Sun Microsystems, Inc.  All rights reserved.
#****** s9rm_prolog_minimal.sh *********************************************
#
#  NAME
#     s9rm_prolog_minimal.sh -- prolog to associate a job with a Solaris 9
#     Resource Manager task.
#
#  SYNOPSIS
#     s9rm_prolog_minimal.sh
#
#  FUNCTION
#     This script can be used as prolog in sge_queue(5), to create
#     a new taskid for the job.
#
#  NOTES
#     The /usr/bin/newtask command is not available before Solaris 8.
#
#***************************************************************************
if [ ! -x /usr/bin/newtask ]
then
        echo "Warning: /usr/bin/newtask is not available, skipping Solaris 9 Resource Manager setup."
        exit 0
fi
####
####  The line below creates a new task for this job.
####
/usr/bin/newtask -c `/bin/ps -o ppid= -p $$`
####
exit 0
#### ### ### End of the prolog ### ### ###

=== Minimal prolog ===


Appendix B

Fancier Prolog

Here is an example of a fancier prolog script.  It creates a new taskid, assigns it to the job, sets resource controls, and binds the job to a resource pool. You can download it here.

=== Fancy prolog ===

#!/bin/sh
# Copyright © 2002 Sun Microsystems, Inc.  All rights reserved.
# (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
#****** util/resources/s9rm_prolog.sh ***************************************
#
#  NAME
#     s9rm_prolog.sh -- Prolog to set up Solaris 9 Resource Manager
#       resource control and accounting.
#
#  SYNOPSIS
#     s9rm_prolog.sh
#
#  FUNCTION
#     This script can be used as prolog in sge_queue(5). It creates
#     a new taskid for the job, sets the user's default project as projid,
#     sets resource controls, and binds the job to a resource pool.
#     If some of this functionality is not desired, comment out the
#     respective commands.
#
#  NOTES
#     The /usr/bin/newtask command is not available before Solaris 8.
#
#***************************************************************************
if [ ! -x /usr/bin/newtask ]
then
        echo "Warning: /usr/bin/newtask is not available, skipping Solaris 9 Resource Manager setup."
        exit 0
fi
####  Get the shepherd's process id (the shepherd is the prolog's parent process)
SHEP_PID="`/bin/ps -o ppid= -p $$`"
####
##################################################################
####
####  Setting the jobs' task and project ids
####
##################################################################
####
####  The lines below get the user's default project name and id
####  You might want a fancier mapping of jobs/queues to projects
####
PROLOG_DEFAULT_PROJECT="`/bin/projects -d ${USER}`"
####  Match the project name exactly in the first field of /etc/project
PROLOG_PROJECT_ID=`/usr/bin/awk -F: '$1 == "'"$PROLOG_DEFAULT_PROJECT"'" {print $2}' /etc/project`
####
####  The line below creates a new task for this job and assigns it to the
####  user's default project.
####
/usr/bin/newtask -p $PROLOG_DEFAULT_PROJECT -c $SHEP_PID
PROLOG_JOB_TASKID="`/bin/ps -o taskid= -p $SHEP_PID`"
####
##################################################################
####
####  Binding the job to a resource pool
####
##################################################################
####
PROLOG_POOL=single
PROLOG_MYTASK="`/bin/ps -o taskid= -p $$`"
/usr/sbin/poolbind -p $PROLOG_POOL -i taskid $PROLOG_JOB_TASKID
##################################################################
####
####  Setting resource controls
####
##################################################################
####
/usr/bin/prctl -n task.max-lwps -v 9 -e signal=9 -i task $PROLOG_JOB_TASKID
####
exit 0
####
#### ### ### End of the prolog ### ### ##

=== Fancy prolog ===

To use this prolog in a queue, use the following command:

qconf -mqattr prolog root@<<path to the prolog script>> <<queues>>


Appendix C

Here are the poolcfg commands that create three resource pools on a 4-CPU machine. One pool is reserved for system processes, and the other two ("single", with 1 CPU, and "dual", with 2 CPUs) can be used by SGE prologs. You can download this file here.

create system mymachine

create pset sys-procs (string pset.comment = "System Pset"; string pool.scheduler = "TS" ;uint pset.min = 1; uint pset.max=1)
create pool sys-procs (string pool.comment = "System resource pool")
associate pool sys-procs (pset sys-procs)
create pset single (string pset.comment = "SGE Pset"; string pool.scheduler = "TS" ;uint pset.min = 1; uint pset.max=1)
create pool single (string pool.comment = "SGE Resource Pool")
associate pool single (pset single)
create pset dual (string pset.comment = "SGE Pset"; string pool.scheduler = "TS" ;uint pset.min = 2; uint pset.max=2)
create pool dual (string pool.comment = "SGE Resource Pool")
associate pool dual (pset dual)
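Before activating a configuration like this one, a quick sanity check is that the psets' minimum sizes fit in the machine. The sketch below only adds up the pset.min values and compares the total with a CPU count you supply; poolcfg and pooladm perform the real validation. The function names are hypothetical, and it needs a POSIX awk (on Solaris, nawk or /usr/xpg4/bin/awk, selected through AWK).

```sh
#!/bin/sh
# Sketch: sanity-check a poolcfg command file before activating it.

# Print the sum of all "pset.min = N" values found in file $1.
sum_pset_min() {
    ${AWK:-awk} '{
        line = $0
        while (match(line, /pset\.min *= *[0-9]+/)) {
            s = substr(line, RSTART, RLENGTH)
            sub(/.*= */, "", s)
            total += s
            line = substr(line, RSTART + RLENGTH)
        }
    }
    END { print total + 0 }' "$1"
}

# Fail if the psets in file $1 need more CPUs than the $2 available.
check_pool_config() {
    cfg=$1      # poolcfg command file, e.g. the one above
    ncpus=$2    # CPUs in the machine, e.g. as listed by psrinfo
    need=`sum_pset_min "$cfg"`
    if [ "$need" -gt "$ncpus" ]; then
        echo "$cfg: psets need $need CPUs, machine has $ncpus" >&2
        return 1
    fi
}
```

If the check passes, the file is typically loaded and activated as root with poolcfg -f <file> followed by pooladm -c.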

Appendix D

Sample epilog

Below is an epilog script that runs the simple exacct report generator available here.  The epilog script can be downloaded here.

#!/bin/sh
# (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
# Copyright © 2002 Sun Microsystems, Inc.  All rights reserved.
#****** util/resources/s9rm_epilog.sh ***************************************
#
#  NAME
#     s9rm_epilog.sh -- epilog to generate accounting reports at
#                       the end of a job
#
#  SYNOPSIS
#     s9rm_epilog.sh
#
#  FUNCTION
#     This script can be used as epilog in sge_queue(5). It runs
#     a program to generate a summary report from the job's exacct records.
#
#  NOTES
#     The /usr/bin/newtask command is not available before Solaris 8.
#     Please set the EPILOG_SUMMARY variable to the path of the report
#     generator program before using this script.
#
#***************************************************************************
####
#### Name the program that generates a summary of the
#### job's exacct records
####
EPILOG_SUMMARY="<<path to report generator>>/proclist"
####
####
EPILOG_MYTASK="`/bin/ps -o taskid= -p $$`"
#### You might want to call newtask again here, to finish the previous task,
#### otherwise the exacct task record will not be available at this point
if [ -x $EPILOG_SUMMARY ]
then
        $EPILOG_SUMMARY
fi
####



To use this epilog in a queue, edit the variable EPILOG_SUMMARY to point to the proper path of proclist, then use the following command:

qconf -mqattr epilog <<path to the epilog script>> <<queues>>




Appendix E

Sample Starter Method


Simple starter_method, similar to the minimal prolog (available for download here):

#!/bin/sh
# (c) 2002 Sun Microsystems, Inc. Use is subject to license terms.
# Copyright © 2002 Sun Microsystems, Inc.  All rights reserved.
#****** s9rm_starter_method.sh *********************************************
#
#  NAME
#     s9rm_starter_method.sh -- starter method to associate a job with a Solaris 9
#     Resource Manager task.
#
#  SYNOPSIS
#     s9rm_starter_method.sh
#
#  FUNCTION
#     This script can be used as starter_method in sge_queue(5) to create
#     a new taskid for the job.
#
#  NOTES
#     The /usr/bin/newtask command is not available before Solaris 8.
#
#***************************************************************************
if [ -x /usr/bin/newtask ]
then
   SM_DEFAULT_PROJECT="`/bin/projects -d ${USER}`"
   exec /usr/bin/newtask -p $SM_DEFAULT_PROJECT "$SGE_STARTER_SHELL_PATH" "$@"
else
   echo "Warning: /usr/bin/newtask is not available, skipping Solaris 9 Resource Manager setup."
   exec "$SGE_STARTER_SHELL_PATH" "$@"
fi
#### ### ### End of the starter_method ### ### ###

To use this starter_method in a queue, use the following command:

qconf -mqattr starter_method <<path to the starter_method script>> <<queues>>

Trademarks


Copyright © 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. This distribution may include materials developed by third parties. Sun, Sun Microsystems, Sun™ ONE and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.


By Paulo Tibério Muradas Bulhões, November 2002.