The qmon(1) graphical user interface can be used to perform all administrative tasks in Grid Engine, and using it is a good way to learn the full range of Grid Engine's capabilities. However, Grid Engine can also be administered entirely through commands issued at the shell prompt or called from within shell scripts. Experienced administrators will find this a more flexible, faster, and more powerful way to change Grid Engine settings.
This HOWTO gives an overview of shell-based administration, with examples. It also describes techniques and constructions, such as wrapper scripts, that enable more sophisticated tasks. For more basic configuration commands, see the HOWTO entitled "Common Administrative Tasks".
The qconf command can add new objects, or modify existing ones, from a specification in a file. The syntax is
qconf -{A,M}<object> <filename>
Where -A means add, and -M means modify.
<object> can be:
c: complex
ckpt: checkpoint environment
e: execution host
p: parallel environment
q: queue
u: user set
These options can be combined with the "show" option of qconf (qconf -s<object>) to take an existing object, modify it, and then either update the existing object or create a new one.
#!/bin/sh
# ckptq.sh: set the queues of a checkpointing environment from a list in a file
# Usage: ckptq.sh <checkpoint-env-name> <filename>
# <filename> contains a list of queues, separated by commas and/or newlines
TMPFILE=`mktemp`
CKPT=$1
QUEUELIST=$2
# copy the current definition, minus its queue_list line
qconf -sckpt "$CKPT" | grep -v 'queue_list' > "$TMPFILE"
# append a new queue_list built from the file (newlines and commas become spaces)
echo queue_list `tr "\012" " " < "$QUEUELIST" | tr "," " "` >> "$TMPFILE"
# load the modified definition back into Grid Engine
qconf -Mckpt "$TMPFILE"
rm "$TMPFILE"
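The script above updates an existing object. The same show, edit, load cycle can also create a new object from an existing one. The following is a minimal sketch, assuming the object's name appears on a line whose first field is its name attribute (qname for queues, pe_name for parallel environments); clone_spec is a hypothetical helper, not part of Grid Engine.

```sh
#!/bin/sh
# clone_spec NAME_KEY NEW_NAME
# Hypothetical helper: reads an object specification (as printed by
# qconf -s<object>) on stdin and writes it to stdout, with the line whose
# first field is NAME_KEY (e.g. "qname") changed to "NAME_KEY NEW_NAME".
# Every other line is copied unchanged.
clone_spec() {
    awk -v key="$1" -v new="$2" '
        $1 == key { print key, new; next }   # rewrite the name line
        { print }                            # copy everything else verbatim
    '
}
```

For example, after sourcing the function:

% qconf -sq tcf27-e019.q | clone_spec qname tcf27-e020.q > new_queue.cfg
% qconf -Aq new_queue.cfg

Depending on the Grid Engine version, other host-specific fields (for example the host name of a queue) may also need to be changed before the new object is added.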
Individual queues, hosts, and parallel and checkpointing environments can be modified from the command line either with the qconf -M{q,e,p,ckpt} <filename> command shown above, or with the qconf -m{q,e,p,ckpt} <objname> command. The latter opens a temporary file in an editor; when you save your changes and exit the editor, the system immediately applies them. However, to change many objects at once, or to change an object's configuration non-interactively, use the qconf -...attr family of commands.
The first form makes modifications according to values given on the command line:
qconf -{a,m,r,d}attr queue|exechost|pe|ckpt|hostgroup|resource_quota <attrib> <value> <queue_list>|<host_list>
while the second makes modifications according to specifications in a file:
qconf -{A,M,R,D}attr queue|exechost|pe|ckpt|hostgroup|resource_quota <filename> <queue_list>|<host_list>
In both forms, the options and arguments mean the following:
-A/a: add attribute
-M/m: modify attribute
-R/r: replace attribute
-D/d: delete attribute
<attrib>: queue or host attribute to be changed
<value>: value of the attribute to be affected
<filename>: a file containing attribute-value pairs
<queue_list>|<host_list>: the queues or hosts to operate on (for other object types, such as pe and ckpt, the name of the object)
The a, m, and d forms operate on individual values within a list of values, while r replaces the entire list with the new one specified, either on the command line or in the file.
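Because these commands change the configuration immediately, it can be useful to route them through a small wrapper that checks the arguments and can print commands instead of running them. This is a minimal sketch; the function name qconf_attr and the DRY_RUN variable are hypothetical, and only the command-line (lower-case) forms are covered.

```sh
#!/bin/sh
# qconf_attr <a|m|r|d> <obj_spec> <attrib> <value> <object_list>
# Hypothetical wrapper around "qconf -<op>attr". If DRY_RUN is non-empty,
# it only prints the command; otherwise it logs the command to stderr and
# runs it. Returns 2 on a usage error.
qconf_attr() {
    op=$1
    shift
    case $op in
        a|m|r|d) ;;                      # the four command-line forms
        *) echo "qconf_attr: bad operation '$op'" >&2; return 2 ;;
    esac
    if [ $# -ne 4 ]; then
        echo "usage: qconf_attr <a|m|r|d> <obj_spec> <attrib> <value> <object_list>" >&2
        return 2
    fi
    if [ -n "$DRY_RUN" ]; then
        printf 'qconf -%sattr %s\n' "$op" "$*"
    else
        printf 'running: qconf -%sattr %s\n' "$op" "$*" >&2
        qconf "-${op}attr" "$@"
    fi
}
```

After sourcing the function, DRY_RUN=1 qconf_attr r queue qtype batch tcf27-e019.q prints the command from the first example below instead of executing it.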
Change the queue type of "tcf27-e019.q" to batch-only
% qconf -rattr queue qtype batch tcf27-e019.q
Modify the queue type and shell start behavior of tcf27-e019.q based on the contents of the file "new.cfg":
% cat new.cfg
qtype batch interactive checkpointing
shell_start_mode unix_behavior
% qconf -Rattr queue new.cfg tcf27-e019.q
Attach the complexes named "storage" and "license" to the host "tcf27-e019"
% qconf -rattr exechost complex_list storage,license tcf27-e019
Set the resource "scratch1" to 1000M and "long" to 2 (-rattr replaces the host's entire complex_values list)
% qconf -rattr exechost complex_values scratch1=1000M,long=2 tcf27-e019
Attach the resource named "short" to the host with a value of 4
% qconf -aattr exechost complex_values short=4 tcf27-e019
Change the value of "scratch1" to 500M while leaving other values untouched
% qconf -mattr exechost complex_values scratch1=500M tcf27-e019
Delete the resource "long"
% qconf -dattr exechost complex_values long tcf27-e019
Add "tcf27-b011.q" to the list of queues for checkpointing environment "sph"
% qconf -aattr ckpt queue_list tcf27-b011.q sph
Change the number of slots in parallel environment "make" to 50
% qconf -mattr pe slots 50 make
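The examples above only write attribute values. A script that needs to read a current value first (say, to raise the slots of parallel environment "make" by ten) can extract it from the corresponding qconf -s<object> output. This is a minimal sketch with a hypothetical helper, get_attr; it assumes the value fits on a single line, so values that qconf continues onto further lines with a trailing backslash are not handled.

```sh
#!/bin/sh
# get_attr <attrib>
# Hypothetical helper: reads an object specification (as printed by
# qconf -s<object>) on stdin and prints the value of <attrib>, i.e. the
# rest of the first line whose first field is <attrib>.
get_attr() {
    awk -v key="$1" '
        $1 == key {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # strip the attribute name
            print
            exit                              # first match only
        }
    '
}
```

For example: % qconf -sp make | get_attr slots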
See also the qconf_scripts below.
The qselect command outputs a list of queues. If called with options, it lists only queues which match the given specifications. This can be used to great advantage in combination with the qconf -...attr queue commands to target specific queues to modify.
All queues on Linux machines
% qselect -l arch=glinux
All queues on machines with 2 CPUs
% qselect -l num_proc=2
All queues on 4-CPU, 64-bit Solaris machines
% qselect -l arch=solaris64,num_proc=4
Queues that provide an application license (previously configured)
% qselect -l app_lic=TRUE
You can combine qselect with qconf to make wide-reaching changes with a single command line. To do this, put the entire qselect command within backticks and use it in place of the <queue_list> on the qconf command line.
Set the prolog script to sol_prolog.sh on all queues on Solaris machines
% qconf -mattr queue prolog /usr/local/scripts/sol_prolog.sh `qselect -l arch=solaris`
Set the attribute "fluent_license" to 2 on all queues on two-processor systems
% qconf -mattr queue complex_values fluent_license=2 `qselect -l num_proc=2`
The use of qconf in conjunction with qselect provides the most flexible way to automate the configuration of Grid Engine queues, allowing you to build up your own custom administration scripts.
For an example of generating a list of hosts on which to operate, see the qselect-node-list script.
Another way to select queues and, in particular, hosts, which may be more convenient, is qconf -sobjl. For example, to select 64-core hosts:
% qconf -sobjl exechost load_values '*num_proc=64*'
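If a backtick substitution matches nothing, qconf is left with no queue list at all. The sketch below collects the selection first, joins it into a single comma-separated argument, and calls qconf only if something matched. It assumes that qselect and qconf -sobjl print one name per line and that qconf accepts a comma-separated list in place of <queue_list>; join_list and mattr_selected are hypothetical helper names.

```sh
#!/bin/sh
# join_list: hypothetical helper that turns one-name-per-line input
# (e.g. from qselect or qconf -sobjl) into one comma-separated line,
# skipping blank lines. Prints nothing if there is no input.
join_list() {
    awk 'NF { printf "%s%s", sep, $1; sep = "," }
         END { if (sep != "") print "" }'
}

# mattr_selected <attrib> <value> <qselect options...>
# Hypothetical helper: applies "qconf -mattr queue" to the queues chosen
# by qselect, refusing to run qconf when the selection is empty.
mattr_selected() {
    attr=$1
    value=$2
    shift 2
    list=$(qselect "$@" | join_list)
    if [ -z "$list" ]; then
        echo "mattr_selected: no queues matched: $*" >&2
        return 1
    fi
    qconf -mattr queue "$attr" "$value" "$list"
}
```

For example, the Solaris prolog change could be written as: % mattr_selected prolog /usr/local/scripts/sol_prolog.sh -l arch=solaris
A host list for qconf -mattr exechost can be built the same way by piping qconf -sobjl output through join_list.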
To modify the scheduler or global configuration, use the qconf -m... commands: qconf -mconf changes the global configuration and qconf -msconf the scheduler configuration. Both commands open a temporary file in an editor. When you exit the editor, any changes you have made to this temporary file are processed by the system and take effect immediately. The editor used is the program named by the EDITOR environment variable; if this variable is undefined, vi is used.
You can take advantage of the EDITOR environment variable to automate the behavior of the qconf -m... commands. Set this variable (and export it) to point to a program that modifies the file whose name is given as its first argument. After this program modifies the temporary file and exits, the system reads in the modifications and updates immediately. NOTE: if the modification time of the file does not change during the edit, the system will sometimes incorrectly assume the file has not been modified. Therefore, a "sleep 1" should be inserted before writing the file, to ensure a different modification time.
#!/bin/sh
# sched_int.sh: modify the schedule interval
# usage: sched_int.sh <n>, where <n> is the new interval, in seconds (n < 60)
if [ -n "$MOD_SGE_SCHED_INT" ]; then
    # Second invocation: qconf -msconf has started this script as $EDITOR;
    # $1 is the temporary file holding the scheduler configuration.
    TMPFILE=`mktemp`
    grep -v schedule_interval "$1" > "$TMPFILE"
    echo "schedule_interval 0:0:$MOD_SGE_SCHED_INT" >> "$TMPFILE"
    # sleep to ensure the modification time changes
    sleep 1
    mv "$TMPFILE" "$1"
else
    # First invocation: make this script the editor, then call qconf.
    # Both variables must be exported so that qconf and the nested call see them.
    EDITOR=$0
    MOD_SGE_SCHED_INT=$1
    export EDITOR MOD_SGE_SCHED_INT
    qconf -msconf
fi
The sample script above sets the EDITOR environment variable to point to itself and then calls qconf -msconf. The second, nested invocation of the script modifies the temporary file named by its first argument and exits. Grid Engine then reads in the changes, and the first invocation of the script terminates. This technique can be used with any qconf -m... command, but it is especially useful for administering the scheduler and global configuration, since there is no other way to automate this.
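The same technique extends to other single-line settings. The sketch below, set_conf_attr.sh (a hypothetical name), takes the qconf option, an attribute and a value. Unlike sched_int.sh it overwrites the temporary file in place with cat rather than replacing it with mv. Like the original, it assumes the attribute fits on one line and does not check the value itself. When run with no arguments it only defines its function.

```sh
#!/bin/sh
# set_conf_attr.sh: hypothetical generalisation of sched_int.sh to any
# single-line attribute of the scheduler or global configuration.
# Usage: set_conf_attr.sh -msconf|-mconf <attribute> <value>

# replace_conf_line FILE ATTR VALUE: drop every line of FILE whose first
# field is ATTR, append "ATTR VALUE", and rewrite FILE in place.
replace_conf_line() {
    tmp=`mktemp` || return 1
    awk -v key="$2" '$1 != key' "$1" > "$tmp"
    printf '%s %s\n' "$2" "$3" >> "$tmp"
    sleep 1                 # ensure the file's modification time changes
    cat "$tmp" > "$1"       # overwrite in place, keeping the same file
    rm -f "$tmp"
}

if [ -n "$MOD_SGE_ATTR" ]; then
    # Second invocation: qconf has started this script as $EDITOR
    # on its temporary file, passed as $1.
    replace_conf_line "$1" "$MOD_SGE_ATTR" "$MOD_SGE_VALUE"
elif [ $# -eq 3 ]; then
    # First invocation: point EDITOR back at this script and run qconf.
    MOD_SGE_ATTR=$2
    MOD_SGE_VALUE=$3
    EDITOR=$0
    export MOD_SGE_ATTR MOD_SGE_VALUE EDITOR
    qconf "$1"
fi
```

For example: % set_conf_attr.sh -msconf schedule_interval 0:0:30 or % set_conf_attr.sh -mconf load_report_time 00:00:40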
A collection of scripts providing qconf -aattr-like interfaces is available.