Rationale
Consider using array jobs whenever you might otherwise submit a large number of jobs — likely auto-generated — to do the same processing on a lot of individual input datasets or values. Examples include image processing or rendering with one job for each frame in a sequence, and parameter sweeps, with each job running the same calculation with a different parameter set from a pre-defined collection.
Array jobs effectively constitute a parallel loop over the input data which can be manipulated as a unit by qsub, qdel, qhold, qalter, etc. They have a single entry in qstat output when waiting (but an entry for each running member of the array). This is easier on you than generating and manipulating a large number of individual jobs, and easier on the system, which doesn’t have to manage a very large number of jobs.
Each instance of the job in the effective loop is called a task, which has an index exported to the task’s environment for use in the job script. The tasks may be parallel. Each initially has the same SGE parameters, though qalter may be used to change some of them for waiting tasks.
See qsub(1) for details of the array job options -t, -tc, -hold_jid, and -hold_jid_ad in addition to the guidance below.
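For example, a minimal sketch of treating an array as a unit (the job id 1234 and script myjob.sh are illustrative):

$ qsub -t 1-1000 myjob.sh    # one job, 1000 tasks
$ qstat -g d                 # list the tasks individually
$ qdel 1234 -t 500-1000      # delete only tasks 500-1000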
Basic Use
The job script for an array job is usually essentially a template in which substitutions can be made on the basis of the task index environment variable $SGE_TASK_ID.
Consider processing the contents of a collection of 1000 directories with regular names of the form caseN, with N running from 1–1000. Submitting the job as
qsub -t 1-1000 ... array.sh
asks for array.sh to be run as 1000 separate tasks, and it might contain something like:
#!/bin/sh
#$ -cwd -j y -o case$TASK_ID/output
cd case$SGE_TASK_ID
exec program <input
This is equivalent to 1000 jobs, each executing commands from an element of the sequence

cd case1; program <input
...
cd case1000; program <input
so processing data in case1, …, case1000 and putting the job output in the file output in that directory. See qsub(1) for details of the substitutions in values specified with -e and -o; note the lack of an SGE_ prefix in those contexts.
Tasks are started in order of the array index, but you can’t make more assumptions than that about when they execute; a rescheduled task could effectively run after one with a later index.
Note that the sge_conf(5) parameters max_aj_tasks and max_aj_instances control, for each job, the maximum total number of tasks and the maximum number of concurrent tasks respectively.
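You can check the values currently in force in the cluster configuration, e.g.:

$ qconf -sconf | grep max_aj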
Refinements and Clichés
Indexing Arithmetic
The task index is always 1-based, but you can always do arithmetic with it if your data are essentially 0-based. In a POSIX-conformant shell like bash(1), the expression $(($SGE_TASK_ID-1)) converts the index to a 0-based one.
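For instance, a minimal sketch of a job over 0-based input names (input0 … input99 and program are illustrative):

#!/bin/sh
#$ -t 1-100 -cwd
# Convert the 1-based task index into a 0-based file suffix.
i=$(($SGE_TASK_ID-1))
program < input$i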
The array stride need not be 1. This is useful, for instance, to unroll the loop when the tasks would be short compared with the overhead of running them:
$ qsub -t 1-1000:5 ... <<'EOF'
# Each task handles $SGE_TASK_STEPSIZE consecutive inputs,
# starting at its own index.
for i in $(seq 0 $(($SGE_TASK_STEPSIZE-1))); do
  index=$(($SGE_TASK_ID+$i))
  program < input$index
done
EOF
The variables SGE_TASK_FIRST, SGE_TASK_LAST, and SGE_TASK_STEPSIZE provide the task range and stride to the script.
Non-shell Script Jobs
It isn’t necessary to use a shell script: the index is accessible in the environment of a binary job (though that would probably need to be an SGE-specific program), and a job script can be in a non-shell language such as Python, accessing the environment in the appropriate way. Here is a trivial example of constructing a file name in Python in a two-task array job, assuming the cluster configuration sets shell_start_mode to unix_behavior:
#!/usr/bin/python2
#$ -t 1-2 -l h_rt=9 -cwd -j y
import os
ip = "input"+os.getenv("SGE_TASK_ID")+".dat"
print "We might read from file", ip
and similarly for R, using littler, which need only be on PATH if we use the /bin/env cliché:
#!/bin/env r
#$ -t 1-2 -l h_rt=9 -cwd -j y
ip <- sprintf ("input%s.dat", Sys.getenv("SGE_TASK_ID"))
cat ("We might read from file", ip, "\n")
The -C argument of qsub might help if # isn’t a comment character in the script language.
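For instance, a sketch for a hypothetical script language in which // starts a comment (job.xyz is illustrative):

$ qsub -C '//$' job.xyz

with the directives in the script then written as lines like //$ -t 1-10.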
Selecting from Lists in Files
You needn’t be restricted to simple template-like jobs. You can put complex arguments or commands in a file and pull out a different line each time, e.g. with the right number of commands to be executed listed one per line in a file (called lines below). awk is convenient for this, e.g. selecting a complete command line:
eval $(awk NR==$SGE_TASK_ID lines)
or an input file from a non-regular collection:
program < $(awk NR==$SGE_TASK_ID lines)
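A related cliché matches the task range to the length of the list at submission time (job.sh standing for a script using one of the selections above):

$ qsub -t 1-$(wc -l < lines) job.sh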
An alternative suggestion is to supply a list of scripts as job arguments, as sketched below.
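A minimal sketch of that approach (run-one.sh and the script names are illustrative):

#!/bin/sh
#$ -cwd
# The job arguments are a list of scripts; run the one whose
# position in the list matches this task's index.
shift $(($SGE_TASK_ID-1))
exec sh "$1"

submitted as, e.g., qsub -t 1-3 run-one.sh script1 script2 script3.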
Dependent Arrays
qsub -hold_jid_ad can be used to hold an array job’s tasks dependent on the corresponding tasks in other array jobs. E.g. with
qsub -t 1-50 step1
qsub -t 1-50 -hold_jid_ad step1 step2
qsub -t 1-50:2 -hold_jid_ad step2 step3
array task n of step2 will only run after task n of step1 has finished, and task n of step3 will only run after tasks n and n+1 of step2 have finished. Similarly, with
qsub -t 1-50:2 step1
qsub -t 1-50 -hold_jid_ad step1 step2
tasks n and n+1 of step2 depend on task n of step1, where n is odd. NB ‘finished’ doesn’t mean ‘finished successfully’, so subsequent tasks may need to check the results of previous ones, as sketched below.
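For instance, a minimal sketch of such a check in a step2 task, assuming each step1 task writes result$SGE_TASK_ID on success (the file name is illustrative; exiting with status 100 puts the task into the error state rather than letting it fail silently):

#!/bin/sh
#$ -cwd
# Put this task in the error state if the corresponding
# step1 task left no (non-empty) result to work on.
if [ ! -s result$SGE_TASK_ID ]; then
  echo "no result from step1 task $SGE_TASK_ID" >&2
  exit 100
fi
program < result$SGE_TASK_ID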
In contrast, with
qsub -t 1-50 step1
qsub -t 1-50 -hold_jid step1 step2
tasks of step2 will only run after all those of step1 have finished.
Task Concurrency
The qsub -tc option can be used to restrict the number of tasks of the job that run concurrently, usually to be socially conscious in not dominating the cluster. You might use -tc 1 to ensure that only one task can run at once, e.g. for a series of tests that each update some state in the directory; the state itself may not matter to you, but it will cause problems if more than one task updates it at the same time. This might also be useful if each task depended on results from the previous index, though care would have to be taken in case a task failed.
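For example, to run the 1000-task job from above with at most 20 tasks executing at any one time:

$ qsub -t 1-1000 -tc 20 array.sh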
Copyright © 2013, 2014 Dave Love, University of Liverpool; licence GFDL.