sge_pe.5




NAME

       sge_pe - Grid Engine parallel environment configuration file format


DESCRIPTION

       Parallel environments are parallel programming and runtime environments
       supporting the execution of shared memory or distributed memory
       parallelized applications. Parallel environments usually require some
       kind of setup to be operational before starting parallel applications.
       Examples of common parallel environments are OpenMP on shared memory
       multiprocessor systems, and Message Passing Interface (MPI) on shared
       memory or distributed systems.

       sge_pe allows for the definition of interfaces to arbitrary parallel
       environments.  Once a parallel environment has been defined or modified
       with the -ap or -mp options to qconf(1) and linked with one or more
       queues via pe_list in queue_conf(5), the environment can be requested
       for a job via the -pe switch to qsub(1), together with a request for a
       numeric range of parallel processes to be allocated to the job.
       Additional -l options may be used to specify more detailed job
       requirements.

       Note that Grid Engine allows backslashes (\) to be used to escape
       newline characters. The backslash and the newline are replaced with a
       space character before any interpretation.
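
       The continuation rule above can be sketched in Python as follows. This
       is only an illustration of the stated rule, not Grid Engine's own
       parser; the function name and the sample text are assumptions.

```python
def join_continuations(text: str) -> str:
    """Replace every backslash-newline pair with a single space."""
    return text.replace("\\\n", " ")


# Hypothetical fragment of a configuration file using a continuation line.
raw = "user_lists  admins,\\\ndevelopers\n"
joined = join_continuations(raw)
```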


FORMAT

       The format of a sge_pe file is defined as follows:

   pe_name
       The name of the parallel environment in the format for pe_name in
       sge_types(1).  To be used in the qsub(1) -pe switch.

   slots
       The total number of slots (normally one per parallel process or thread)
       allowed to be filled concurrently under the parallel environment.  Type
       is integer, valid values are 0 to 9999999.

   user_lists
   xuser_lists
       A comma-separated list of user access list names (see access_list(5)).

       Each user contained in at least one of the user_lists access lists has
       access to the parallel environment. If the user_lists parameter is set
       to NONE (the default) any user has access if not explicitly excluded
       via the xuser_lists parameter.

       Each user contained in at least one of the xuser_lists access lists is
       not allowed to access the parallel environment. If the xuser_lists
       parameter is set to NONE (the default) any user has access.

       If a user is contained both in an access list in xuser_lists and
       user_lists the user is denied access to the parallel environment.
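
       The access rules above can be summarized by the following Python
       sketch. It is a simplified model: it looks only at user names, and the
       function name and the acl_members mapping are hypothetical, standing
       in for the contents of the access lists (see access_list(5)).

```python
def may_use_pe(user, user_lists, xuser_lists, acl_members):
    """Return True if `user` may use the PE under the rules described above.

    user_lists / xuser_lists: lists of access list names, or None for NONE.
    acl_members: hypothetical mapping from access list name to a set of users.
    """
    def in_any(lists):
        return any(user in acl_members.get(name, set()) for name in lists or [])

    if in_any(xuser_lists):   # exclusion wins, even if also in user_lists
        return False
    if not user_lists:        # NONE: anyone not excluded has access
        return True
    return in_any(user_lists)
```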

   start_proc_args
   stop_proc_args
       The command line of the startup or shutdown procedure, respectively (an
       executable command plus optional arguments), for the parallel
       environment, or "none" for no procedure (typically for tightly
       integrated PEs).  The command line is started directly, not in a shell.
       An optional prefix "user@" specifies the username under which the
       procedure is to be started.  In that case, see the SECURITY section
       below concerning the security issues of running as a privileged user.

       The startup procedure is invoked by sge_shepherd(8) on the master node
       of the job prior to executing the job script. Its purpose is to set up
       the parallel environment according to its needs.  The shutdown
       procedure is invoked by sge_shepherd(8) after the job script has
       finished. Its purpose is to stop the parallel environment and to remove
       it from all participating systems.  The standard output of the
       procedure is redirected to the file REQUEST.poJID in the job's working
       directory (see qsub(1)), with REQUEST being the name of the job as
       displayed by qstat(1), and JID being the job's identification number.
       Likewise, the standard error output is redirected to REQUEST.peJID.  If
       the -e or -o options are given on job submission, the PE's standard
       error and standard output are merged into the paths specified.

       The following special variables, expanded at runtime, can be used
       (besides any other strings which have to be interpreted by the start
       and stop procedures) to constitute a command line:

       $pe_hostfile
              The pathname of a file containing a detailed description of the
              layout of the parallel environment to be set up by the startup
              procedure. Each line of the file refers to a host on which
              parallel processes are to be run. The first entry of each line
              denotes the hostname, the second entry the number of parallel
              processes to be run on the host, the third entry the name of the
              queue.  The entries are separated by spaces.  If -binding pe is
              specified on job submission, the fourth column is the core
              binding specification as colon-separated socket-core pairs, like
              "0,0:0,1", meaning the first core on the first socket and the
              second core on the first socket can be used for binding.
              Otherwise it will be "UNDEFINED".  With the obsolete queue
              processors specification the fourth entry could be a multi-
              processor configuration (or "<NULL>").
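
              A start procedure might read this file along the following
              lines. This is a minimal sketch based only on the column layout
              described above; the names HostEntry, parse_binding and
              parse_pe_hostfile, and the sample file contents, are
              hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class HostEntry(NamedTuple):
    host: str
    slots: int
    queue: str
    binding: Optional[List[Tuple[int, ...]]]  # (socket, core) pairs, or None


def parse_binding(field: str) -> Optional[List[Tuple[int, ...]]]:
    """Parse colon-separated socket,core pairs such as "0,0:0,1"."""
    if field in ("UNDEFINED", "<NULL>"):
        return None
    try:
        return [tuple(int(n) for n in pair.split(",")) for pair in field.split(":")]
    except ValueError:
        return None  # e.g. an obsolete processors string; not interpreted here


def parse_pe_hostfile(text: str) -> List[HostEntry]:
    """Turn the space-separated lines of a PE hostfile into HostEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        binding = parse_binding(fields[3]) if len(fields) > 3 else None
        entries.append(HostEntry(fields[0], int(fields[1]), fields[2], binding))
    return entries


# Hypothetical hostfile contents for a six-slot job on two hosts.
sample = "node01 4 all.q@node01 0,0:0,1\nnode02 2 all.q@node02 UNDEFINED\n"
entries = parse_pe_hostfile(sample)
total_slots = sum(entry.slots for entry in entries)
```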

       $host  The name of the host on which the startup or stop procedures are
              run.

       $ja_task_id
              The array job task index (0 if not an array job).

       $job_owner
              The user name of the job owner.

       $job_id
              Grid Engine's unique job identification number.

       $job_name
              The name of the job.

       $pe    The name of the parallel environment in use.

       $pe_slots
              Number of slots granted for the job.

       $processors
              The processors string as contained in the queue configuration
              (see queue_conf(5)) of the master queue (the queue in which the
              startup and stop procedures are run).

       $queue The cluster queue of the master queue instance.

       $sge_cell
              The SGE_CELL environment variable (useful for locating files).

       $sge_root
              The SGE_ROOT environment variable (useful for locating files).

       $stdin_path
              The standard input path.

       $stderr_path
              The standard error path.

       $stdout_path
              The standard output path.

       $merge_stderr

       $fs_stdin_host

       $fs_stdin_path

       $fs_stdin_tmp_path

       $fs_stdin_file_staging

       $fs_stdout_host

       $fs_stdout_path

       $fs_stdout_tmp_path

       $fs_stdout_file_staging

       $fs_stderr_host

       $fs_stderr_path

       $fs_stderr_tmp_path

       $fs_stderr_file_staging

       The start and stop commands are run with the same environment setting
       as that of the job to be started afterwards (see qsub(1)).
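
       The effect of the special variables listed above on a start_proc_args
       or stop_proc_args command line can be illustrated with Python's
       string.Template. This is only a model of the substitution, not the
       code Grid Engine uses; the script path and the values are
       hypothetical.

```python
from string import Template


def expand_proc_args(command_line: str, values: dict) -> str:
    """Expand $name variables; names missing from `values` are left untouched."""
    return Template(command_line).safe_substitute(values)


# Hypothetical runtime values for one job.
values = {"pe_hostfile": "/tmp/hostfile", "job_id": "123", "host": "node01"}
expanded = expand_proc_args("/usr/local/pe/start.sh $pe_hostfile $job_id", values)
```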

   allocation_rule
       The allocation rule is interpreted by the scheduler thread and helps
       the scheduler to decide how to distribute parallel processes among the
       available machines. If, for instance, a parallel environment is built
       for shared memory applications only, all parallel processes have to be
       assigned to a single machine, no matter how many suitable machines are
       available.  If, however, the parallel environment follows the
       distributed memory paradigm, an even distribution of processes among
       machines may be favorable, as may packing processes onto the minimum
       number of machines.

       The current version of the scheduler only understands the following
       allocation rules:

       int    An integer, fixing the number of processes per host. If it is 1,
              all processes have to reside on different hosts. If the special
              name $pe_slots is used, the full range of processes as specified
              with the qsub(1) -pe switch has to be allocated on a single host
              (no matter what value belonging to the range is finally chosen
              for the job to be allocated).

       $fill_up
              Starting from the best suitable host/queue, all available slots
              are allocated. Further hosts and queues are "filled up" as long
              as a job still requires slots for parallel tasks.

       $round_robin
              From all suitable hosts, a single slot is allocated until all
              tasks requested by the parallel job are dispatched. If more
              tasks are requested than suitable hosts are found, allocation
              starts again from the first host.  The allocation scheme walks
              through suitable hosts in a most-suitable-first order.
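
       The differences between the allocation rules can be seen in the
       following simplified Python model. It is not the scheduler's
       algorithm: it assumes one queue per host, a list of hosts already
       sorted most suitable first, a fixed slot count rather than a range,
       and, for the integer rule, a positive value that must divide the slot
       count. The function name allocate is hypothetical.

```python
def allocate(rule, slots_needed, hosts):
    """Distribute slots over hosts according to an allocation rule.

    hosts: list of (name, free_slots) pairs, most suitable first.
    Returns a dict of name -> granted slots, or None if the request
    cannot be satisfied in this simplified model.
    """
    free = dict(hosts)
    order = [name for name, _ in hosts]
    granted = {}

    if rule == "$pe_slots":  # everything on a single host
        for name in order:
            if free[name] >= slots_needed:
                return {name: slots_needed}
        return None

    if rule == "$fill_up":  # take all free slots on each host in turn
        remaining = slots_needed
        for name in order:
            take = min(free[name], remaining)
            if take:
                granted[name] = take
                remaining -= take
            if remaining == 0:
                return granted
        return None

    if rule == "$round_robin":  # one slot per host per pass
        remaining = slots_needed
        while remaining:
            progressed = False
            for name in order:
                if remaining and free[name] > granted.get(name, 0):
                    granted[name] = granted.get(name, 0) + 1
                    remaining -= 1
                    progressed = True
            if not progressed:
                return None
        return granted

    per_host = int(rule)  # fixed number of processes per host
    if slots_needed % per_host:
        return None
    remaining = slots_needed
    for name in order:
        if remaining and free[name] >= per_host:
            granted[name] = per_host
            remaining -= per_host
    return granted if remaining == 0 else None


hosts = [("a", 4), ("b", 4), ("c", 2)]
filled = allocate("$fill_up", 6, hosts)
spread = allocate("$round_robin", 6, hosts)
```

       With these three hosts, $fill_up packs the six slots onto the first
       two hosts, while $round_robin spreads them evenly over all three.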

   control_slaves
       This parameter can be set to TRUE or FALSE (the default). It indicates
       whether Grid Engine is the creator of the slave tasks of a parallel
       application via sge_execd(8) and sge_shepherd(8) and thus has full
       control over all processes in a parallel application ("tight
       integration").  This enables:

       o      resource limits are enforced for all tasks, even on slave hosts;

       o      resource consumption is properly accounted on all hosts;

       o      proper control of tasks, with no need to write a customized
              terminate method to ensure that the whole job is finished on
              qdel and that tasks are properly reaped in the case of abnormal
              job termination;

       o      all tasks are started with the appropriate nice value which was
              configured as priority in the queue configuration;

       o      propagation of the job environment to slave hosts, e.g. so that
              they write into the appropriate per-job temporary directory
              specified by TMPDIR, which is created on each host and properly
              cleaned up.

       To gain control over the slave tasks of a parallel application, a
       sophisticated PE interface is required, which works closely together
       with Grid Engine facilities, typically interpreting the Grid Engine
       hostfile and starting remote tasks with qrsh(1) and its -inherit
       option.  See, for instance, the $SGE_ROOT/mpi directory and the howto
       pages
       <http://arc.liv.ac.uk/SGE/howto/#Tight%20Integration%20of%20Parallel%20Libraries>.

       Set the control_slaves parameter to FALSE for all other PE interfaces.

   job_is_first_task
       The job_is_first_task parameter can be set to TRUE or FALSE. A value of
       TRUE indicates that the Grid Engine job script already contains one of
       the tasks of the parallel application (and the number of slots reserved
       for the job is the number of slots requested with the -pe switch).
       FALSE indicates that the job script (and its child processes) is not
       part of the parallel program, and is only used to kick off the tasks
       that do the work; in that case the number of slots reserved for the job
       in the master queue is increased by 1, as indicated by qstat/qhost.

       This should be TRUE for the common modern MPI implementations with
       tight integration.  Consider the case where the allocation rule is
       $fill_up and a job is allocated only a single slot on the master host:
       one of the MPI processes actually runs in that slot and should be
       accounted as such, so the job is the first task.

       If wallclock accounting is used (execd_params ACCT_RESERVED_USAGE
       and/or SHARETREE_RESERVED_USAGE is TRUE) and control_slaves is set to
       FALSE, the job_is_first_task parameter influences the accounting for
       the job: a value of TRUE means that accounting for CPU and requested
       memory is multiplied by the number of slots requested with the -pe
       switch.  FALSE means the accounting information is multiplied by the
       number of slots + 1.  Otherwise, the only significant effect of the
       parameter is on the display of the job.
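
       The multiplier described in the previous paragraph amounts to the
       following arithmetic, shown here as a Python sketch. It applies only
       when wallclock accounting is enabled and control_slaves is FALSE; the
       function name is hypothetical.

```python
def reserved_usage_factor(requested_slots: int, job_is_first_task: bool) -> int:
    """Factor by which CPU and requested-memory accounting is multiplied."""
    return requested_slots if job_is_first_task else requested_slots + 1


# A job requesting 8 slots: TRUE gives a factor of 8, FALSE a factor of 9.
factor_true = reserved_usage_factor(8, True)
factor_false = reserved_usage_factor(8, False)
```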

   urgency_slots
       For pending jobs with a slot range PE request whose minimum and maximum
       differ, the number of slots they will actually use is not yet
       determined. This setting specifies the method to be used by Grid Engine
       to assess the number of slots such jobs might finally get.

       The assumed slot allocation is used when determining the
       resource-request-based priority contribution for numeric resources, as
       described in sge_priority(5), and is displayed when qstat(1) is run
       without the -g t option.

       The following methods are supported:

       int    The specified integer number is used directly as the prospective
              slot amount.

       min    The slot range minimum is used as the prospective slot amount.
              If no lower bound is specified with the range, 1 is assumed.

       max    The slot range maximum is used as the prospective slot amount.
              If no upper bound is specified with the range, the absolute
              maximum possible due to the PE's slots setting is assumed.

       avg    The average of all numbers occurring within the job's PE range
              request is assumed.
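
       The four methods can be compared with the following Python sketch. It
       assumes a single contiguous range, and it assumes that avg falls back
       to the same defaults as min and max when a bound is omitted, which the
       text above does not state. The function name is hypothetical.

```python
def assumed_slots(method, range_min, range_max, pe_slots):
    """Prospective slot amount for a pending job with a -pe range request.

    range_min / range_max: bounds of the requested range, None if omitted.
    pe_slots: the slots setting of the parallel environment.
    """
    low = 1 if range_min is None else range_min
    high = pe_slots if range_max is None else range_max
    if method == "min":
        return low
    if method == "max":
        return high
    if method == "avg":
        # Mean of all integers in low..high equals the midpoint.
        return (low + high) / 2
    return int(method)  # a fixed integer given in the configuration
```

       For a request of 4-8 slots in a PE with 100 slots, this gives 4 for
       min, 8 for max and 6 for avg.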

   accounting_summary
       This parameter is only checked if control_slaves (see above) is set to
       TRUE and thus Grid Engine is the creator of the slave tasks of a
       parallel application via sge_execd(8) and sge_shepherd(8).  In this
       case, accounting information is available for every single slave task
       started by Grid Engine.

       The accounting_summary parameter can be set to TRUE or FALSE.  A value
       of TRUE indicates that only a single accounting record is written to
       the accounting(5) file, containing the accounting summary of the whole
       job, including all slave tasks, while a value of FALSE indicates that
       an individual accounting(5) record is written for every slave task, as
       well as for the master task.

       Note: When running tightly integrated jobs with
       SHARETREE_RESERVED_USAGE set, and accounting_summary enabled in the
       parallel environment, reserved usage will only be reported by the
       master task of the parallel job.  No per-parallel task usage records
       will be sent from execd to qmaster, which can significantly reduce load
       on the qmaster when running large, tightly integrated parallel jobs.
       However, this removes the only post-hoc information about which hosts a
       job used.

   qsort_args library qsort-function [arg1 ...]
       Specifies a method for selecting the queues/hosts, and their order,
       that should be used to schedule a parallel job.  For details, and the
       API, consult the header file $SGE_ROOT/include/sge_pqs_api.h.  library
       is the path to the qsort dynamic library, qsort-function is the name of
       the qsort function implemented by the library, and the args are
       arguments passed to qsort.  Substitutions from the hard requested
       resource list for the job are made for any strings of the form
       $resource, where resource is the full name of the resource as defined
       in the complex(5) list.  If resource is not requested in the job, a
       null string is substituted.
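
       The $resource substitution for the arguments can be modelled as below.
       This Python sketch only illustrates the stated rule, including the
       empty string for resources the job did not request; the function name,
       the pattern used to recognize resource names and the sample request
       are assumptions.

```python
import re


def substitute_resources(args, hard_requests):
    """Replace $resource in each argument with the job's hard-requested value.

    hard_requests: mapping of full resource name -> requested value.
    Resources that were not requested are replaced by an empty string.
    """
    def lookup(match):
        return str(hard_requests.get(match.group(1), ""))

    return [re.sub(r"\$(\w+)", lookup, arg) for arg in args]


# Hypothetical job that hard-requested only mem_free.
result = substitute_resources(["$mem_free", "arch=$arch"], {"mem_free": "4G"})
```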


RESTRICTIONS

       Note that the functionality of the start and stop procedures remains
       the full responsibility of the administrator configuring the parallel
       environment.  Grid Engine will invoke these procedures and evaluate
       their exit status.  A non-zero exit status will put the queue into an
       error state.  If the start procedure has a non-zero exit status, the
       job will be re-queued.


SECURITY

       If start_proc_args or stop_proc_args is specified with a user@ prefix,
       the same considerations apply as for the prolog and epilog, as
       described in the SECURITY section of sge_conf(5).


SEE ALSO

       sge_intro(1), sge_types(1), qconf(1), qdel(1), qmod(1), qrsh(1),
       qsub(1), access_list(5), sge_conf(5), sge_qmaster(8), sge_shepherd(8).


FILES

       $SGE_ROOT/include/sge_pqs_api.h


COPYRIGHT

       See sge_intro(1) for a full statement of rights and permissions.



SGE 8.1.3pre                      2012-09-11                         SGE_PE(5)
