sge_conf.5
NAME
sge_conf - Grid Engine configuration files
DESCRIPTION
sge_conf defines the global and local Grid Engine configurations and
can be shown/modified by qconf(1) using the -sconf/-mconf options. Only
root or the cluster administrator may modify sge_conf.
At its initial start-up, sge_qmaster(8) checks to see if a valid Grid
Engine configuration is available at a well known location in the Grid
Engine internal directory hierarchy. If so, it loads that
configuration information and proceeds. If not, sge_qmaster(8) writes
a generic configuration containing default values to that same
location. Upon start-up, the Grid Engine execution daemons, sge_execd(8),
retrieve their configuration from sge_qmaster(8).
The actual configuration for both sge_qmaster(8) and sge_execd(8) is a
superposition of a global configuration and a local configuration
pertinent for the host on which a master or execution daemon resides.
If a local configuration is available, its entries overwrite the
corresponding entries of the global configuration. Note: The local
configuration does not have to contain all valid configuration entries,
but only those whose values differ from the global entries.
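The override rule can be pictured as a key-by-key merge. The sketch
below is only an illustration, not how sge_qmaster(8) implements it. It
assumes both configurations were saved as plain "name value" lines (for
example the output of qconf -sconf and qconf -sconf <hostname>);
merge_conf is a hypothetical helper, and it ignores the fact that some
parameters are global-only.

```sh
# merge_conf GLOBAL_FILE LOCAL_FILE
# Prints the effective configuration: every "name value" entry of the
# global file, with the value replaced wherever the local file sets it.
merge_conf() {
  awk '
    /^[ \t]*#/ { next }                      # skip comment lines
    NF == 0    { next }                      # skip blank lines
    {
      key = $1
      val = $0
      sub(/^[ \t]*[^ \t]+[ \t]*/, "", val)   # drop the name, keep the value
      if (!(key in conf)) order[++n] = key   # remember first-seen order
      conf[key] = val                        # later (local) entries win
    }
    END { for (i = 1; i <= n; i++) print order[i], conf[order[i]] }
  ' "$1" "$2"
}
```

Entries that only appear in the local file are appended after the global
ones, which matches the idea that the local configuration only lists
what differs.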
Note: Grid Engine allows backslashes (\) to be used to escape newline
characters. The backslash and the newline are replaced with a space (" ")
character before any interpretation.
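A minimal awk sketch of the continuation rule, assuming the input is
plain configuration text; join_continuations is a hypothetical helper,
not part of Grid Engine.

```sh
# join_continuations: read configuration text on stdin and replace each
# "backslash + newline" pair with a single space.
join_continuations() {
  awk '
    {
      line = $0
      # While the line ends in a backslash, swap it for a space and
      # append the following physical line.
      while (line ~ /\\$/) {
        sub(/\\$/, " ", line)
        if ((getline nxt) > 0) line = line nxt
        else break
      }
      print line
    }'
}
```

For example, the two physical lines "load_sensor /a,\" and "/b" are read
as the single entry "load_sensor /a, /b".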
FORMAT
The paragraphs that follow provide brief descriptions of the individual
parameters that compose the global and local configurations for a Grid
Engine cluster:
execd_spool_dir
The execution daemon spool directory path. A feasible spool directory
requires read/write access permission for root. The entry in
the global configuration for this parameter can be overwritten by
execution host local configurations, i.e. each sge_execd(8) may have a
private spool directory with a different path, in which case it needs
to provide read/write permission for the root account of the
corresponding execution host only.
Under execd_spool_dir, a subdirectory named after the unqualified
hostname of the execution host holds all information spooled to disk.
Thus, it is possible for the
execd_spool_dirs of all execution hosts to physically reference the
same directory path (the root access restrictions mentioned above need
to be met, however).
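For illustration, the per-host directory can be derived as below.
host_spool_dir is a hypothetical helper, and it assumes the unqualified
hostname is simply the part before the first dot.

```sh
# host_spool_dir EXECD_SPOOL_DIR HOSTNAME
# Prints the per-host directory below execd_spool_dir; a trailing slash
# on the spool path and the domain part of the host name are dropped.
host_spool_dir() {
  printf '%s/%s\n' "${1%/}" "${2%%.*}"
}
```

Two hosts sharing one execd_spool_dir, e.g. node01.example.com and
node02.example.com, therefore still write to distinct directories.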
Changing the global execd_spool_dir parameter set at installation time
is not supported in a running system. If the change must be made
anyway, all affected execution daemons have to be restarted. Make sure
running jobs have finished before doing so; otherwise, running jobs
will be lost.
The default location for the execution daemon spool directory is
$SGE_ROOT/$SGE_CELL/spool.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
mailer
mailer is the absolute pathname to the electronic mail delivery agent
on your system. An optional prefix "user@" specifies the user under
which this procedure is to be started; the default is root. The mailer
must accept the following syntax:
mailer -s subject-of-mail-message recipient
Each sge_execd(8) may use a private mail agent. Changing mailer will
take immediate effect.
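Since any program accepting this syntax will do, a site could point
mailer at a small wrapper. The sketch assumes, as with mail(1), that the
message body arrives on standard input, and that DELIVER (a hypothetical
variable) names the command that actually sends the mail;
/usr/sbin/sendmail -t is only an example default.

```sh
#!/bin/sh
# Hypothetical mailer wrapper accepting: -s subject-of-mail-message recipient
# DELIVER is an assumption: the command that really delivers the message.
: "${DELIVER:=/usr/sbin/sendmail -t}"

sge_mailer() {
  if [ "$#" -ne 3 ] || [ "$1" != "-s" ]; then
    echo "usage: sge_mailer -s subject recipient" >&2
    return 2
  fi
  subject=$2
  recipient=$3
  # Build a minimal message: headers, blank line, then the body from stdin.
  {
    printf 'To: %s\n' "$recipient"
    printf 'Subject: %s\n' "$subject"
    printf '\n'
    cat
  } | $DELIVER   # unquoted on purpose so "sendmail -t" splits into words
}
```

To deploy it, the script would end with the line sge_mailer "$@", and
mailer would be set to its absolute path.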
The default for mailer depends on the operating system of the host on
which the Grid Engine master installation was run. Common values are
/bin/mail or /usr/bin/Mail. Note that since the mail is sent by
compute hosts, not the master, it may be necessary to take steps to
route it appropriately, e.g. by using a cluster head node as a "smart
host" for the private network.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
xterm
xterm is the absolute pathname to the X Window System terminal
emulator, xterm(1).
Changing xterm will take immediate effect.
The default for xterm is system-dependent.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
load_sensor
A comma-separated list of executable shell script paths or programs
that sge_execd(8) starts in order to retrieve site-configurable load
information (e.g. free space on a certain disk partition).
Each sge_execd(8) may use a set of private load_sensor programs or
scripts. Changing load_sensor will take effect after two load report
intervals (see load_report_time). A load sensor will be restarted
automatically if the file modification time of the load sensor
executable changes.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
In addition to the load sensors configured via load_sensor, sge_execd(8)
searches for an executable file named qloadsensor in the execution
host's Grid Engine binary directory path. If such a file is found, it
is treated like the configurable load sensors defined in load_sensor.
This facility is intended for pre-installing a default load sensor.
See sge_execd(8) for information on writing load sensors.
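A minimal load sensor sketch that reports free space on one partition.
It follows, as we understand it, the exchange described in sge_execd(8):
the daemon writes a line to the sensor's standard input for each report
and "quit" when the sensor should exit; the sensor answers with "begin",
one "host:name:value" line per value, and "end". The complex name
tmp_free and the FS_PATH variable are assumptions; the attribute must be
defined via complex(5) before the reported value is of any use.

```sh
#!/bin/sh
# Hypothetical load sensor reporting free space below FS_PATH as "tmp_free".
FS_PATH=${FS_PATH:-/tmp}

# Print one "host:name:value" line; df -Pk gives a header line followed by
# one line whose 4th field is the available space in 1024-byte blocks.
report_fs_free() {
  avail=$(df -Pk "$FS_PATH" | awk 'NR == 2 { print $4 }')
  printf '%s:tmp_free:%sK\n' "$(uname -n)" "$avail"
}

# Protocol loop: one report per input line, stop cleanly on "quit".
load_sensor_loop() {
  while read -r input; do
    [ "$input" = "quit" ] && return 0
    echo "begin"
    report_fs_free
    echo "end"
  done
  return 1   # stdin was closed without "quit"
}
```

As a deployable script it would end with a call to load_sensor_loop, and
load_sensor would list its absolute path.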
prolog
The path of an executable, with optional arguments, that is started
before execution of Grid Engine jobs with the same environment setting
as that for the Grid Engine jobs to be started afterwards (see
qsub(1)). The prolog command is started directly, not in a shell. An
optional prefix "user@" specifies the user under which this procedure
is to be started. In that case see the SECURITY section below
concerning security issues running as a privileged user. The
procedure's standard output and the error output stream are written to
the same file as used for the standard output and error output of each
job.
This procedure is intended as a means for the Grid Engine administrator
to automate the execution of general site-specific tasks, like the
preparation of temporary file systems, with a need for the same context
information as the job. For a parallel job, only a single instance of
the prolog is run, on the master node. Each sge_execd(8) may use a
private prolog. Correspondingly, the global or execution host local
configuration can be overwritten by the queue configuration (see
queue_conf(5)). Changing prolog will take immediate effect.
The default for prolog is the special value NONE, which prevents
execution of a prolog.
The following special variables, expanded at runtime, can be used
(besides any other strings which have to be interpreted by the
procedure) to compose a command line:
$host The name of the host on which the prolog or epilog procedures
are started.
$ja_task_id
The array job task index (0 if not an array job).
$job_owner
The user name of the job owner.
$job_id
Grid Engine's unique job identification number.
$job_name
The name of the job.
$processors
The processors string as contained in the queue configuration
(see queue_conf(5)) of the master queue (the queue in which the
prolog and epilog procedures are started).
$queue The cluster queue name of the master queue instance, i.e. the
cluster queue in which the prolog and epilog procedures are
started.
$stdin_path
The pathname of the stdin file. This is always /dev/null for
prolog, pe_start, pe_stop and epilog. For the job script itself,
it is the pathname of the job's stdin file: when delegated file
staging is enabled, this path is set to $fs_stdin_tmp_path; when
delegated file staging is not enabled, it is the stdin pathname
given via DRMAA or qsub.
$stdout_path
$stderr_path
The pathname of the stdout/stderr file. This always points to
the output/error file. When delegated file staging is enabled,
this path is set to $fs_stdout_tmp_path/$fs_stderr_tmp_path.
When delegated file staging is not enabled, it is the
stdout/stderr pathname given via DRMAA or qsub.
$merge_stderr
If this flag is 1, stdout and stderr are merged in one file, the
stdout file. Otherwise (the default), no merging is done.
Merging of stderr and stdout can be requested via the DRMAA job
template attribute 'drmaa_join_files' (see drmaa_attributes(3))
or the qsub parameter '-j y' (see qsub(1)).
$fs_stdin_host
When delegated file staging is requested for the stdin file,
this is the name of the host where the stdin file has to be
copied from before the job is started.
$fs_stdout_host
$fs_stderr_host
When delegated file staging is requested for the stdout/stderr
file, this is the name of the host where the stdout/stderr file
has to be copied to after the job has run.
$fs_stdin_path
When delegated file staging is requested for the stdin file,
this is the pathname of the stdin file on the host
$fs_stdin_host.
$fs_stdout_path
$fs_stderr_path
When delegated file staging is requested for the stdout/stderr
file, this is the pathname of the stdout/stderr file on the host
$fs_stdout_host/$fs_stderr_host.
$fs_stdin_tmp_path
When delegated file staging is requested for the stdin file,
this is the destination pathname of the stdin file on the
execution host. The prolog must copy the stdin file from
$fs_stdin_host:$fs_stdin_path to localhost:$fs_stdin_tmp_path to
establish delegated file staging of the stdin file.
$fs_stdout_tmp_path
$fs_stderr_tmp_path
When delegated file staging is requested for the stdout/stderr
file, this is the source pathname of the stdout/stderr file on
the execution host. The epilog must copy the stdout file from
localhost:$fs_stdout_tmp_path to $fs_stdout_host:$fs_stdout_path
(the stderr file from localhost:$fs_stderr_tmp_path to
$fs_stderr_host:$fs_stderr_path) to establish delegated file
staging of the stdout/stderr file.
$fs_stdin_file_staging
$fs_stdout_file_staging
$fs_stderr_file_staging
When delegated file staging is requested for the
stdin/stdout/stderr file, the flag is set to "1", otherwise it
is set to "0" (see delegated_file_staging for how to enable
delegated file staging). These three flags correspond to the
DRMAA job template attribute 'drmaa_transfer_files' (see
drmaa_attributes(3)).
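The stdin half of delegated file staging could look like the prolog
sketch below. The script path and argument order in the comment are
hypothetical, and COPY_CMD (scp by default) is an assumption: any remote
copy tool taking "host:path destination" arguments would serve.

```sh
#!/bin/sh
# Hypothetical stage-in prolog, configured for example as
#   prolog /path/stage_in.sh $fs_stdin_file_staging $fs_stdin_host \
#          $fs_stdin_path $fs_stdin_tmp_path
stage_in() {
  staging=$1 src_host=$2 src_path=$3 tmp_path=$4
  # Nothing to do unless delegated staging was requested for stdin.
  [ "$staging" = "1" ] || return 0
  # Copy $fs_stdin_host:$fs_stdin_path to $fs_stdin_tmp_path on this host.
  "${COPY_CMD:-scp}" "$src_host:$src_path" "$tmp_path"
}
```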
If the prolog is written in shell script, the usual care must be
exercised, e.g. when expanding user-supplied values from the command
line or the environment. In particular, note that the job name could
be of the form "; evil doings;". Also, use absolute path names for
commands if inheriting the user's environment.
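The precautions above might be applied as in this prolog sketch, which
creates a per-job scratch directory. SCRATCH_BASE and the argument order
are assumptions for illustration. Whatever the prolog prints ends up in
the job's output file.

```sh
#!/bin/sh
# Hypothetical prolog, configured for example as
#   prolog /path/to/prolog.sh $job_id $job_owner $job_name
make_scratch() {
  job_id=$1 job_owner=$2 job_name=$3
  # Reject anything that is not a plain job number before building a path.
  case $job_id in
    ''|*[!0-9]*) echo "prolog: bad job id" >&2; return 1 ;;
  esac
  dir=${SCRATCH_BASE:-/scratch}/$job_id
  # Absolute path for the command: the job's PATH is user-controlled.
  /bin/mkdir -p "$dir" || return 1
  # The job name is user-supplied: quote it and treat it as data only,
  # never pass it to eval or an unquoted command line.
  printf 'scratch for job %s (%s) owned by %s\n' "$job_id" "$job_name" "$job_owner"
}
```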
The global configuration entry for this value may be overwritten by the
execution host local configuration.
See sge_shepherd(8) for the significance of exit codes returned by the
prolog.
epilog
The path of an executable, with optional arguments, that is started
after execution of Grid Engine jobs with the same environment setting
as that for the Grid Engine job that has just completed (see qsub(1)),
with the addition of the variable named SGE_JOBEXIT_STAT which holds
the exit status of the job. The epilog command is started directly,
not in a shell. An optional prefix "user@" specifies the user under
which this procedure is to be started. In that case see the SECURITY
section below concerning security issues running as a privileged user.
The procedure's standard output and the error output stream are written
to the same file used for the standard output and error output of each
job.
The same special variables can be used to compose a command line as for
the prolog.
This procedure is intended as a means for the Grid Engine administrator
to automate the execution of general site-specific tasks, like the
cleaning up of temporary file systems with the need for the same
context information as the job. For a parallel job, only a single
instance of the epilog is run, on the master node. Each sge_execd(8)
may use a private epilog. Correspondingly, the global or execution
host local configurations can be overwritten by the queue configuration
(see queue_conf(5)). Changing epilog will take immediate effect.
The default for epilog is the special value NONE, which prevents
execution of an epilog.
The same considerations (above) apply as for a prolog when an epilog is
written in shell script.
See sge_shepherd(8) for the significance of exit codes returned by the
epilog.
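The stdout half of delegated file staging, done by an epilog, could be
sketched as below. As with the stage-in example for the prolog, the
script path, argument order and COPY_CMD (scp by default) are
assumptions; the status message goes to stderr and uses
SGE_JOBEXIT_STAT from the epilog environment.

```sh
#!/bin/sh
# Hypothetical stage-out epilog, configured for example as
#   epilog /path/stage_out.sh $fs_stdout_file_staging $fs_stdout_host \
#          $fs_stdout_path $fs_stdout_tmp_path
stage_out() {
  staging=$1 dst_host=$2 dst_path=$3 tmp_path=$4
  echo "epilog: job exited with status ${SGE_JOBEXIT_STAT:-unknown}" >&2
  # Nothing to do unless delegated staging was requested for stdout.
  [ "$staging" = "1" ] || return 0
  # Copy localhost:$fs_stdout_tmp_path to $fs_stdout_host:$fs_stdout_path.
  "${COPY_CMD:-scp}" "$tmp_path" "$dst_host:$dst_path"
}
```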
shell_start_mode
Note: Deprecated, may be removed in future release.
This parameter defines the mechanisms which are used to actually invoke
the job scripts on the execution hosts. The following values are
recognized:
unix_behavior
If a user starts a job shell script under UNIX interactively by
invoking it just with the script name, the operating system's
executable loader uses the information provided in a comment
such as `#!/bin/csh' in the first line of the script to detect
which command interpreter to start to interpret the script. This
mechanism is used by Grid Engine when starting jobs if
unix_behavior is defined as shell_start_mode.
posix_compliant
POSIX does not consider first script line comments such as
`#!/bin/csh' as significant. The POSIX standard for batch
queueing systems (P1003.2d) therefore requires a compliant
queueing system to ignore such lines, but to use user-specified
or configured default command interpreters instead. Thus, if
shell_start_mode is set to posix_compliant Grid Engine will
either use the command interpreter indicated by the -S option of
the qsub(1) command or the shell parameter of the queue to be
used (see queue_conf(5) for details).
script_from_stdin
Setting the shell_start_mode parameter either to posix_compliant
or unix_behavior requires you to set the umask in use for
sge_execd(8) such that every user has read access to the
active_jobs directory in the spool directory of the
corresponding execution daemon. In case you have prolog and
epilog scripts configured, they also need to be readable by any
user who may execute jobs.
If this violates your site's security policies you may want to
set shell_start_mode to script_from_stdin. This will force Grid
Engine to open the job script as well as the epilog and prolog
scripts for reading into STDIN as root (if sge_execd(8) was
started as root) before changing to the job owner's user
account. The script is then fed into the STDIN stream of the
command interpreter indicated by the -S option of the qsub(1)
command or the shell parameter of the queue to be used (see
queue_conf(5) for details).
Thus setting shell_start_mode to script_from_stdin also implies
posix_compliant behavior. Note, however, that feeding scripts
into the STDIN stream of a command interpreter may cause trouble
if commands like rsh(1) are invoked inside a job script as they
also process the STDIN stream of the command interpreter. These
problems can usually be resolved by redirecting the STDIN
channel of those commands to come from /dev/null (e.g. rsh host
date < /dev/null). Note also that any command-line options
associated with the job are passed to the executing shell. The
shell will only forward them to the job if they are not
recognized as valid shell options.
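The STDIN problem can be reproduced outside Grid Engine. In this sketch
cat stands in for rsh(1). With a shell that reads a script from a pipe
one line at a time, as POSIX requires, the cat in unsafe_job swallows the
remaining line "echo two", so only "one" is printed; shells that read
ahead may print both. safe_job applies the /dev/null redirection and
always prints both lines.

```sh
# Simulates script_from_stdin: the job script is piped into the shell.
# "cat" stands in for a command such as rsh(1) that also reads STDIN.
unsafe_job() {
  printf '%s\n' 'echo one' 'cat > /dev/null' 'echo two' | sh
}

# Same script, but the STDIN-reading command gets /dev/null instead.
safe_job() {
  printf '%s\n' 'echo one' 'cat < /dev/null > /dev/null' 'echo two' | sh
}
```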
Changes to shell_start_mode will take immediate effect. The default
for shell_start_mode is posix_compliant.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
login_shells
UNIX command interpreters like the Bourne-Shell (see sh(1)) or the C-
Shell (see csh(1)) can be used by Grid Engine to start job scripts. The
command interpreters can either be started as login-shells (i.e. all
system and user default resource files like .login or .profile will be
executed when the command interpreter is started, and the environment
for the job will be set up as if the user has just logged in) or just
for command execution (i.e. only shell-specific resource files like
.cshrc will be executed and a minimal default environment is set up by
Grid Engine - see qsub(1)). The parameter login_shells contains a
comma-separated list of the executable names of the command
interpreters to be started as login shells. Shells in this list are
only started as login shells if the parameter shell_start_mode (see
above) is set to posix_compliant.
Changes to login_shells will take immediate effect. The default for
login_shells is sh,bash,csh,tcsh,ksh.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
min_uid
min_uid places a lower bound on user IDs that may use the cluster.
Users whose user ID (as returned by getpwnam(3)) is less than min_uid
will not be allowed to run jobs on the cluster.
Changes to min_uid will take immediate effect. The default is 0 but,
if CSP or MUNGE security is not in use, the installation script sets it
to 100 to prevent unauthorized access by root or system accounts.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
min_gid
This parameter sets the lower bound on group IDs that may use the
cluster. Users whose default group ID (as returned by getpwnam(3)) is
less than min_gid will not be allowed to run jobs on the cluster.
Changes to min_gid will take immediate effect. The default is 0 but,
if CSP security is not in use, the installation script sets it to 100
to prevent unauthorized access by root or system accounts.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
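The min_uid and min_gid rules amount to two comparisons. The sketch below
checks a local account with id(1); ids_allowed is a hypothetical helper,
and it relies on id -g USER returning the user's default (primary) group.

```sh
# ids_allowed USER MIN_UID MIN_GID
# Succeeds if USER's uid and default gid are not below the minimums;
# returns 1 if either is too low and 2 if the user is unknown.
ids_allowed() {
  uid=$(id -u "$1") || return 2
  gid=$(id -g "$1") || return 2
  [ "$uid" -ge "$2" ] && [ "$gid" -ge "$3" ]
}
```

With the installation default of 100 for both values, root (uid 0) is
refused, which is the intent described above.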
user_lists
The user_lists parameter contains a comma-separated list of user access
lists as described in access_list(5). Each user contained in at least
one of the access lists has access to the cluster. If the user_lists
parameter is set to NONE (the default) any user has access if not
explicitly excluded via the xuser_lists parameter described below. If
a user is contained both in an access list in xuser_lists and in user_lists,
the user is denied access to the cluster.
Changes to user_lists will take immediate effect.
This value is a global configuration parameter insofar as it restricts
access to the whole cluster, but the execution host local configuration
may define a value to restrict access to that host further.
xuser_lists
The xuser_lists parameter contains a comma-separated list of user
access lists as described in access_list(5). Each user contained in at
least one of the access lists is denied access to the cluster. If the
xuser_lists parameter is set to NONE (the default) any user has access.
If a user is contained both in an access list in xuser_lists and
user_lists (see above) the user is denied access to the cluster.
Changes to xuser_lists will take immediate effect.
This value is a global configuration parameter insofar as it restricts
access to the whole cluster, but the execution host local configuration
may define a value to restrict access to that host further.
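The combined effect of user_lists and xuser_lists can be sketched as
follows. As a simplification, the access lists are assumed to be already
flattened into comma-separated user names (real access_list(5) entries
may also name groups); in_list and has_access are hypothetical helpers.

```sh
# in_list NAME LIST: succeed if NAME occurs in the comma-separated LIST.
in_list() {
  case ",$2," in
    *",$1,"*) return 0 ;;
  esac
  return 1
}

# has_access USER ALLOWED DENIED
# ALLOWED and DENIED hold the users of user_lists and xuser_lists;
# NONE means the parameter is not set.
has_access() {
  user=$1 allowed=$2 denied=$3
  if [ "$denied" != "NONE" ] && in_list "$user" "$denied"; then
    return 1                      # xuser_lists always wins
  fi
  if [ "$allowed" = "NONE" ]; then
    return 0                      # no user_lists: everyone else has access
  fi
  in_list "$user" "$allowed"
}
```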
administrator_mail
administrator_mail specifies a comma-separated list of the electronic
mail address(es) of the cluster administrator(s) to whom internally-
generated problem reports are sent. The mail address format depends on
your electronic mail system and how it is configured; consult your
system's configuration guide for more information.
Changing administrator_mail takes immediate effect. The default for
administrator_mail is an empty mail list.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
projects
The projects list contains all projects which are granted access to
Grid Engine. Users not belonging to one of these projects cannot submit
jobs. If users belong to projects in the projects list and the
xprojects list (see below), they also cannot submit jobs.
Changing projects takes immediate effect. The default for projects is
none.
While globally-configured projects affect job submission, projects
configured for queues or hosts affect job execution in the appropriate
context.
xprojects
The xprojects list contains all projects that are denied access to Grid
Engine. Users belonging to one of these projects cannot use Grid
Engine. If users belong to projects in the projects list (see above)
and the xprojects list, they also cannot use the system.
Changing xprojects takes immediate effect. The default for xprojects
is none.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
load_report_time
System load is reported periodically by the execution daemons to
sge_qmaster(8). The parameter load_report_time defines the time
interval between load reports.
Each sge_execd(8) may use a different load report time. Changing
load_report_time will take immediate effect.
Note: Be careful when modifying load_report_time. Reporting load too
frequently might block sge_qmaster(8) especially if the number of
execution hosts is large. Moreover, since the system load typically
increases and decreases smoothly, frequent load reports hardly offer
any benefit.
The default for load_report_time is 40 seconds.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
reschedule_unknown
Determines whether jobs on hosts in an unknown state are rescheduled,
and thus sent to other hosts. Hosts are registered as unknown if
sge_qmaster(8) cannot establish contact with the sge_execd(8) on those
hosts (see max_unheard). Likely reasons are a breakdown of the host or
a breakdown of the network connection in between, but also sge_execd(8)
may not be executing on such hosts.
In any case, Grid Engine can reschedule jobs running on such hosts to
another system. reschedule_unknown controls the time which Grid Engine
will wait before jobs are rescheduled after a host became unknown. The
time format specification is hh:mm:ss. If the special value 00:00:00 is
set, then jobs will not be rescheduled from this host.
Rescheduling is only initiated for jobs which have activated the rerun
flag (see the -r y option of qsub(1) and the rerun option of
queue_conf(5)). Parallel jobs are only rescheduled if the host on
which their master task executes is in unknown state. The behavior of
reschedule_unknown for parallel jobs and for jobs without the rerun
flag set can be adjusted using the qmaster_params settings
ENABLE_RESCHEDULE_KILL and ENABLE_RESCHEDULE_SLAVE.
Checkpointing jobs will only be rescheduled when the when option of the
corresponding checkpointing environment contains an appropriate flag
(see checkpoint(5)). Interactive jobs (see qsh(1), qrsh(1), qtcsh(1))
are not rescheduled.
The default for reschedule_unknown is 00:00:00.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
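A simplified sketch of the decision for a single serial job. It covers
only the rules above for the hh:mm:ss value, the rerun flag and
interactive jobs; parallel jobs, checkpointing environments and the
ENABLE_RESCHEDULE_KILL / ENABLE_RESCHEDULE_SLAVE settings are ignored,
and the helper and its arguments are hypothetical.

```sh
# should_reschedule UNKNOWN_FOR RESCHEDULE_UNKNOWN RERUN INTERACTIVE
#   UNKNOWN_FOR        seconds since the host entered the unknown state
#   RESCHEDULE_UNKNOWN the configured value in hh:mm:ss form
#   RERUN              "y" if the job's rerun flag is set, else "n"
#   INTERACTIVE        "y" for qsh/qrsh/qtcsh jobs, else "n"
should_reschedule() {
  limit=$(printf '%s\n' "$2" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')
  [ "$limit" -eq 0 ] && return 1   # 00:00:00 disables rescheduling
  [ "$4" = "y" ] && return 1       # interactive jobs are never rescheduled
  [ "$3" = "y" ] || return 1       # only jobs with the rerun flag
  [ "$1" -ge "$limit" ]            # waited long enough?
}
```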
max_unheard
If sge_qmaster(8) could not contact, or was not contacted by, the
execution daemon of a host for max_unheard seconds, all queues residing
on that particular host are set to status unknown. Because the
execution daemons contact sge_qmaster(8) at least once per load report,
max_unheard should be greater than load_report_time (see above).
Changing max_unheard takes immediate effect. The default for
max_unheard is 5 minutes.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
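A quick sanity check of that relationship, assuming both values are in
the h:m:s form (e.g. 00:00:40 and 00:05:00); to_seconds and
check_max_unheard are hypothetical helpers. awk is used for the
arithmetic so that fields such as "08" are not misread as octal.

```sh
# to_seconds H:M:S: print the number of seconds, fail on other formats.
to_seconds() {
  printf '%s\n' "$1" | awk -F: 'NF != 3 { exit 1 } { print $1 * 3600 + $2 * 60 + $3 }'
}

# check_max_unheard LOAD_REPORT_TIME MAX_UNHEARD
# Succeeds if max_unheard is strictly greater than load_report_time.
check_max_unheard() {
  lrt=$(to_seconds "$1") || return 2
  mu=$(to_seconds "$2") || return 2
  [ "$mu" -gt "$lrt" ]
}
```

With the defaults, 300 seconds of max_unheard cover several 40-second
load report intervals.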
loglevel
This parameter specifies the level of detail that Grid Engine
components such as sge_qmaster(8) or sge_execd(8) use to produce
informative, warning or error messages which are logged to the messages
files in the master and execution daemon spool directories (see the
description of the execd_spool_dir parameter above). The following
message levels are available:
log_err
All error events recognized are logged.
log_warning
All error events recognized, and all detected signs of
potentially erroneous behavior, are logged.
log_info
All error events recognized, all detected signs of potentially
erroneous behavior, and a variety of informative messages are
logged.
Changing loglevel will take immediate effect.
The default for loglevel is log_warning.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
max_aj_instances
This parameter defines the maximum number of array tasks to be
scheduled to run simultaneously per array job. An instance of an array
task will be created within the master daemon when it gets a start
order from the scheduler. The instance will be destroyed when the array
task finishes. Thus the parameter provides control mainly over the
memory consumption of array jobs in the master daemon. It is most
useful for very large clusters and very large array jobs. The default
for this parameter is 2000. The value 0 will deactivate this limit and
will allow the scheduler to start as many array job tasks as suitable
resources are available in the cluster.
Changing max_aj_instances will take immediate effect.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
max_aj_tasks
This parameter defines the maximum number of array job tasks within an
array job. sge_qmaster(8) will reject all array job submissions which
request more than max_aj_tasks array job tasks. The default for this
parameter is 75000. The value 0 will deactivate this limit.
Changing max_aj_tasks will take immediate effect.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
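Whether a submission passes max_aj_tasks depends on how many tasks its
range yields. The sketch assumes the qsub -t form n[-m[:s]] and counts
tasks as (m - n) / s + 1, rounded down; task_count and accepts_array are
hypothetical helpers. For example, 1-100000:2 gives tasks 1, 3, ...,
99999, i.e. 50000 tasks, which is within the default limit of 75000.

```sh
# task_count RANGE: number of tasks in a range "n", "n-m" or "n-m:s".
task_count() {
  printf '%s\n' "$1" | awk -F'[-:]' '
    { n = $1; m = (NF >= 2) ? $2 : $1; s = (NF >= 3) ? $3 : 1
      print int((m - n) / s) + 1 }'
}

# accepts_array MAX_AJ_TASKS RANGE: succeed if the range is within the
# limit; a limit of 0 disables the check.
accepts_array() {
  max=$1
  count=$(task_count "$2")
  [ "$max" -eq 0 ] || [ "$count" -le "$max" ]
}
```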
max_u_jobs
The number of active (not finished) jobs which each Grid Engine user
can have in the system simultaneously is controlled by this parameter.
A value greater than 0 defines the limit. The default value 0 means
"unlimited". If the max_u_jobs limit is exceeded by a job submission
then the submission command exits with exit status 25 and an
appropriate error message.
Changing max_u_jobs will take immediate effect.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
max_jobs
The number of active (not finished) jobs simultaneously allowed in Grid
Engine is controlled by this parameter. A value greater than 0 defines
the limit. The default value 0 means "unlimited". If the max_jobs
limit is exceeded by a job submission then the submission command exits
with exit status 25 and an appropriate error message.
Changing max_jobs will take immediate effect.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
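Because both limits are reported through exit status 25, a submitting
script can tell them apart from other errors. The wrapper below is a
site-side sketch, not part of Grid Engine; QSUB, RETRY_SLEEP and
MAX_TRIES are hypothetical knobs.

```sh
# submit_with_retry QSUB_ARGUMENTS...
# Runs qsub and, while it exits with status 25 (job limit reached),
# waits and tries again, up to MAX_TRIES attempts.
submit_with_retry() {
  tries=0
  while :; do
    rc=0
    "${QSUB:-qsub}" "$@" || rc=$?
    # Success, or an error other than the job limit: report it as is.
    [ "$rc" -ne 25 ] && return "$rc"
    tries=$((tries + 1))
    [ "$tries" -ge "${MAX_TRIES:-10}" ] && return 25
    sleep "${RETRY_SLEEP:-60}"
  done
}
```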
max_advance_reservations
The number of active (not finished) Advance Reservations simultaneously
allowed in Grid Engine is controlled by this parameter. A value greater
than 0 defines the limit. The default value 0 means "unlimited". If the
max_advance_reservations limit is exceeded by an Advance Reservation
request then the submission command exits with exit status 25 and an
appropriate error message.
Changing max_advance_reservations will take immediate effect.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
enforce_project
If set to true, users are required to request a project whenever
submitting a job. See the -P option to qsub(1) for details.
Changing enforce_project will take immediate effect. The default for
enforce_project is false.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
enforce_user
If set to true, a user(5) must exist to allow for job submission. Jobs
are rejected if no corresponding user exists.
If set to auto, a user(5) object for the submitting user will
automatically be created during job submission, if one does not already
exist. The auto_user_oticket, auto_user_fshare,
auto_user_default_project, and auto_user_delete_time configuration
parameters will be used as default attributes of the new user(5)
object.
Changing enforce_user will take immediate effect. The default for
enforce_user is auto.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
auto_user_oticket
The number of override tickets to assign to automatically created
user(5) objects. User objects are created automatically if the
enforce_user attribute is set to auto.
Changing auto_user_oticket will affect any newly created user objects,
but will not change user objects created in the past.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
auto_user_fshare
The number of functional shares to assign to automatically created
user(5) objects. User objects are created automatically if the
enforce_user attribute is set to auto.
Changing auto_user_fshare will affect any newly created user objects,
but will not change user objects created in the past.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
auto_user_default_project
The default project to assign to automatically created user(5) objects.
User objects are created automatically if the enforce_user attribute is
set to auto.
Changing auto_user_default_project will affect any newly created user
objects, but will not change user objects created in the past.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
auto_user_delete_time
The number of seconds of inactivity after which automatically created
user(5) objects will be deleted. User objects are created automatically
if the enforce_user attribute is set to auto. If the user has no active
or pending jobs for the specified amount of time, the object will
automatically be deleted. A value of 0 can be used to indicate that
the automatically created user object is permanent and should not be
automatically deleted.
Changing auto_user_delete_time will affect the deletion time for all
users with active jobs.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
set_token_cmd
NB. If the qmaster spool area is world-readable for non-admin users,
you must take steps to encrypt the credentials, since they are stored
there after job submission.
set_token_cmd points to a command which sets and extends AFS tokens for
Grid Engine jobs. It is run by sge_coshepherd(8). It expects two
command line parameters:
set_token_cmd user token_extend_after_seconds
It reads the token from STDIN, extends its expiration time, and re-sets
the token. As a shell script this command will call the programs:
- SetToken
- forge
which are provided by your distributor as source code. The script looks
as follows:
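--------------------------------
#!/bin/sh
# set_token_cmd user token_extend_after_seconds
# $1 is the user, $2 the token extension time in seconds; both are
# quoted so they are passed to forge unchanged.
forge -u "$1" -t "$2" | SetToken
--------------------------------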
Since it is necessary for forge to read the secret AFS server key, a
site might wish to replace the set_token_cmd script with a command that
connects to a custom daemon at the AFS server. The token must be forged
at the AFS server and returned to the local machine, where SetToken is
executed.
Changing set_token_cmd will take immediate effect. The default for
set_token_cmd is none.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
pag_cmd
The path to your pagsh is specified via this parameter. The
sge_shepherd(8) process and the job run in a pagsh. Please ask your AFS
administrator for details.
Changing pag_cmd will take immediate effect. The default for pag_cmd
is none.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
token_extend_time
The token_extend_time is the time period for which AFS tokens are
periodically extended. Grid Engine will call the token extension 30
minutes before the tokens expire until jobs have finished and the
corresponding tokens are no longer required.
Changing token_extend_time will take immediate effect. The default for
token_extend_time is 24:0:0, i.e. 24 hours.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
shepherd_cmd
Alternative path to the shepherd binary. Typically used to call the
shepherd binary through a wrapper script or command. A wrapper used in
production must handle signals the way the shepherd would; otherwise,
for instance, jobs will not be killed correctly.
Changing shepherd_cmd will take immediate effect. The default for
shepherd_cmd is none.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
gid_range
The gid_range is a comma-separated list of range expressions of the
form m-n, where m and n are integer numbers greater than 99, and m is
an abbreviation for m-m. These numbers are used in sge_execd(8) to
identify processes belonging to the same job.
Each sge_execd(8) may use a separate set of group ids for this purpose.
All numbers in the group id range have to be unused supplementary group
ids on the system where sge_execd(8) is started.
Changing gid_range will take immediate effect. There is no default for
gid_range. The administrator will have to assign a value for gid_range
during installation of Grid Engine.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
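As an illustration of the range syntax, the following POSIX shell
function checks a gid_range value against the rules above and prints
how many group ids it provides. gid_range_count is a hypothetical
helper, not part of Grid Engine, and it does not check whether the ids
are actually unused on the host.

```shell
# Hypothetical helper, not shipped with Grid Engine: validate a gid_range
# value such as "20000-20100,20200" and print the number of ids it covers.
gid_range_count() {
    [ -n "$1" ] || return 1
    gid_rest=$1,
    gid_total=0
    while [ -n "$gid_rest" ]; do
        gid_item=${gid_rest%%,*}          # next comma-separated element
        gid_rest=${gid_rest#*,}
        case $gid_item in
            *-*) gid_m=${gid_item%-*}; gid_n=${gid_item#*-} ;;
            *)   gid_m=$gid_item; gid_n=$gid_item ;;   # "m" means "m-m"
        esac
        # Bounds must be plain integers (no leading zero), m > 99, m <= n.
        case $gid_m in ''|0*|*[!0-9]*) return 1 ;; esac
        case $gid_n in ''|0*|*[!0-9]*) return 1 ;; esac
        [ "$gid_m" -gt 99 ] && [ "$gid_m" -le "$gid_n" ] || return 1
        gid_total=$((gid_total + gid_n - gid_m + 1))
    done
    echo "$gid_total"
}
```

For example, gid_range_count 20000-20100,20200 prints 102, while
50-200 and 300-200 are rejected.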
qmaster_params
A list of additional parameters can be passed to the Grid Engine
qmaster. The following values are recognized:
ENABLE_ENFORCE_MASTER_LIMIT
If this parameter is set, the s_rt and h_rt limits of a running
job are tested and acted on by sge_qmaster(8) when the
sge_execd(8) on the job's host is in an unknown state.
After the s_rt or h_rt limit of a job has expired, the master
daemon will wait for an additional time defined by DURATION_OFFSET
(see sched_conf(5)). If the execution daemon still cannot be
contacted when this additional time has elapsed, the master
daemon will force the deletion of the job (see -f of qdel(1)).
An accounting record is created for jobs deleted this way. Its
usage values are the last online usage reported before the
execution daemon lost contact with qmaster. The failed state in
the record is set to 37 to indicate that the job was terminated
by a limit enforced by the master daemon.
After a restart of sge_qmaster(8), limit enforcement is
triggered only after twice the largest load_report_time
defined in sge_conf(5) has elapsed. This gives the
execution daemons enough time to re-register with the master
daemon.
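The timeline described above is simple arithmetic. The sketch below
uses hypothetical helper names and assumes that the job's h_rt (or
s_rt), DURATION_OFFSET and the load_report_time values have already
been converted to seconds. It computes the earliest forced deletion
time for a job whose host stays unreachable, and the grace period
after a qmaster restart.

```shell
# Hypothetical helpers illustrating ENABLE_ENFORCE_MASTER_LIMIT timing.
# Earliest forced deletion: job start + h_rt (or s_rt) + DURATION_OFFSET.
forced_deletion_time() {
    # $1 = job start time (epoch s), $2 = h_rt or s_rt (s), $3 = DURATION_OFFSET (s)
    echo $(( $1 + $2 + $3 ))
}

# After a qmaster restart, enforcement waits twice the largest
# load_report_time among the given host configurations.
enforcement_delay_after_restart() {
    max_lrt=0
    for lrt in "$@"; do
        if [ "$lrt" -gt "$max_lrt" ]; then max_lrt=$lrt; fi
    done
    echo $(( 2 * max_lrt ))
}
```

For a job started at time 1000 with h_rt=3600 and DURATION_OFFSET=60,
forced_deletion_time 1000 3600 60 prints 4660.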
ENABLE_FORCED_QDEL_IF_UNKNOWN
If this parameter is set then a deletion request for a job is
automatically interpreted as a forced deletion request (see -f
of qdel(1)) if the host where the job is running is in an
unknown state.
ENABLE_FORCED_QDEL
If this parameter is set, non-administrative users can force
deletion of their own jobs via the -f option of qdel(1).
Without this parameter, forced deletion of jobs is only allowed
by the Grid Engine manager or operator.
Note: Forced deletion for jobs is executed differently,
depending on whether users are Grid Engine administrators or
not. In the case of administrative users, the jobs are removed
from the internal database of Grid Engine immediately. For
regular users, the equivalent of a normal qdel(1) is executed
first, and deletion is forced only if the normal cancellation
was unsuccessful.
FORBID_RESCHEDULE
If this parameter is set, re-queueing of jobs cannot be
initiated by the job script which is under control of the user.
Without this parameter, jobs returning the value 99 are
rescheduled. This can be used to cause the job to be restarted
on a different machine, for instance if there are not enough
resources on the current one.
FORBID_APPERROR
If this parameter is set, the application cannot set itself to
the error state. Without this parameter jobs returning the
value 100 are set to the error state (and therefore can be
manually rescheduled by clearing the error state). This can be
used to set the job to the error state when a starting condition
of the application is not fulfilled before the application
itself has been started, or when a clean up procedure (e.g. in
the epilog) decides that it is necessary to run the job again.
To do so, return 100 in the prolog, pe_start, job script,
pe_stop or epilog script.
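A job script can use the two special exit codes roughly as follows.
This is a minimal sketch: the function name and both tests (a missing
input file as an unfulfilled starting condition, too little scratch
space as a reason to try another host) are hypothetical examples, and
the codes only have their special meaning if FORBID_APPERROR and
FORBID_RESCHEDULE are not set.

```shell
# Hypothetical job-script helper illustrating the special exit codes:
#   99  = ask Grid Engine to reschedule the job (unless FORBID_RESCHEDULE is set)
#   100 = put the job into the error state (unless FORBID_APPERROR is set)
REQUEUE_EXIT=99
ERROR_EXIT=100

choose_exit_status() {
    # $1 = required input file, $2 = free scratch space (KB), $3 = needed (KB)
    if [ ! -r "$1" ]; then
        echo "$ERROR_EXIT"       # start condition not fulfilled: error state
    elif [ "$2" -lt "$3" ]; then
        echo "$REQUEUE_EXIT"     # not enough resources here: try another host
    else
        echo 0                   # run the application normally
    fi
}
```

In the job script itself, one might then call
status=$(choose_exit_status "$HOME/input.dat" "$free_kb" 1048576)
and exit with "$status" if it is not 0, where free_kb was determined
beforehand (for example with df).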
DISABLE_AUTO_RESCHEDULING
Note: Deprecated, may be removed in a future release.
If set to "true" or "1", the reschedule_unknown parameter is not
taken into account.
ENABLE_RESCHEDULE_KILL
If set to "true" or "1", the reschedule_unknown parameter
also affects jobs that do not have the rerun flag activated (see
the -r y option of qsub(1) and the rerun option of
queue_conf(5)); such jobs are simply finished, as they cannot be
rescheduled.
ENABLE_RESCHEDULE_SLAVE
If set to "true" or "1", and the reschedule_unknown parameter
is activated, Grid Engine also triggers job rescheduling when a
host where slave tasks of a parallel job execute is in an
unknown state.
MAX_DYN_EC
Sets the maximum number of dynamic event clients (as used by
qsub -sync y and by Grid Engine DRMAA API library sessions). The
default is 1000. The number of dynamic event clients should not
exceed half the number of file descriptors available on the
system, since the file descriptors are shared among the
connections to all exec hosts, all event clients, and the file
handles that the qmaster needs.
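The guideline can be checked with a small shell function. It is a
sketch under one assumption: the descriptor limit that matters is that
of the sge_qmaster(8) process, which may differ from the limit of the
shell running the check (on Linux it can be read from the "Max open
files" line of /proc/<pid>/limits). max_dyn_ec_ok is hypothetical.

```shell
# Hypothetical check: is a proposed MAX_DYN_EC at most half the
# file descriptor limit?
max_dyn_ec_ok() {
    # $1 = proposed MAX_DYN_EC; $2 = descriptor limit
    # (defaults to this shell's own "ulimit -n", which may differ from qmaster's)
    fd_limit=${2:-$(ulimit -n)}
    if [ "$fd_limit" = unlimited ]; then
        return 0
    fi
    [ "$1" -le $(( fd_limit / 2 )) ]
}
```

With a limit of 1024 descriptors, for instance, the default of 1000
dynamic event clients would already be too high.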
MONITOR_TIME
Specifies the time interval at which the monitoring information
should be printed. The monitoring is disabled by default and can
be enabled by specifying an interval. The monitoring is per-
thread and is written to the messages file or displayed by
qping(1) with option -f. Example: MONITOR_TIME=0:0:10 generates
and prints the monitoring information approximately every 10
seconds. The specified time is a guideline only and not a fixed
interval. The interval that is actually used is printed. In
this example, the interval could be anything between 9 seconds
and 20 seconds.
LOG_MONITOR_MESSAGE
Monitoring information is logged into the messages files by
default. This information can be accessed via qping(1). If
monitoring is always enabled, the messages files can become
quite large. This switch disables logging into the messages
files, making qping -f the only source of monitoring data.
Profiling provides system measurements, which can be useful for
debugging or optimization of the system. The profiling output is
written to the messages file.
PROF_SIGNAL
Enables profiling for the qmaster signal thread (e.g.
PROF_SIGNAL=true).
PROF_WORKER
Enables profiling for the qmaster worker threads (e.g.
PROF_WORKER=true).
PROF_LISTENER
Enables profiling for the qmaster listener threads (e.g.
PROF_LISTENER=true).
PROF_DELIVER
Enables profiling for the qmaster event deliver thread (e.g.
PROF_DELIVER=true).
PROF_TEVENT
Enables profiling for the qmaster timed event thread (e.g.
PROF_TEVENT=true).
PROF_SCHEDULER
Enables profiling for the qmaster scheduler thread (e.g.
PROF_SCHEDULER=true).
Please note that the CPU utime and stime values contained in the
profiling output are not per-thread CPU times. These CPU usage
statistics are per-process statistics. So the printed profiling values
for CPU mean "CPU time consumed by sge_qmaster (all threads) while the
reported profiling level was active".
STREE_SPOOL_INTERVAL
Sets the time interval for spooling the sharetree usage. The
default is 00:04:00. The setting accepts either a colon-separated
time string or a number of seconds. There is no setting to turn
sharetree spooling off. (e.g. STREE_SPOOL_INTERVAL=00:02:00)
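Because the value may be given in either form, a tool that reads it
has to handle both. The sketch below (to_seconds and
strip_leading_zeros are hypothetical helpers) converts HH:MM:SS or a
plain number of seconds to seconds. It strips leading zeros because
shell arithmetic would otherwise read a field such as 08 as an invalid
octal number.

```shell
# Hypothetical helper: print a decimal string without leading zeros
# (shell arithmetic would treat "08" as an invalid octal number).
strip_leading_zeros() {
    slz=${1#"${1%%[!0]*}"}
    echo "${slz:-0}"
}

# Hypothetical helper: convert "HH:MM:SS" or a plain number of seconds to seconds.
to_seconds() {
    case $1 in
        *:*:*) ts_h=${1%%:*}; ts_rest=${1#*:}
               ts_m=${ts_rest%%:*}; ts_s=${ts_rest#*:} ;;
        *)     ts_h=0; ts_m=0; ts_s=$1 ;;
    esac
    # Reject anything that is not made of digits (including extra fields).
    case $ts_h$ts_m$ts_s in ''|*[!0-9]*) return 1 ;; esac
    ts_h=$(strip_leading_zeros "$ts_h")
    ts_m=$(strip_leading_zeros "$ts_m")
    ts_s=$(strip_leading_zeros "$ts_s")
    echo $(( ts_h * 3600 + ts_m * 60 + ts_s ))
}
```

The default 00:04:00 thus corresponds to 240 seconds.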
MAX_JOB_DELETION_TIME
Sets the maximum time, in seconds, that the qmaster will spend
deleting jobs. After this time, the qmaster will continue with
other tasks and schedule the deletion of the remaining jobs for a
later time. The default value is 3 seconds, which is used if no
value is entered. Valid values are greater than 0 and at most 5
(e.g. MAX_JOB_DELETION_TIME=1).
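The validity rule can be expressed as a small check. In this sketch
(effective_deletion_time is a hypothetical helper) an empty value
yields the default of 3, and whole seconds are assumed. What
sge_qmaster(8) itself does with an out-of-range value is not described
here, so the function simply reports failure.

```shell
# Hypothetical check of a MAX_JOB_DELETION_TIME value (whole seconds assumed).
effective_deletion_time() {
    if [ -z "$1" ]; then
        echo 3                                # default when no value is entered
        return 0
    fi
    case $1 in *[!0-9]*) return 1 ;; esac     # not a plain integer
    if [ "$1" -gt 0 ] && [ "$1" -le 5 ]; then
        echo "$1"
    else
        return 1                              # outside the range 0 < value <= 5
    fi
}
```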
gdi_timeout
Sets how long the communication library will wait for GDI
send/receive operations. (GDI is the Grid Engine Database
Interface for interacting with objects managed by the qmaster.)
The default is 60 seconds, which also applies if no gdi_timeout
value is configured. After this time, if gdi_retries is
configured, the communication library retries receiving the GDI
request; otherwise, the communication returns with a "gdi receive
failure". For example, gdi_timeout=120 sets the timeout to 120
seconds.
gdi_retries
Sets how often the GDI receive call is repeated before the GDI
receive error is reported. The default is 0, in which case the
call is made once, with no retry. If the value is set to -1, the
call is repeated indefinitely. In combination with the
gdi_timeout parameter, this makes it possible to configure a
system with, e.g., slow NFS so that all jobs will be submitted.
(E.g. gdi_retries=4.)
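Taken together, the two parameters bound how long a client may wait
for a GDI answer. The sketch below uses hypothetical helper names and
assumes, as a simplification, that every attempt waits for the full
gdi_timeout.

```shell
# Hypothetical helpers: number of GDI receive attempts and worst-case wait.
gdi_attempts() {
    # $1 = gdi_retries (0 = one attempt, -1 = retry indefinitely)
    if [ "$1" -eq -1 ]; then
        echo unlimited
    else
        echo $(( $1 + 1 ))
    fi
}

gdi_worst_case_wait() {
    # $1 = gdi_timeout in seconds (default 60), $2 = gdi_retries (default 0)
    gdi_n=$(gdi_attempts "${2:-0}")
    if [ "$gdi_n" = unlimited ]; then
        echo unlimited
    else
        echo $(( ${1:-60} * gdi_n ))
    fi
}
```

With gdi_timeout=120 and gdi_retries=4, for example, a client makes up
to five attempts and may wait up to about 600 seconds.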
cl_ping
Turns on/off a communication library ping. This parameter
creates additional debug output. This output shows information
about the error messages returned by the communication library,
and it gives information about the application status of the
qmaster. For example, if the reason for GDI timeouts is unclear,
this may show some useful messages. The default value is false
(off) (i.e. cl_ping=false).
SCHEDULER_TIMEOUT
Setting this parameter allows the scheduler GDI event
acknowledge timeout to be manually configured to a specific
value, specified in seconds. Without it, the timeout depends on
the current scheduler configuration: with the default scheduler
configuration it is 10 minutes, and this computed default is
limited to between 600 and 1200 seconds. An explicitly
configured value is not subject to that limit.
jsv_timeout
This parameter sets the timeout for responses from the server
JSV. If the JSV takes longer than the specified timeout to
respond, it is restarted. The default timeout is 10 seconds; if
modified, it must be greater than 0. If the timeout is reached,
the JSV will only try to restart once; if the timeout is reached
again, an error will occur.
jsv_threshold
The jsv_threshold value is compared with the time it takes to
perform a server job verification. If a verification takes
longer than this value, it is logged in the qmaster messages file
at the INFO level. By setting this value to 0, all jobs will be
logged in the qmaster messages file. This value is specified in
milliseconds and has a default value of 5000.
OLD_RESCHEDULE_BEHAVIOR
Beginning with version 8.0.0 of Grid Engine the scheduling
behavior changed for jobs that are rescheduled by users.
Rescheduled jobs will not be put at the beginning of the pending
job list anymore. The submit time of those jobs is set to the
end time of the previous run. Due to that, those rescheduled
jobs will be appended to the pending job list as if a new job
had been submitted. To achieve the old behaviour, set the
parameter OLD_RESCHEDULE_BEHAVIOR. Please note that this
parameter is deprecated, so it might be removed with the next
minor release.
OLD_RESCHEDULE_BEHAVIOR_ARRAY_JOB
Beginning with version 8.0.0 of Grid Engine the scheduling
behavior changed for array job tasks that are rescheduled by
users. As soon as an array job task gets rescheduled, all
remaining pending tasks of that job will be put at the end of
the pending job list. To achieve the old scheduling behaviour
set the parameter OLD_RESCHEDULE_BEHAVIOR_ARRAY_JOB. Please note
that this parameter is deprecated, so it might be removed with
the next minor release.
SIMULATE_EXECDS
Bypass execd communication in qmaster (e.g. for throughput
tests with fake hosts). "Unknown" queue states are suppressed,
but load_thresholds=none must be used to avoid queues going into
an alarm state since load values are not simulated. Submitted
jobs are dispatched, and act as if they are run for a time
determined by the job's first argument, after 3s spent in the
"transferring" state. I.e. there is a simulated 10s runtime for
a command such as
qsub -b y sleep 10
In this mode, job deletion works, but at least interactive
jobs, tightly integrated parallel jobs, and job suspension
do not. The execution hosts configured need not exist, but must
have resolvable network names.
NO_AUTHENTICATION
Don't do authentication when GSSAPI security is enabled. This,
and the following parameter, determine the GSS global
configuration, which can be overridden with the execd_params of
the global or host-specific configuration.
NO_SECURITY
Don't store and forward credentials if GSSAPI security is
enabled.
ENABLE_MTRACE
If GNU malloc is in use (rather than jemalloc, which is usually
used on GNU/Linux), enables the facility for recording all memory
allocation/deallocation. Requires MALLOC_TRACE to be set in the
environment (see mtrace(3)).
__TEST_SLEEP_AFTER_REQUEST
Used by the test suite to block the worker thread for five
seconds after handling a request to ensure another worker thread
will handle a subsequent request.
print_malloc_info
Allow monitoring malloc(3) statistics if Grid Engine is built to
use the jemalloc(3) allocator. The information is usually
obtained with the -info option of qping(1), but is generated by
the daemons and can't be controlled by the client. The default
is false since the output is verbose and might confuse programs
parsing the traditional format. The parameter can also be set
in execd_params and affects both qmaster and execd daemons.
Changing qmaster_params will take immediate effect, except that
gdi_timeout, gdi_retries, and cl_ping will take effect only for new
connections. The default for qmaster_params is none.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
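Scripts sometimes need to read a single entry from the qmaster_params
value (the same applies to execd_params below). The following sketch
assumes that entries are separated by commas or whitespace and have the
form NAME or NAME=VALUE, and it treats a bare NAME as set to true.
param_value is a hypothetical helper; the string passed to it would be
copied from the output of qconf -sconf.

```shell
# Hypothetical helper: print the value of one entry of a parameter list
# such as "MONITOR_TIME=0:0:10,PROF_WORKER=true".
param_value() {
    # $1 = parameter list, $2 = entry name
    for pv_entry in $(printf '%s\n' "$1" | tr ',' ' '); do
        case $pv_entry in
            "$2")   echo true; return 0 ;;          # bare NAME (assumed to mean true)
            "$2"=*) echo "${pv_entry#*=}"; return 0 ;;
        esac
    done
    return 1                                        # entry not present
}
```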
execd_params
This is used for passing additional parameters to the Grid Engine
execution daemon. The following values are recognized:
ACCT_RESERVED_USAGE
If this parameter is set to true, the usage of "reserved"
(allocated) resources is reported in the accounting entries cpu,
mem, and maxvmem instead of the measured usage. The live usage
values reported by qstat(1) are affected similarly. This means
that the "wall clock" time (end-start) is reported instead of
CPU time, memory usage is memory allocation times wall clock
time (which is only computable if the job requests h_vmem or
s_vmem), and maxvmem is the requested h_vmem or s_vmem; the same
scaling by slots is done as without this option.
Note that both the wall clock and CPU times are normally
available (see accounting(5)), so this option loses information,
and "reserved" here has nothing to do with advance/resource
reservation. See also SHARETREE_RESERVED_USAGE below.
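As a worked example of the reserved-usage arithmetic, consider a
single-slot job that requested h_vmem=2G and ran for one hour. The
sketch below (reserved_usage is hypothetical; it assumes whole
gigabytes, epoch-second timestamps, and leaves out the per-slot
scaling) prints the values that would be reported instead of the
measured ones: cpu is the wall clock time, mem is the allocation times
the wall clock time (in GB-seconds), and maxvmem is the requested
h_vmem.

```shell
# Hypothetical illustration of ACCT_RESERVED_USAGE for a one-slot job.
reserved_usage() {
    # $1 = start time, $2 = end time (epoch seconds), $3 = h_vmem in whole G
    ru_wall=$(( $2 - $1 ))
    echo "cpu=$ru_wall mem=$(( $3 * ru_wall )) maxvmem=${3}G"
}
```

Here reserved_usage 1000 4600 2 prints cpu=3600 mem=7200 maxvmem=2G,
regardless of what the job actually consumed.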
ENABLE_WINDOMACC
If this parameter is set to true, Windows Domain accounts
(WinDomAcc) are used on Windows hosts. These accounts require
the use of sgepasswd(1). (See also sgepasswd(5).) If this
parameter is set to false, or is not set, local Windows accounts
are used. On non-Windows hosts, this parameter is ignored.
IGNORE_NGROUPS_MAX_LIMIT
If a user is assigned to NGROUPS_MAX-1 supplementary groups, so
that Grid Engine is not able to add one for job tracking, then
the job will go into an error state when it is started.
(NGROUPS_MAX is the system limit on supplementary groups; see
limits.h(7).) Administrators who want to prevent this can set
this parameter. In this case the NGROUPS_MAX
limit is ignored and the additional group (see gid_range) is not
set. As a result, no online usage will be available for those
jobs, and the parameter ENABLE_ADDGRP_KILL will have no
effect. Please note that it is not recommended to use this
parameter. Instead the group membership of the submit user
should be reduced.
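Affected users can be spotted in advance by comparing a user's group
count with the system limit. A minimal sketch following the condition
stated above; tracking_group_available is hypothetical. Note that
id -G also lists the primary group, so the count it yields is slightly
conservative.

```shell
# Hypothetical pre-flight check: can sge_execd still add its tracking group?
# Per the description above, a user with NGROUPS_MAX-1 groups leaves no room.
tracking_group_available() {
    # $1 = number of groups of the user, $2 = NGROUPS_MAX
    [ "$1" -lt $(( $2 - 1 )) ]
}
```

A possible invocation is
tracking_group_available "$(id -G someuser | wc -w)" "$(getconf NGROUPS_MAX)",
where someuser is a placeholder.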
KEEP_ACTIVE
This value should only be set for debugging purposes. If set to
true, the execution daemon will not remove the spool directory
maintained by sge_shepherd(8) for a job, or cgroup directories
if cgroups are in use under Linux.
PTF_MIN_PRIORITY, PTF_MAX_PRIORITY
The maximum/minimum priority which Grid Engine will assign to a
job. Typically this is a negative/positive value in the range
of -20 (maximum) to 19 (minimum) for systems which allow setting
of priorities with the nice(2) system call. Other systems may
provide different ranges.
The default priority range (which varies from system to system)
is installed either by removing the parameters, or by setting a
value of -999.
See the "messages" file of the execution daemon for the
predefined default value on your hosts. The values are logged
during the startup of the execution daemon.
PROF_EXECD
Enables profiling for the execution daemon (e.g.
PROF_EXECD=true).
NOTIFY_KILL
This parameter allows you to change the notification signal for
the signal SIGKILL (see the -notify option of qsub(1)). The
parameter accepts either signal names (as listed by the -l option
of kill(1)) or the special value none. If set to none, no
notification signal will be sent. If it is set to TERM, for
instance, or another signal name, then this signal will be sent
as the notification signal.
NOTIFY_SUSP
With this parameter it is possible to modify the notification
signal for the signal SIGSTOP (see the -notify option of
qsub(1)). The parameter accepts either signal names (as listed by
the -l option of kill(1)) or the special value none. If set to
none, no notification signal will be sent. If it is set to TSTP,
for instance, or another signal name, then this signal will be
sent as the notification signal.
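A job submitted with qsub -notify can catch the warning signal to save
its state. By default the warning before SIGKILL is SIGUSR2 (and
SIGUSR1 before SIGSTOP, see qsub(1)); with NOTIFY_KILL=TERM it would be
SIGTERM instead, so the sketch below traps both. The checkpoint file
name and the handler body are hypothetical placeholders. A shell runs
a trap only after its current foreground command has finished, so
long-running steps are often started in the background and waited for
with wait.

```shell
# Hypothetical job-script fragment reacting to the -notify warning signal.
CHECKPOINT_FILE=${TMPDIR:-/tmp}/notify_checkpoint.$$

on_kill_warning() {
    # Placeholder: save whatever state the job needs before SIGKILL arrives.
    echo "kill warning received at $(date)" > "$CHECKPOINT_FILE"
}

# USR2 is the default warning before SIGKILL; TERM covers NOTIFY_KILL=TERM.
trap on_kill_warning USR2 TERM
```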
SHARETREE_RESERVED_USAGE
If this parameter is set to true, the usage of "reserved"
resources is taken for the Grid Engine share tree consumption
instead of measured usage. See the description of
ACCT_RESERVED_USAGE above for details.
Note: When running tightly integrated jobs with
SHARETREE_RESERVED_USAGE set, and with accounting_summary
enabled in the parallel environment, reserved usage will only be
reported by the master task of the parallel job. No per-
parallel task usage records will be sent from execd to qmaster,
which can significantly reduce load on qmaster when running
large tightly integrated parallel jobs.
USE_QSUB_GID
If this parameter is set to true, the primary group id active
when a job was submitted will become the primary group
id for job execution. If the parameter is not set, the primary
group id as defined for the job owner in the execution host
passwd database is used.
The feature is only available for jobs submitted via qsub(1),
qrsh(1), qmake(1) and qtcsh(1). Also, it only works for qrsh(1)
jobs (and thus also for qtcsh(1) and qmake(1)) if builtin
communication is used, or the rsh and rshd components which are
provided with Grid Engine (see remote_startup(5)).
S_DESCRIPTORS, H_DESCRIPTORS, S_MAXPROC, H_MAXPROC, S_MEMORYLOCKED,
H_MEMORYLOCKED, S_LOCKS, H_LOCKS
Specifies soft and hard resource limits as implemented by the
setrlimit(2) system call. See that manual page on your system
for more information. These parameters complete the list of
limits set by the RESOURCE LIMITS parameter of the queue
configuration as described in queue_conf(5). Unlike the
resource limits in the queue configuration, these resource
limits are set for every job on this execution host. If a value
is not specified, the resource limit is inherited from the
execution daemon process. Because this would lead to
unpredictable results if only one limit of a resource is set
(soft or hard), the corresponding other limit is set to the same
value.
S_DESCRIPTORS and H_DESCRIPTORS specify a value one greater than
the maximum file descriptor number that can be opened by any
process of a job.
S_MAXPROC and H_MAXPROC specify the maximum number of processes
that can be created by the job user on this execution host.
S_MEMORYLOCKED and H_MEMORYLOCKED specify the maximum number of
bytes of virtual memory that may be locked into RAM. This
typically needs to be set to "unlimited" for use with openib
InfiniBand, and possibly similar transports.
S_LOCKS and H_LOCKS specify the maximum number of file locks any
process of a job may establish.
All of these values can be specified using the multiplier
letters k, K, m, M, g and G; see sge_types(1) for details.
Limits can be specified as "infinity" to remove limits (if
possible), per setrlimit(2).
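To see what a given value means as a plain number, the multiplier
letters can be expanded as in the sketch below. expand_limit is a
hypothetical helper; the multipliers follow sge_types(1) as summarized
in its comment, and the values unlimited and infinity are passed
through unchanged.

```shell
# Hypothetical helper: expand a limit written with the multiplier letters
# (k/m/g = powers of 1000, K/M/G = powers of 1024) into a plain number.
expand_limit() {
    case $1 in
        infinity|unlimited) echo "$1"; return 0 ;;
        *k) el_mult=1000 ;;
        *K) el_mult=1024 ;;
        *m) el_mult=1000000 ;;
        *M) el_mult=1048576 ;;
        *g) el_mult=1000000000 ;;
        *G) el_mult=1073741824 ;;
        *)  el_mult=1 ;;
    esac
    el_num=${1%[kKmMgG]}                       # drop the multiplier letter
    case $el_num in ''|*[!0-9]*) return 1 ;; esac
    el_num=${el_num#"${el_num%%[!0]*}"}        # avoid octal parsing of "08"
    echo $(( ${el_num:-0} * el_mult ))
}
```

For example, S_DESCRIPTORS=64K would mean 65536 descriptors.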
INHERIT_ENV
This parameter indicates whether the shepherd should allow the
environment inherited by the execution daemon from the shell
that started it to be inherited by the job it's starting. When
true, any environment variable that is set in the shell which
starts the execution daemon at the time the execution daemon is
started will be set in the environment of any jobs run by that
execution daemon, unless the environment variable is explicitly
overridden, such as PATH or LOGNAME. If set to false, each job
starts with only the environment variables that are explicitly
passed on by the execution daemon, such as PATH and LOGNAME.
The default value is true.
SET_LIB_PATH
This parameter tells the execution daemon whether to add the
Grid Engine shared library directory to the library path of
executed jobs. If set to true, and INHERIT_ENV is also set to
true, the Grid Engine shared library directory will be prepended
to the library path which is inherited from the shell which
started the execution daemon. If INHERIT_ENV is set to false,
the library path will contain only the Grid Engine shared
library directory. If set to false, and INHERIT_ENV is set to
true, the library path exported to the job will be the one
inherited from the shell which started the execution daemon. If
INHERIT_ENV is also set to false, the library path will be
empty. After the execution daemon has set the library path, it
may be further altered by the shell in which the job is
executed, or by the job script itself. The default value for
SET_LIB_PATH is false.
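The four combinations of SET_LIB_PATH and INHERIT_ENV described above
can be summarized as a function. This models the documented behavior
only: job_lib_path is hypothetical, the directory names in the tests
are placeholders, and the result is the path before any changes by the
job's shell or script.

```shell
# Hypothetical model of the SET_LIB_PATH / INHERIT_ENV combinations.
job_lib_path() {
    # $1 = SET_LIB_PATH, $2 = INHERIT_ENV (true/false),
    # $3 = library path of the shell that started sge_execd,
    # $4 = Grid Engine shared library directory
    if [ "$1" = true ]; then
        if [ "$2" = true ] && [ -n "$3" ]; then
            echo "$4:$3"    # Grid Engine directory prepended to inherited path
        else
            echo "$4"       # Grid Engine directory only
        fi
    elif [ "$2" = true ]; then
        echo "$3"           # inherited path, unchanged
    else
        echo ""             # empty library path
    fi
}
```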
ENABLE_ADDGRP_KILL
If this parameter is set then Grid Engine uses the supplementary
group ids (see gid_range) to identify all processes which are to
be terminated when a job is deleted, or when sge_shepherd(8)
cleans up after job termination. This currently only works
under GNU/Linux, Solaris, Tru64, FreeBSD, and Darwin. The
default value is on. It is irrelevant when cgroups/cpusets are
used (see USE_CGROUPS below).
PDC_INTERVAL
This parameter defines the interval between runs of the PDC
(Portable Data Collector) by the execution daemon. The PDC is
responsible for enforcing the resource limits s_cpu, h_cpu,
s_vmem and h_vmem (see queue_conf(5)) and for job usage
collection. The parameter can be set to a time_specifier (see
sge_types(1)), to PER_LOAD_REPORT or to NEVER.
If this parameter is set to PER_LOAD_REPORT, the PDC is triggered
in the same interval as load_report_time (see above). If this
parameter is set to NEVER, the PDC is never triggered. The
default is 1 second.
Note: A PDC run is quite compute-intensive, and may degrade the
performance of the running jobs. However, if the PDC runs less
often, or never, the online usage can be incomplete or totally
missing (for example, online usage of very short-running jobs
might be missing), and resource limit enforcement becomes less
accurate, or does not happen at all if the PDC is turned off
completely.
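The three kinds of values can be resolved to an interval as sketched
below. pdc_interval_seconds is hypothetical, not a Grid Engine
command; it accepts plain seconds or HH:MM:SS as the time_specifier
and uses awk for the conversion because awk, unlike shell arithmetic,
reads fields such as 08 as decimal.

```shell
# Hypothetical helper: effective PDC interval in seconds, or "never".
pdc_interval_seconds() {
    # $1 = PDC_INTERVAL value (empty: 1 s default), $2 = load_report_time (s)
    pdc_value=${1:-1}
    case $pdc_value in
        PER_LOAD_REPORT) echo "$2" ;;
        NEVER)           echo never ;;
        *[!0-9:]*)       return 1 ;;            # not a time specifier
        *:*:*:*)         return 1 ;;            # too many fields
        *:*:*)           echo "$pdc_value" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }' ;;
        *:*)             return 1 ;;            # neither seconds nor HH:MM:SS
        *)               echo "$pdc_value" ;;   # plain number of seconds
    esac
}
```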
ENABLE_BINDING
If this parameter is set, then Grid Engine enables the core
binding module within the execution daemon to apply binding
parameters that are specified at submission time of a job. This
parameter is set by default if Grid Engine was compiled with
support for core binding. More information on job-to-core
binding can be found under the -binding option of qsub(1).
SIMULATE_JOBS
Allow the simulation of jobs. (Job spooling and execution on
the execd side is disabled.)
NO_AUTHENTICATION
Turn off authentication for the relevant host(s) when the
authentication GSSAPI security feature is enabled globally.
DO_AUTHENTICATION
Turn on authentication for the relevant host(s) when the
authentication GSSAPI security feature is enabled globally.
NO_SECURITY
Turn off storing and forwarding of credentials when the GSSAPI
security feature is enabled globally.
USE_SYSLOG
Write messages to the system logger (see syslog(3)) rather than
into the spool directory.
USE_QIDLE
Automatically start any executable named qidle present in the
architecture-dependent binary directory as a load sensor,
similarly to qloadsensor (which is run unconditionally). It is
intended to determine whether a workstation is "idle" or not,
i.e. whether it has an interactive load. See e.g. the idle time
HOWTO <http://arc.liv.ac.uk/SGE/howto/idle.html> or
sources/experimental/qidle in the source repository, but it may
be better to check the screensaver state.
USE_CGROUPS
[Linux only.] Use cgroups/cpusets for resource management if
the system supports them and the necessary directories exist in
the relevant filesystems (possibly created by
util/resources/scripts/setup-cgroups-etc). Makes
ENABLE_ADDGRP_KILL irrelevant. This option is experimental, and
at least the default is likely to change in future. Default is
no.
USE_SMAPS
[Linux only.] Read processes' smaps file in the proc(5)
filesystem to obtain PSS usage for the most accurate memory
accounting, or to obtain the swap usage on older systems which
don't report PSS. That can be slow when processes have very
many maps (observed with an FEM code), significantly increasing
the load from execd, so the default is no. Without smaps, usage
is reported as RSS+swap, instead of PSS+swap, or simply as the
VMsize if the swap value isn't available.
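The PSS figure that USE_SMAPS makes available is the sum of the Pss:
lines in a process's smaps file. The sketch below (pss_kb is a
hypothetical helper) computes that sum, in kB, for a single smaps file,
e.g. /proc/self/smaps on Linux; it does not add swap or combine the
processes of a job.

```shell
# Hypothetical helper: total proportional set size (kB) from one smaps file.
pss_kb() {
    # $1 = path to an smaps file, e.g. /proc/<pid>/smaps
    awk '$1 == "Pss:" { total += $2 } END { print total + 0 }' "$1"
}
```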
print_malloc_info
See qmaster_params above.
DEMAND_LS
If true, generate load sensor reports just before sending them,
making the data fresher. The default is true. The switch is
provided in case slow sensors are found to have a bad effect on
the execd.
Changing execd_params will take effect after it is propagated to the
execution daemons. The propagation is done in one load report interval.
The default for execd_params is none.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
reporting_params
Used to define the behavior of reporting modules in the Grid Engine
qmaster. Changes to the reporting_params take immediate effect. The
following values are recognized:
accounting
If this parameter is set to true, the accounting file is
written. The accounting file is a prerequisite for qacct(1).
reporting
If this parameter is set to true, the reporting file is written.
The reporting file contains data that can be used for monitoring
and analysis, like job accounting, job log, host load and
consumables, queue status and consumables, and sharetree
configuration and usage. Attention: Depending on the size and
load of the cluster, the reporting file can become quite large.
Only activate the reporting file if you have a process running
that will consume the reporting file! See reporting(5) for
further information about the format and contents of the
reporting file.
flush_time
The contents of the reporting file are buffered in the Grid
Engine qmaster and flushed at a fixed interval. This interval
can be configured with the flush_time parameter. It is
specified as a time value in the format HH:MM:SS or a number of
seconds. Sensible values range from a few seconds to a minute.
Setting it too low may slow down the qmaster. Setting it too
high will make the qmaster consume large amounts of memory for
buffering data. The reporting file is opened and closed for
each flush. Default 15s.
accounting_flush_time
The contents of the accounting file are buffered in the Grid
Engine qmaster and flushed at a fixed interval. This interval
can be configured with the accounting_flush_time parameter. It
is specified as a time value in the format HH:MM:SS. Sensible
values range from a few seconds to one minute. Setting it too
low may slow down the qmaster. Setting it too high will make the
qmaster consume large amounts of memory for buffering data.
Setting it to 0 disables buffering; as soon as a record is
generated, it will be written to the accounting file. If this
parameter is not set, the accounting data flush interval will
default to the value of the flush_time parameter. The
accounting file is opened and closed for each flush.
joblog
If this parameter is set to true, the reporting file will
contain job logging information. See reporting(5) for more
information about job logging.
sharelog
The Grid Engine qmaster can dump information about sharetree
configuration and use to the reporting file. The parameter
sharelog sets an interval in which sharetree information will be
dumped. It is set in the format HH:MM:SS or a number of
seconds. A value of 0 (default) configures qmaster not to dump
sharetree information. Intervals of several minutes up to hours
are sensible values for this parameter. See reporting(5) for
further information about sharelog.
log_consumables
This parameter controls writing of consumable resources to the
reporting file. When log_consumables is set to true, information
about all consumable resources (their current usage and their
capacity) will be written to the reporting file whenever a
consumable resource changes either in definition, or in
capacity, or when the usage of an arbitrary consumable resource
changes. When log_consumables is set to false (default), only
those variables will be written to the reporting file that are
configured in the report_variables in the exec host
configuration and whose definition or value actually changed.
This parameter is deprecated and will be removed in the next
major release. See host_conf(5) for further information about
report_variables.
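The fallback rules for the two flush intervals can be summarized in a
few lines. This sketch (effective_flush_times is hypothetical) takes
both values in seconds, with an empty string meaning "not set";
converting HH:MM:SS values is left out.

```shell
# Hypothetical helper: effective reporting and accounting flush intervals.
effective_flush_times() {
    # $1 = flush_time (empty: default 15), $2 = accounting_flush_time
    # (empty: falls back to flush_time; 0 disables buffering)
    ef_flush=${1:-15}
    ef_acct=${2:-$ef_flush}
    echo "reporting=$ef_flush accounting=$ef_acct"
}
```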
finished_jobs
Note: Deprecated, may be removed in a future release.
Grid Engine stores a certain number of just finished jobs to provide
post mortem status information via qstat -s z. The finished_jobs
parameter defines the number of finished ("zombie") jobs stored. If
this maximum number is reached, the oldest finished job will be
discarded for every new job added to the finished job list. (The
zombie list is not spooled, and so will be lost by a qmaster re-start.)
Changing finished_jobs will take immediate effect. The default for
finished_jobs is 100.
This value is a global configuration parameter only. It cannot be
overwritten by the execution host local configuration.
qlogin_daemon
qlogin_command
rlogin_daemon
rlogin_command
rsh_daemon
rsh_command
These three pairs of entries define the remote startup methods for
interactive jobs started by qlogin(1), by qrsh(1) without a
command, and by qrsh(1) with a command, respectively.
The last startup method is also used to start up tasks on a slave
exechost of a tightly integrated parallel job. Each pair for one
startup method must contain matching communication methods. All entries
can contain the value builtin (which is the default) or a full path to
a binary which should be used, and additional arguments to this command
if necessary.
The entries for the three ..._command definitions can, in addition,
contain the value NONE in case a particular startup method should be
disabled.
Changing any of these entries will take immediate effect.
The global configuration entries for these values may be overwritten by
an execution host local configuration.
See remote_startup(5) for a detailed explanation of these settings.
delegated_file_staging
This flag must be set to "true" when the prolog and epilog are ready
for delegated file staging, so that the DRMAA attribute
'drmaa_transfer_files' is supported. To establish delegated file
staging, use the variables beginning with "$fs_..." in prolog and
epilog to move the input, output and error files from one host to the
other. When this flag is set to "false", no file staging is available
for the DRMAA interface. File staging is currently implemented only via
the DRMAA interface. When an error occurs while moving the input,
output and error files, return error code 100 so that the error
handling mechanism can handle the error correctly. (See also
FORBID_APPERROR.)
reprioritize
Note: Deprecated, may be removed in a future release.
This flag enables or disables the reprioritization of jobs based on
their ticket amount. The reprioritize_interval in sched_conf(5) takes
effect only if reprioritize is set to true. To turn off job
reprioritization, the reprioritize flag must be set to false and the
reprioritize_interval to 0, which is the default.
This value is a global configuration parameter only. It cannot be
overridden by the execution host local configuration.
jsv_url
This setting defines a server JSV instance which will be started and
triggered by the sge_qmaster(8) process. This JSV instance will be used
to verify job specifications of jobs before they are accepted and
stored in the internal master database. The global configuration entry
for this value cannot be overwritten by execution host local
configurations.
Find more details concerning JSV in jsv(1) and sge_request(1).
The syntax of the jsv_url is specified in sge_types(1).
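A hypothetical Python parser shows how a jsv_url value splits into its parts. The grammar assumed here is [type:][user@]path, with "script" as the only type and as the default when no type is given, plus an absolute path. That grammar is our reading of sge_types(1), not a quotation, so treat that page as authoritative.

```python
# Hypothetical parser for jsv_url values. The grammar assumed here,
# [type ':'] [user '@'] path with "script" as the only type, is our reading
# of sge_types(1); consult that page for the authoritative syntax.
from typing import NamedTuple, Optional


class JsvUrl(NamedTuple):
    jsv_type: str
    user: Optional[str]
    path: str


def parse_jsv_url(url: str) -> JsvUrl:
    jsv_type, user, rest = "script", None, url.strip()  # "script" assumed as default
    head, sep, tail = rest.partition(":")
    if sep and "/" not in head:  # a type prefix precedes the path
        jsv_type, rest = head, tail
    head, sep, tail = rest.partition("@")
    if sep and "/" not in head:  # a user name precedes the path
        user, rest = head, tail
    if jsv_type != "script":
        raise ValueError(f"unsupported JSV type {jsv_type!r}")
    if not rest.startswith("/"):  # absolute path assumed to be required
        raise ValueError(f"JSV path must be absolute: {url!r}")
    return JsvUrl(jsv_type, user, rest)


if __name__ == "__main__":
    # Hypothetical path; a server JSV run as user sgeadmin.
    print(parse_jsv_url("script:sgeadmin@/opt/sge/jsv/server_jsv.sh"))
    # JsvUrl(jsv_type='script', user='sgeadmin', path='/opt/sge/jsv/server_jsv.sh')
```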
jsv_allowed_mod
If there is a server JSV script defined with the jsv_url parameter,
then all qalter(1) or qmon(1) modification requests for jobs are
rejected by qmaster. The jsv_allowed_mod parameter lets an
administrator permit a set of switches that clients can then use to
modify certain job attributes. The value of this parameter must be a
comma-separated list of JSV job parameter names as documented in
qsub(1), or the value none to indicate that no modification is allowed.
Note that even if none is specified, the switches -w and -t are still
allowed for qalter.
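The acceptance rule for jsv_allowed_mod can be modelled in a few lines of Python. This is an illustrative sketch, not qmaster code. It assumes requested modifications are given as JSV parameter names (e.g. "N" for -N) and that a server JSV is configured via jsv_url. The -w and -t exception is kept in all cases.

```python
# Hypothetical model of the jsv_allowed_mod rule, for illustration only.
# Assumption: requested modifications are given as JSV parameter names
# (e.g. "N" for -N); -w and -t stay allowed even when the value is none.
ALWAYS_ALLOWED = frozenset({"w", "t"})


def parse_allowed_mod(value: str) -> frozenset:
    """Turn 'none' or a comma-separated list into a set of parameter names."""
    value = value.strip()
    if value == "none":
        return frozenset()
    return frozenset(name.strip() for name in value.split(",") if name.strip())


def modification_allowed(jsv_allowed_mod: str, requested: list) -> bool:
    """True if the rule permits a qalter request touching only `requested`
    (relevant only when a server JSV is configured via jsv_url)."""
    allowed = parse_allowed_mod(jsv_allowed_mod) | ALWAYS_ALLOWED
    return all(name in allowed for name in requested)


if __name__ == "__main__":
    setting = "h,N,p"  # hypothetical value
    print(modification_allowed(setting, ["N", "p"]))  # True
    print(modification_allowed(setting, ["M"]))       # False
```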
libjvm_path
libjvm_path is usually set during qmaster installation and holds the
absolute path of libjvm.so (or the corresponding library for your
architecture, e.g. /usr/java/jre/lib/i386/server/libjvm.so). The
referenced libjvm version must be at least 1.5. It is needed only by
the JVM qmaster thread. If the Java VM needs additional start-up
parameters, they can be set in additional_jvm_args. Whether the JVM
thread is started at all is defined in the bootstrap(5) file. If
libjvm_path is empty or points to an incorrect path, the JVM thread
fails to start.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
additional_jvm_args
additional_jvm_args is usually set during qmaster installation.
Possible values for additional_jvm_args are described in the help
output of the accompanying java command. This setting is normally not
needed.
The global configuration entry for this value may be overwritten by the
execution host local configuration.
SECURITY
If prolog or epilog is specified with a user@ prefix, security
considerations apply. The methods are run in a user-supplied
environment (via -V or -v), which provides a way to run arbitrary code
as that user (which might well be root) by setting variables such as
LD_LIBRARY_PATH and LD_PRELOAD that affect the behavior of dynamically
linked programs, such as the shells used to implement the methods.
To combat this, known problematic variables are removed from the
environment before a method is started as any user other than the job
owner, but this may not be foolproof on arbitrary systems with obscure
variables. The environment can be controlled safely by running the
methods under a statically linked version of env(1), such as the one
typically available via busybox(1). For example, use
/bin/busybox env -u ...
to unset sensitive variables, or
/bin/busybox env -i name=value...
to set only specific variables. On some systems, such as recent
Solaris, it is essentially impossible to build static binaries. In
that case it is typically possible to use a setuid wrapper, relying on
the dynamic linker to do the right thing. An example is the safe_exec
wrapper, which is available from
<http://arc.liv.ac.uk/downloads/SGE/support/> at the time of writing.
When using a non-shell scripting language wrapper for the method
daemon, use options that avoid interpreter-specific environmental
damage, such as Perl's -T and Python's -E. Privileged shell script
wrappers should be avoided if possible and, if used, written carefully
(e.g. invoke programs by their full path names); if bash(1) is used, it
should be run with the -p option.
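The two strategies above, unsetting known-dangerous variables versus passing only an explicit set, can be sketched in Python. This is useful, for example, in a wrapper run with python -E. It does not give the guarantee of a statically linked env(1), because the Python interpreter is itself dynamically linked. The variable list below is an illustrative subset of glibc's insecure variables. It is not the list compiled into Grid Engine.

```python
# Illustrative sketch of the two strategies above, written in Python.
# The variable list is an example, NOT the list compiled into Grid Engine
# (see source/libs/uti2/sge_execvlp.c for that).
import subprocess

SENSITIVE_NAMES = frozenset({
    "TMPDIR", "LOCALDOMAIN", "RES_OPTIONS", "HOSTALIASES",
    "NLSPATH", "GCONV_PATH", "LOCPATH",
})
SENSITIVE_PREFIXES = ("LD_", "DYLD_")  # dynamic linker controls (glibc, macOS)


def strip_sensitive(env: dict) -> dict:
    """Like 'env -u NAME ...': drop variables known to be dangerous."""
    return {k: v for k, v in env.items()
            if k not in SENSITIVE_NAMES and not k.startswith(SENSITIVE_PREFIXES)}


def allowlist_only(env: dict, keep: list) -> dict:
    """Like 'env -i name=value ...': keep only the named variables."""
    return {k: env[k] for k in keep if k in env}


def run_method(argv: list, env: dict) -> int:
    """Start a method with exactly `env` as its environment; return its exit code."""
    return subprocess.run(argv, env=env, check=False).returncode
```

A wrapper might call run_method(["/path/to/epilog"], allowlist_only(dict(os.environ), ["PATH", "HOME"])), where the epilog path is hypothetical. The allowlist form is the safer default, because it does not depend on knowing every dangerous name in advance.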
It is not currently possible to specify which variables are unset (e.g.
as a host-dependent execd parameter); instead, a fixed set of
system-dependent ones is selected. The list of sensitive variables is
taken mostly from GNU libc and sudo(1). It includes known
system-dependent dynamic linker variables, sensitive locale variables
and others, like TMPDIR, but does not attempt to deal with
interpreter-specific variables such as PYTHONPATH. The locale
specification is also sanitized. See the source file
source/libs/uti2/sge_execvlp.c for details. Note that TMPDIR is one of
the variables affected and may need to be recreated (typically as
/tmp/$JOB_ID.$TASK_ID.$SGE_CELL).
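Recreating TMPDIR after sanitization amounts to rebuilding the variable from the job's identifiers. The Python sketch below follows the typical pattern named above. It assumes that JOB_ID, TASK_ID and SGE_CELL are present in the environment it is given. It only sets the variable and does not create the directory.

```python
# Sketch: put TMPDIR back after the environment was sanitized.
# Assumptions: JOB_ID, TASK_ID and SGE_CELL are present in `env`, and the
# per-job directory follows the pattern named above under `base`.
import os


def restore_tmpdir(env: dict, base: str = "/tmp") -> dict:
    """Return a copy of env with TMPDIR rebuilt as base/JOB_ID.TASK_ID.SGE_CELL."""
    job_dir = os.path.join(base, f"{env['JOB_ID']}.{env['TASK_ID']}.{env['SGE_CELL']}")
    restored = dict(env)
    restored["TMPDIR"] = job_dir
    return restored


if __name__ == "__main__":
    # Hypothetical values
    print(restore_tmpdir({"JOB_ID": "42", "TASK_ID": "1", "SGE_CELL": "default"})["TMPDIR"])
    # /tmp/42.1.default
```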
SEE ALSO
sge_intro(1), csh(1), qconf(1), qsub(1), jsv(1), rsh(1), sh(1),
getpwnam(3), drmaa_attributes(3), queue_conf(5), sched_conf(5),
sge_types(1), sge_execd(8), sge_qmaster(8), sge_shepherd(8), cron(8),
remote_startup(5)
COPYRIGHT
See sge_intro(1) for a full statement of rights and permissions.
SGE 8.1.3pre 2011-11-27 SGE_CONF(5)