Tuning guide

Grid Engine is a full-featured, general-purpose Distributed Resource Management (DRM) tool. Its scheduler component supports a wide range of compute farm scenarios. To get the maximum performance from your compute environment, it can be worthwhile to review which features are enabled and which are really needed to solve your load management problem. Enabling or disabling these features can improve the throughput of your cluster. Each feature notes in parentheses the version in which it was introduced. Unless stated otherwise, it is also available in later versions.

overall cluster tuning

Experience has shown that using NFS or similar shared file systems to distribute the files required by Grid Engine can account for a critical share of both overall network load and file server load. Keeping such files local is therefore always at least slightly beneficial for overall cluster throughput, but it makes monitoring and debugging less convenient, which may not be a good trade-off in low-throughput cases. The HOWTO "Reducing and Eliminating NFS usage by Grid Engine" shows common choices for accomplishing this.

scheduler monitoring

Scheduler monitoring can help to find out why certain jobs are not dispatched (displayed via qstat -j). However, providing this information for all jobs at all times consumes resources (memory and CPU time) and is usually not needed. To disable scheduler monitoring, set schedd_job_info to false in the scheduler configuration. See sched_conf(5).
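
One way to apply such a change without an interactive editor is to dump the scheduler configuration to a file, rewrite the key, and load the file back. The sketch below covers only the rewrite step. It assumes the dump contains one whitespace-separated "key value" pair per line; set_sched_key is a hypothetical helper name, not part of Grid Engine.

```shell
# set_sched_key FILE KEY VALUE
# Rewrites one "key value" line in a saved scheduler configuration,
# or appends the key if it is missing.
# Hypothetical helper; assumes one "key value" pair per line.
set_sched_key() {
    awk -v key="$2" -v val="$3" '
        $1 == key { printf "%-33s %s\n", key, val; found = 1; next }
        { print }
        END { if (!found) printf "%-33s %s\n", key, val }
    ' "$1" > "$1.new" && mv "$1.new" "$1"
}

# Possible use on a host with Grid Engine admin rights:
#   qconf -ssconf > sched.conf
#   set_sched_key sched.conf schedd_job_info false
#   qconf -Msconf sched.conf
```

Working on a file copy keeps the previous configuration available for comparison or rollback.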

finished jobs

For array jobs, the finished job list in qmaster can become quite large. Switching it off saves memory and speeds up qstat commands, because qstat also fetches the finished jobs list. To switch it off, set finished_jobs to 0 in the global configuration. See sge_conf(5).

job verification

Forcing validation at job submission time can be a valuable tool to prevent non-dispatchable jobs from remaining in the pending state forever. However, validating jobs can be time consuming, especially in heterogeneous environments with a variety of execution nodes and consumable resources, where every user has their own job profile. In homogeneous environments with only a couple of different job types, general job validation can usually be omitted. Job verification is disabled by default and should only be used (qsub(1): -w [v|e|w]) when needed. [It is enabled by default with DRMAA.]
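
Where verification is wanted only for selected submissions, a small wrapper keeps it opt-in. This is a minimal sketch: submit_checked and the QSUB override are hypothetical names introduced here, and "-w e" is the qsub(1) mode that rejects jobs with invalid requests.

```shell
# submit_checked ARGS...
# Submits a job with "-w e" so that a job with invalid requests is
# rejected at submission time instead of staying pending.
# QSUB is a hypothetical override (for example QSUB=echo for a dry run);
# it defaults to the real qsub.
submit_checked() {
    "${QSUB:-qsub}" -w e "$@"
}

# Possible use:
#   submit_checked -N demo job.sh
```

Jobs submitted with plain qsub keep the cheaper default of no verification.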

load thresholds and suspend thresholds

Load thresholds are needed if you deliberately oversubscribe your machines and need a mechanism to prevent excessive system load. Suspend thresholds serve the same purpose. Load thresholds are also needed when an execution node is open to interactive load that is not under the control of Grid Engine and you want to prevent the node from being overloaded. If a compute farm is more single-purpose, e.g., each CPU on a compute node is represented by only one queue slot and no interactive load is expected on these nodes, then load_thresholds can be omitted. To disable both thresholds, set load_thresholds to none and suspend_thresholds to none. See queue_conf(5).

Load thresholds can also be applied to consumable resources (see queue_conf(5)), but using this feature has a negative impact on scheduler performance.
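
To check whether a queue still carries thresholds, the queue definition can be inspected. The sketch below reads "qconf -sq <queue>" style output on standard input and assumes one "attribute value" pair per line; thresholds_disabled is a hypothetical helper name, and all.q is only an example queue name.

```shell
# thresholds_disabled
# Reads a queue definition on stdin and succeeds only if both
# load_thresholds and suspend_thresholds are present and set to none
# (compared case-insensitively). Any other value, including a value
# with host-specific overrides, counts as "not disabled".
thresholds_disabled() {
    awk '
        $1 == "load_thresholds" || $1 == "suspend_thresholds" {
            seen++
            if (tolower($2) != "none") bad = 1
        }
        END { if (seen == 2 && !bad) exit 0; exit 1 }
    '
}

# Possible use:
#   qconf -sq all.q | thresholds_disabled || echo "thresholds still active"
```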

load adjustments

Load adjustments are used to virtually increase the measured load after a job has been dispatched. This mechanism is helpful on oversubscribed machines in order to align with load thresholds. Load adjustments should be switched off if they are not needed, because they impose additional work on the scheduler in connection with sorting hosts and verifying load thresholds. To disable load adjustments, set job_load_adjustments to none and load_adjustment_decay_time to 0 in the scheduler configuration. See sched_conf(5).
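
Both keys can be changed in one pass over a saved copy of the scheduler configuration. This is a sketch under the assumption that each key starts its own line in the file; disable_load_adjustments is a hypothetical helper name.

```shell
# disable_load_adjustments FILE
# Prints FILE with job_load_adjustments set to none and
# load_adjustment_decay_time set to 0. All other lines pass through
# unchanged. The input file itself is not modified.
disable_load_adjustments() {
    sed -e 's/^job_load_adjustments[[:space:]].*/job_load_adjustments              none/' \
        -e 's/^load_adjustment_decay_time[[:space:]].*/load_adjustment_decay_time        0/' \
        "$1"
}

# Possible use:
#   disable_load_adjustments sched.conf > sched.lean.conf
```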

scheduling-on-demand

By default, Grid Engine starts scheduling runs at a fixed interval (see schedule_interval in sched_conf(5)). The advantage of a fixed interval is that it limits the CPU time consumed by qmaster/scheduler. The disadvantage is that it throttles the scheduler artificially, which limits throughput. Many compute farms have machines dedicated to qmaster/scheduler, and in such setups there is no reason to throttle the scheduler.

Scheduling-on-demand can be configured using the FLUSH_SUBMIT_SEC and FLUSH_FINISH_SEC settings in sched_conf(5). If it is activated, the throughput of a compute farm is limited only by the power of the machine hosting qmaster/scheduler.

How many seconds to use for the flush times is difficult to say. It depends on the time the scheduler needs for a single run and on the number of jobs in the system. A couple of test runs with scheduler profiling enabled (add profile=1 to the params in sched_conf(5)) should provide enough data to select a good value.
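
Enabling profiling for such test runs means adding profile=1 to the params line, which may already hold other entries or the value none. The sketch below handles both cases on a saved copy of the scheduler configuration. It assumes params is a single comma-separated value without spaces; add_sched_param is a hypothetical helper name.

```shell
# add_sched_param FILE NAME=VALUE
# Prints FILE with NAME=VALUE added to the "params" line.
# "none" (or an empty value) is replaced, an existing list is extended,
# and an entry that is already present is not added twice.
add_sched_param() {
    awk -v item="$2" '
        $1 == "params" {
            cur = (NF > 1) ? $2 : ""
            if (cur == "" || tolower(cur) == "none")
                cur = item
            else if (index("," cur ",", "," item ",") == 0)
                cur = cur "," item
            printf "%-33s %s\n", "params", cur
            next
        }
        { print }
    ' "$1"
}

# Possible use:
#   add_sched_param sched.conf profile=1 > sched.profile.conf
```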

scheduler priority information

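Priority information is displayed by qstat -ext, -urg and -pri. To switch off its reporting, set report_pjob_tickets to false in the scheduler configuration. See sched_conf(5).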

policies

The policies (see sge_priority(5)) are:

ticket policy
urgency policy
POSIX priority policy
deadline policy
waiting time policy

Each policy is controlled through its weighting factor in the scheduler configuration. See sched_conf(5).

resource reservation

Resource reservation is governed by max_reservation in the scheduler configuration. See sched_conf(5).

optimization of qmaster memory consumption

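Prefer -v variable_list, which passes only the named variables, over -V, which passes the submitter's whole environment with every job.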

use "-b y" to unburden qmaster

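With qsub -b y the command is treated as a binary: only its path is passed along, not the contents of a job script.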

job filter based on job classes

Add JC_FILTER=1 to the params in sched_conf(5).

problems in the system

See "qstat -ext" and "qstat -j ".

Scheduler profiles, such as those used during Grid Engine installation, can be saved with "qconf -ssconf >file". The profiles are not stored internally. By combining a cron job with a dynamic change of the scheduler configuration, done by loading a new profile with "qconf -Msconf <file>", one can switch to a leaner configuration overnight and return to a user-friendly configuration during the day.
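
The profile switch can be wrapped in a small function for cron to call. This is a minimal sketch: load_profile, the PROFILE_DIR and QCONF variables, the default directory, and the profile names day and night are all hypothetical. Only "qconf -Msconf" comes from the text above.

```shell
# load_profile NAME
# Loads the stored scheduler profile NAME.conf with "qconf -Msconf".
# PROFILE_DIR (default /etc/sge-profiles) and the profile names are
# hypothetical. QCONF is a hypothetical override, e.g. QCONF=echo for
# a dry run; it defaults to the real qconf.
load_profile() {
    file="${PROFILE_DIR:-/etc/sge-profiles}/$1.conf"
    if [ ! -r "$file" ]; then
        echo "missing profile: $file" >&2
        return 1
    fi
    "${QCONF:-qconf}" -Msconf "$file"
}

# Possible use from two cron jobs:
#   load_profile night    # evening: lean configuration
#   load_profile day      # morning: user-friendly configuration
```

Refusing to run when the profile file is missing avoids a cron job that fails silently and leaves the cluster on the wrong configuration without any message.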