Different Resource Management Strategies with Grid Engine



This HOWTO discusses the pros and cons of different approaches to managing resources. It originates from an answer to the question "How can I minimize the delay of a load sensor tracking software licenses?", which was asked in the Sun SupportForum. The answer was:

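Load sensors inherently have a delay: the execution daemon queries them only once per load report interval, so a change in license usage between two reports goes unnoticed until the next one. This cannot be prevented, and for this reason load sensors are only the third-best approach to managing license consumption.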

  • The best way to manage licenses is to use consumable resources (CRs). Floating licenses can easily be managed with a global CR, and node-locked licenses can be managed analogously with per-host CRs. If your licenses are not used interactively, you usually need only CRs and do not have to worry about load sensor delay.

  • If your interactive jobs also need the licenses, we suggest using

    qrsh <resource_request> appl

    so that Grid Engine also accounts for the licenses consumed interactively. To make the use of qrsh transparent to your users, either put the qrsh command into a wrapper script that behaves like the original application, or use qtcsh.

  • If it is not practicable to start interactive license-bound jobs through Grid Engine, you can combine the consumable resource setup described above with your load sensor for the same resource attribute. Grid Engine uses both sources of information and does its best to derive from them how much of the resource is really available. Unfortunately, because of the load sensor's delay, it cannot be entirely ruled out that a batch job is dispatched and started although the license has already been acquired by an interactive job. In this situation, however, the batch job can react by explicitly triggering a rerun, that is, by exiting with status 99 (collision detection). Load correction can sometimes help to reduce the number of reruns, but it is only a solution if your job profile is almost homogeneous.
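As a sketch of the consumable-resource setup recommended in the first bullet, the following assumes the Grid Engine 6.x complex table layout and a floating license pool of 10 seats. The attribute name lic_app, its shortcut la, the seat count, and the job script name job.sh are hypothetical. The attribute is declared as a requestable consumable, its capacity is attached to the global host, and each job requests one seat:

```
# qconf -mc : add one line to the complex table
#name     shortcut  type  relop  requestable  consumable  default  urgency
lic_app   la        INT   <=     YES          YES         0        0

# qconf -me global : declare the size of the floating license pool
complex_values        lic_app=10

# request one license per job at submission time
qsub -l lic_app=1 job.sh
```

For node-locked licenses, the capacity would instead be set in the configuration of each execution host that holds licenses (qconf -me <hostname>) rather than on the global host.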
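The qrsh wrapper suggested in the second bullet could look roughly like the following POSIX shell sketch. It assumes the hypothetical lic_app consumable, a hypothetical install path for the real binary (APP_BIN), and one license per run; the QRSH variable exists only so that the command can be swapped out, for example for testing. The function is meant to live in a script installed under the application's name, in a directory that precedes the real binary in the users' PATH.

```shell
#!/bin/sh
# Hypothetical wrapper installed under the application's name.
# Assumptions (not part of Grid Engine or the HOWTO): the real binary is
# APP_BIN, each run needs one seat of the consumable lic_app, and QRSH can
# be overridden (e.g. for testing).
APP_BIN=${APP_BIN:-/opt/app/bin/app}
LIC_REQUEST=${LIC_REQUEST:-lic_app=1}
QRSH=${QRSH:-qrsh}

run_licensed_app() {
    # -now no : wait in the pending list until a license is free instead
    #           of failing when the job cannot be started at once
    # -cwd    : start in the caller's current working directory
    # -V      : pass the caller's environment to the job
    "$QRSH" -now no -cwd -V -l "$LIC_REQUEST" "$APP_BIN" "$@"
}

# The installed wrapper script would end with:
#   run_licensed_app "$@"
#   exit $?
```

The `-now no` choice makes an interactive user wait for a free license, much like a batch job would, instead of getting an immediate error; drop it if failing fast is preferred.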
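With qtcsh, the redirection mentioned in the second bullet can be configured declaratively instead of through a wrapper script. As a sketch: qtcsh reads a cluster-wide task file ($SGE_ROOT/$SGE_CELL/common/qtask) and a per-user ~/.qtask, in which each line names a command followed by the qrsh options to use for it. The application name app and the lic_app consumable are hypothetical:

```
# ~/.qtask (per user) or $SGE_ROOT/$SGE_CELL/common/qtask (cluster-wide)
app -l lic_app=1
```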
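For the combined setup in the third bullet, the load sensor reports the number of free licenses under the same attribute name as the consumable. The following sh sketch follows the load sensor protocol: for every line the execution daemon writes to its standard input, it prints a report framed by begin and end, containing one host:attribute:value line (host global for a cluster-wide value), and it terminates on quit. licenses_free is a hypothetical placeholder for the query against your license server, and lic_app is the same hypothetical attribute as in the consumable setup.

```shell
#!/bin/sh
# Hypothetical load sensor: reports free seats of the license pool as the
# global attribute lic_app (the same name as the consumable).

# Placeholder, NOT a real command: replace the body with a query against
# your license server that prints the number of free seats.
licenses_free() {
    echo 0
}

# One report in the load sensor protocol: "begin", one
# host:attribute:value line ("global" for a cluster-wide value), "end".
report_once() {
    free=$(licenses_free)
    echo "begin"
    echo "global:lic_app:$free"
    echo "end"
}

# The execution daemon writes a line to stdin whenever it wants a report
# and "quit" when the sensor should terminate.
load_sensor_loop() {
    while read -r line; do
        if [ "$line" = "quit" ]; then
            exit 0
        fi
        report_once
    done
}

# The installed load sensor script would end with:
#   load_sensor_loop
```

Such a script would be registered through the load_sensor parameter of the global or host configuration (qconf -mconf).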
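The collision detection mentioned at the end of the third bullet can be sketched as a wrapper around the application call in the batch job script. The only Grid Engine behavior it relies on is the one stated in the HOWTO: a job that exits with status 99 is rerun. Everything else is an assumption, in particular that the application reports a failed license checkout through a distinctive exit status (LIC_FAIL_STATUS, with 2 only as a placeholder; check your application's documentation for the real value).

```shell
#!/bin/sh
# Exit status that (by assumption) means "no license available".
LIC_FAIL_STATUS=${LIC_FAIL_STATUS:-2}

# Run the given command; map a failed license checkout to status 99 so
# that Grid Engine reruns the job, and pass any other status through.
run_with_collision_detection() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq "$LIC_FAIL_STATUS" ]; then
        echo "license checkout failed, requesting rerun" >&2
        return 99
    fi
    return "$status"
}

# In the job script itself (hypothetical application path):
#   run_with_collision_detection /opt/app/bin/app input.dat
#   exit $?
```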