Default (In)security
A default Grid Engine installation (without CSP or MUNGE) is highly insecure, and demands trusting all users who have access to it. For instance, any exec host can be owned using something like:[1]
    $ fakeroot qrsh id -u
    0
    $
That can be defeated for root by setting min_uid in sge_conf(5).
However, any user with access to an admin host (e.g. with qmaster running on the login node) can just run fakeroot qconf -mconf to change it. In that case, if they can write executables to a file system visible from the qmaster, they can also own the qmaster, say by configuring mailer or jsv_url.
Thus, to protect exec hosts and the qmaster with default “security”, it is at least necessary to set min_uid/min_gid and run the qmaster on a separate host with no access for non-admin users.[2] (If that can’t be done, it might be possible to restrict access to the qmaster socket, perhaps with the iptables owner module.)
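As a rough way to audit the first of those requirements, the sketch below scans global configuration text for the two limits, which sge_conf(5) documents as min_uid and min_gid. The function name check_min_ids is hypothetical and not part of SGE, and the check says nothing about whether the chosen values suit a given site.

```shell
#!/bin/sh
# check_min_ids: hypothetical helper, not part of SGE.
# Reads global configuration text (as printed by `qconf -sconf`) on stdin
# and reports each of min_uid/min_gid that is missing or zero.
# Prints "ok" and exits 0 only if both are set to a positive value.
check_min_ids() {
    awk '
        $1 == "min_uid" || $1 == "min_gid" { val[$1] = $2 + 0 }
        END {
            bad = 0
            n = split("min_uid min_gid", keys, " ")
            for (i = 1; i <= n; i++) {
                k = keys[i]
                if (!(k in val) || val[k] <= 0) {
                    printf "%s is unset or 0\n", k
                    bad = 1
                }
            }
            if (!bad) print "ok"
            exit bad
        }'
}

# Typical use on a host with SGE installed:
#   qconf -sconf | check_min_ids
```

A non-zero exit status makes the function easy to use from a monitoring script, but passing the check does not address the admin host problem described above.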
With admin host restrictions and uid limits in place, it is still possible to submit jobs as any allowed user with qsub and an LD_PRELOAD trick, as with fakeroot, or otherwise (with a simple DRMAA client, for instance).
In most environments you will want to use either CSP or MUNGE, as below.
CSP Security
CSP security is the original method. It prevents job submission as another user and configuration changes by a non-admin/operator user. It thus allows admin hosts to be used more safely as submission hosts too, if required. It also secures the daemon communication channels, but that will usually be a secondary consideration.[3]
Using CSP has some limitations:
- Users must be explicitly added (no enforce_user auto) and have keys generated for them (util/sgeCA/sge_ca -usercert);
- The keys must be distributed to the relevant hosts, though you can have selective authorization of users by submission host according to which keys are distributed where;
- Keys must be renewed (using util/sgeCA/renew_all_certs.sh) after the set expiry time (a year by default) and redistributed;
- Currently (SGE 8.1.2) only a single security method can be configured, so using CSP excludes AFS support, for instance (see below).
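Since keys expire, it can help to spot certificates that are close to needing renewal. The following is a minimal sketch, assuming the certificates are ordinary PEM-encoded X.509 files readable by the openssl command-line tool. The function name cert_expires_within and the path in the usage comment are hypothetical.

```shell
#!/bin/sh
# cert_expires_within CERT DAYS: hypothetical helper, not part of SGE.
# Succeeds (exit 0) if the PEM certificate CERT expires within DAYS days.
# An unreadable or invalid file is also reported as "expiring", which
# errs on the side of prompting a manual check.
cert_expires_within() {
    cert=$1
    days=$2
    # `openssl x509 -checkend N` exits non-zero if the certificate
    # will have expired N seconds from now; negate that status.
    ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" \
        >/dev/null 2>&1
}

# Example loop (the directory layout is an assumption, adjust to the site):
#   for cert in /path/to/userkeys/*/cert.pem; do
#       cert_expires_within "$cert" 30 && echo "renew soon: $cert"
#   done
```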
It is not necessary to re-install to turn on CSP, just:
- Stop all the SGE daemons;
- Edit security_mode in the common/bootstrap file;
- Generate certificates with util/sgeCA/sge_ca -init and util/sgeCA/sge_ca -usercert;
- Distribute the certificates as appropriate;
- Restart the daemons.
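The bootstrap edit in the second step can be scripted. The sketch below assumes that the bootstrap file holds one "name value" pair per line, that the value for CSP is csp, and that the file lives under $SGE_ROOT/$SGE_CELL/common. The function name set_security_mode is hypothetical.

```shell
#!/bin/sh
# set_security_mode FILE MODE: hypothetical helper, not part of SGE.
# Rewrites the security_mode line of a bootstrap-style file in place,
# keeping the previous contents in FILE.bak.
# Fails without touching FILE if it has no security_mode line.
set_security_mode() {
    file=$1
    mode=$2
    grep -q '^security_mode[[:space:]]' "$file" || return 1
    cp -p "$file" "$file.bak" || return 1
    sed "s/^security_mode[[:space:]].*/security_mode           $mode/" \
        "$file.bak" > "$file"
}

# On the qmaster host, with all SGE daemons stopped (path is an assumption):
#   set_security_mode "$SGE_ROOT/$SGE_CELL/common/bootstrap" csp
# Then generate and distribute the certificates with util/sgeCA/sge_ca
# and restart the daemons.
```

Keeping the .bak copy makes it straightforward to revert if the daemons fail to start with the new mode.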
Note that CSP isn’t really a public key system as used for https, but is
basically relying on shared secrets distributed within a single
administrative domain. The security of a user’s key is dependent on the
security of all hosts to which it is distributed — a privileged user on a
submit host can impersonate any user whose keys are on that host.
Typically keys will only be distributed en masse to submit hosts which
are secure login nodes. Users can copy their own keys to another submit
host, such as a personal workstation (with sge_ca -copy, assuming the
home directory is secure).
See sge_ca(8) for more usage information. The real security of CSP is unclear;
there is no known audit of it.
MUNGE Authentication
Authentication with MUNGE was introduced in SGE 8.1.9, and may be most convenient in an HPC cluster. However, it has not been well tested at the time of writing. It is probably more convenient than CSP since it only requires a secret shared by daemons running on each host. It also allows operation with enforce_user=auto. However, it provides authentication, not encryption of the communication channels, and is probably only appropriate in a tightly-coupled security domain like an HPC cluster.
To use it, SGE must be built against the MUNGE library, e.g. the GNU/Linux packaged versions. Then MUNGE must be set up (see the installation guide) on all the SGE hosts, i.e. with the daemon running against the shared key for the cluster. Then the SGE daemons can be started with munge configured as the security_mode.
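Before starting the SGE daemons, it may be worth confirming that MUNGE itself works on each host. The sketch below wraps the usual round-trip check, in which a credential is created with munge and decoded with unmunge. The function name munge_roundtrip is hypothetical, and a local success does not prove that every host shares the same key.

```shell
#!/bin/sh
# munge_roundtrip: hypothetical helper, not part of SGE or MUNGE.
# Encodes an empty credential with munge(1) and decodes it with unmunge(1)
# on the local host, reporting the outcome on stdout.
munge_roundtrip() {
    if ! command -v munge >/dev/null 2>&1 ||
       ! command -v unmunge >/dev/null 2>&1; then
        echo "munge not installed"
        return 2
    fi
    if munge -n | unmunge >/dev/null 2>&1; then
        echo "munge ok"
    else
        echo "munge failed"
        return 1
    fi
}

munge_roundtrip

# To check that two hosts share the same key, the credential can be decoded
# remotely instead (the host name is a placeholder):
#   munge -n | ssh node01 unmunge
```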
Other Methods
The only other security method which currently (SGE 8.1.9) works properly is the afs one. However, as implemented, it doesn’t provide authentication of users submitting jobs. Without that it is possible to submit a job as another user (as above) and steal credentials from another job of that user running on the same host, so that it could actually facilitate security breaches. It may be possible to use AUKS with CSP as an alternative, but there’s currently no setup recipe published.
The GSSAPI (kerberos/dce) method would work for authenticating job submission and passing (but not renewing) Kerberos tickets, but the mechanism for calling the sub-programs involved is partially broken and needs re-implementing.
Some largely historical information on security is available. The security framework should be re-done with hooks to allow arbitrary, composable methods.
External Program Hooks
SGE no longer runs external remote startup programs (see remote_startup(5)) in the user-defined environment, and so is not vulnerable to that part of CVE-2012-0208 (see the SGE source). Other published responses to the CVE pass the environment with some sanitization, but fail to remove all the known sensitive variables.
Although the environment passed to other external methods, such as the prolog, is sanitized when they are invoked with privileges (user@…), the sanitization may not be foolproof. See the SECURITY section of sge_conf(5).
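Given that caveat, a site might add its own scrubbing at the top of any privileged prolog or epilog script as a second line of defence. The following is only a sketch: the function name scrub_env is hypothetical, and the list of variables is illustrative and not claimed to be complete.

```shell
#!/bin/sh
# scrub_env: hypothetical helper for the top of a privileged prolog/epilog.
# Drops some variables that can alter how later commands are loaded or
# parsed, and resets PATH to fixed system directories.
# The variable list is illustrative, not exhaustive.
scrub_env() {
    unset LD_PRELOAD LD_LIBRARY_PATH LD_AUDIT IFS ENV BASH_ENV CDPATH
    PATH=/usr/sbin:/usr/bin:/sbin:/bin
    export PATH
}

# Call it before running anything else in the script:
scrub_env
```

Unsetting variables inside the script cannot undo anything the dynamic linker did when the script's own interpreter started, so this complements rather than replaces the sanitization done by SGE.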
Obviously these concerns are moot without restrictions imposed by the min_uid limit or user authentication, as above.
Copyright © 2012, 2013, 2016 Dave Love, University of Liverpool
Licence GFDL.