For the second time, the SGE Development Workshop brought together Grid Engine
users, developers and development partners. They came from sixteen organizations
in research and industry, in eight different countries. The attendees
participated in introductory sessions on current and future Grid Engine
developments. They had the opportunity to give an overview of their own
Grid Engine projects, and to learn about others' Grid interests, areas of
focus and results. Breakout sessions on several topics further stimulated
discussion of common interests. Additional Grid Engine Development Workshops
will be proposed on the project mailing lists.
Table of Contents
Grid Engine Project Report and Workshop Outline
Grid Engine Partner Developments and Applications
Multi-Threading Approaches in Grid Engine 6.0
Grid Engine 6.0 Cluster Queues
Grid Engine 6.0 Throughput Scheduler and Scheduling Improvements
Utilizing Databases in Grid Engine 6.0
Sun Grid Engine 6.0 Accounting / Reporting Tool
Status of the Grid Engine DRMAA Implementation
Reservation / Preemption / Backfilling in Grid Engine 6.0
Graph Optimization Algorithms for Grid Engine
Resource Discovery in Sun Grid Engine using JXTA
Impact of Sun Grid Solution Selling Initiatives
High Availability Grid with SGE and Sun Cluster
Plenary Discussion: Feedback and Comments for SGE 6.x
Challenges in building OGSA based Grids
Using Resources of Multiple Grids with the Grid Service Provider
EPCC Sun Data and Compute Grids
Access to Shared Grid Resources in Heterogeneous Queuing Systems
Update on Sun CoE Activities at the University of Houston
The White Rose Grid: Experiences and “Wish List”
Integrating SGE and Globus in a heterogeneous HPC environment
ICENI: A next generation grid middleware
Integration of SGE into NPACI Rocks
Experiences at MSC Software in Applying Sun Grid Engine in a CAE Environment
Aspects of a Processing Grid
Scheduling and Job Management using Grid Engine on a Multi-Teraflop HPC
Confessions of a Grid Engine Addict
Sun Grid Engine at OSC
Interactive Batch Jobs and AFS Token Handling Mechanism
Moving to a Secure Grid Portal
DRMAA SIG
Grid Engine Portal SIG
OGSA / OGSI SIG
Scheduler SIG
Afeldt, Stefan |
Alefeld, André |
Bablick, Ernst |
Beltrami, Riccardo |
Berger, Martin |
Bhardwaj, Aastha |
Cambruzzi, Sandro |
Cawood, Geoff |
Dalton, Terry |
Danesi, Silvio |
Davidson, Shannon |
Dörr, Andreas |
Edgecombe, Dr. Kenneth |
Ferstl, Fritz |
Furmento, Dr. Nathalie |
Gabler, Joachim |
Gentzsch, Dr. Wolfgang |
Grell, Stephan |
Haas, Andreas |
Hardie, Duncan |
Januszewski, Radoslaw |
Kosiedowski, Michal |
Lees, Peter |
Lobodzinski, Bogdan |
Lorenz, Andrea |
Markov, Lev |
McBride, David |
Meier, Dr. Ulrich |
Mikolajczak, Rafal |
Nelson-Gal, David |
Newhouse, Steven |
Piwowarek, Pawel |
Raghunath, Priya |
Ryan, Paul |
Schwierskott, Andy |
Seed, Thomas |
Shirin, Gregory |
Sørensen, Dr. Søren-Aksel |
Street, Stefano |
Templeton, Daniel |
Tierney, Dr. Craig |
Turner, Aaron |
Youhanaie, Fredrick |
DAY 1 - Monday, September 22 - Sorat Hotel
Grid Engine 6.0 Development
09:30 |
Welcome |
Fritz Ferstl, Sun Microsystems (SMI) |
09:40 |
Grid Engine Project Report and Workshop Outline |
Fritz Ferstl, SMI |
10:00 |
Multi-Threading Approaches in Grid Engine 6.0 |
Andreas Dörr, SMI |
10:40 |
Grid Engine 6.0 Cluster Queues |
Andreas Haas, SMI |
11:10 |
Grid Engine 6.0 Throughput Scheduler and Scheduling Improvements |
Stephan Grell, SMI |
11:30 |
Break |
|
11:50 |
Utilizing Databases in Grid Engine 6.0 |
Joachim Gabler, SMI |
12:30 |
Sun Grid Engine 6.0 Accounting/Reporting Tool |
André Alefeld, SMI |
12:50 |
Lunch |
|
14:00 |
Status of the Grid Engine DRMAA Implementation |
Andreas Haas / Dan Templeton, SMI |
14:40 |
Reservation/Preemption/Backfilling in Grid Engine 6.0 |
Andreas Haas, SMI |
15:00 |
Graph Optimization Algorithms for Grid Engine |
Lev Markov, SMI |
15:40 |
Break |
|
16:00 |
Resource Discovery in Sun Grid Engine using JXTA |
Dan Templeton, SMI |
16:30 |
Impact of Sun Grid Solution Selling Initiatives |
Glenn Wright, SMI |
17:00 |
(Tentative) High Availability Grid with SGE and Sun Cluster |
Sandro Cambruzzi, SMI |
17:20 |
Plenary Discussion: Feedback and Comments for SGE 6.x |
|
18:00 |
End of Day 1 |
|
19:00 |
Invited Dinner / Sorat |
|
DAY 2 - Tuesday, September 23 - Sorat Hotel
Grid Engine Partner Developments and Applications
09:15 |
Challenges in building OGSA based Grids |
Steven Newhouse, Imperial College London |
09:35 |
Using Resources of Multiple Grids with the Grid Service Provider |
Michal Kosiedowski, Poznan Supercomputing and Networking Center (PSNC) |
09:55 |
EPCC Sun Data and Compute Grids |
Geoff Cawood, Edinburgh Parallel Computing Centre |
10:15 |
Access to Shared Grid Resources in Heterogeneous Queuing Systems |
Pawel Piwowarek, PSNC |
10:35 |
Break |
|
10:50 |
Update on Sun CoE Activities at the University of Houston |
Priya Raghunath, University of Houston |
11:10 |
The White Rose Grid: Experiences and "Wish List" |
Aaron Turner, University of York |
11:30 |
Integrating SGE and Globus in a heterogeneous HPC environment |
David McBride, Imperial College London |
11:50 |
ICENI: A next generation grid middleware |
Nathalie Furmento, Imperial College London |
12:10 |
Experiences on SGE RPMs and/or NPACI Rocks Presentation SCS Linux Competence Center |
Presenter |
12:10 |
Experiences at MSC Software in Applying Sun Grid Engine in a CAE Environment |
Stefan Afeldt, MSC Software |
12:30 |
Lunch |
|
13:30 |
Aspects of a Processing Grid |
Søren Sørensen, University College London |
13:50 |
Scheduling and Job Management using Grid Engine on a Multi-Teraflop HPC |
Craig Tierney, High Performance Technologies Inc. |
14:10 |
Confessions of a Grid Engine Addict |
Shannon Davidson, Raytheon |
14:30 |
Sun Grid Engine at OSC |
Fred Youhanaie, Oxford Supercomputing Centre |
14:50 |
Interactive Batch Jobs and AFS Token Handling Mechanism |
Bogdan Lobodzinski, DESY Zeuthen |
15:10 |
Moving to a Secure Grid Portal |
Kenneth Edgecombe, HPCVL |
15:30 |
End of Day 2 Presentations |
|
16:00 |
Guided City Tour: Also for those who have taken a tour through Regensburg before - there's always something new! |
DAY 3 - Wednesday, September 24 - Sun Microsystems Regensburg Offices
Special Interest Group Day
09:30 |
Build special interest groups; proposals: |
|
10:00 |
SIG Meetings |
|
12:00 |
Lunch |
|
13:00 |
SIG Result Presentations |
|
13:45 |
Plenary Discussion |
|
14:30 |
Adjourn |
|
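Grid Engine Project Report and Workshop Outline
By Fritz Ferstl, Sun Microsystems GmbH
Multi-Threading Approaches in Grid Engine 6.0
By Andreas Dörr, Sun Microsystems GmbH
Grid Engine 6.0 Cluster Queues
By Ernst Bablick & Andreas Haas, Sun Microsystems GmbH
Grid Engine 6.0 Throughput Scheduler and Scheduling Improvements
By Stephan Grell, Sun Microsystems GmbH
Utilizing Databases in Grid Engine 6.0
By Joachim Gabler, Sun Microsystems GmbH
Sun Grid Engine 6.0 Accounting / Reporting Tool
By André Alefeld, Sun Microsystems GmbH
Status of the Grid Engine DRMAA Implementation
By Andreas Haas and Dan Templeton, Sun Microsystems GmbH
Reservation / Preemption / Backfilling in Grid Engine 6.0
By Andreas Haas, Sun Microsystems GmbH
Graph Optimization Algorithms for Grid Engine
By Lev Markov, Sun Microsystems Inc.
Resource Discovery in Sun Grid Engine using JXTA
By Dan Templeton, Sun Microsystems GmbH
Impact of Sun Grid Solution Selling Initiatives
By Glenn Wright, Sun Microsystems Inc.
High Availability Grid with SGE and Sun Cluster
By Sandro Cambruzzi, Sun Microsystems GesmbH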
Challenges in building OGSA based Grids
By Steven Newhouse, Imperial College London
Abstract:
Key to wide-scale adoption of the Grid is to provide an infrastructure which
can be used to build viable business models. The UK's Markets for Computational
Services (MCS) project is working to build such an infrastructure using the
Open Grid Services Architecture.
The presentation reports on our recent implementation experiences in producing the first prototype
using OGSI.
Using Resources of Multiple Grids with the Grid Service Provider
By Michal Kosiedowski, Poznan Supercomputing and Networking Center (PSNC)
Abstract:
Recently, numerous grid installations and systems have emerged around the globe.
Most of them are equipped with a graphical user interface, preferably a web
based one, to submit and execute computing jobs. Yet, users of these multiple
grids must switch from one computing portal to another to utilize resources
that are available to them. It requires a big effort to migrate a job description
from one grid to another.
Poznan Supercomputing and Networking Center proposes a new mechanism to
make resources of multiple grids available within a single user interface. The
PROGRESS grid service provider, which provides grid services sharable between
distributed computing portals and other grid user interfaces, is set to be
equipped with a grid resource broker plug-in solution. Each grid resource broker
plug-in will provide mechanisms for communication with a particular grid system,
e.g. the PROGRESS grid resource broker, Globus, GridLab, etc. The plug-ins will
serve as gateways to the grids whenever a computing job has to be submitted to
a grid, and will enable the migration of job descriptions. This presentation
describes the overall functionality of the PROGRESS grid service provider and
introduces the idea of the grid resource broker plug-in mechanism.
EPCC Sun Data and Compute Grids
By Geoff Cawood, Edinburgh Parallel Computing Centre
Abstract:
Authors: Geoff Cawood, Ratna Abrol, Thomas Seed, Terry Sloan
The SunDCG project aims to develop a compute and data scheduler based around
Grid Engine, Globus and a variety of data technologies.
Last year's Grid Engine Workshop provided helpful feedback on some proposed
technical strategies. This year, our presentation will describe the resulting
software, and our plans for a further release when the project ends in January
2004. We hope to share some of our experiences of integrating Grid Engine
and Globus, and seek any technical insights that could ease the final development
phase.
Access to Shared Grid Resources in Heterogeneous Queuing Systems
By Pawel Piwowarek, Poznan Supercomputing and Networking Center (PSNC)
Abstract:
Authors: Pawel Piwowarek, Marek Zawadzki
PSNC is a leading provider of computational power in Poland. To deal with
extensive HPC demands, all requests are served via heterogeneous queuing
systems such as LSF, LL, PBS, etc., which help share resources among users.
With the advent of the PROGRESS project in 2002, PSNC gained another grid-like
infrastructure, which utilizes SGE.
This talk presents our solution for integrating SGE, as deployed in PROGRESS, into
our production environment based on the LSF platform.
Update on Sun CoE Activities at the University of Houston
By Priya Raghunath, University of Houston
Abstract:
As a Sun Center of Excellence in Geosciences, the University of Houston (UH) has been actively pursuing research and development in a grid environment. This presentation will focus on the development of the EZ-Grid system at UH, which provides a generic interface to access grid resources. Having undergone design changes, the new version of EZ-Grid is implemented using the latest technologies and provides an enhanced set of features. We will also present a brief overview of the job manager we have developed to enable interaction between Globus and SGE. In addition, we will briefly describe efforts underway to develop strategies for increased fault tolerance without compromising performance, as well as proposed tools to predict average wait times for jobs.
The White Rose Grid: Experiences and “Wish List”
By Aaron Turner, University of York
Abstract:
A brief outline of the experience at the York White Rose Grid node, with an
outline of the setup used and some problems encountered. Based on this experience,
a list of features we would like to see in release 6.0 is discussed.
Integrating SGE and Globus in a heterogeneous HPC environment
By David McBride, Imperial College London
Abstract:
This talk describes the integration of Sun Grid Engine and the Globus Toolkit
on the London e-Science Centre's computational resources.
It presents an overview of the architectural implementation details
of these systems and also presents some of the challenges encountered when
trying to deploy our solution on production systems.
ICENI: A next generation grid middleware
By Nathalie Furmento, Imperial College London
Abstract:
This talk presents how ICENI, the Grid middleware developed at the London e-Science Centre, and Sun products such as uPortal, SGE and NetBeans have been integrated to provide a solution that gives applied scientists access to Grid computing resources. It also presents the implementation of ICENI's Service-Oriented Architecture and the Semantic Framework attached to it.
Integration of SGE into NPACI Rocks
By Najib Ninaba, Scalable Systems Pte Ltd., SCS Linux Competence Center
(Presented by Andy Schwierskott, Sun Microsystems GmbH)
Experiences at MSC Software in Applying Sun Grid Engine in a CAE Environment
By Stefan Afeldt and Stefan Mayer, MSC Software
Abstract:
In recent years, workload management has become a very important topic in
the CAE (Computer Aided Engineering) environment. In June 2003 Sun announced
MSC.Software's plans to market, implement and support Sun Grid Engine and
to offer associated services to help its manufacturing customers worldwide
reduce the time and costs associated with product development. In our presentation,
we describe MSC.Software's recent work around Sun Grid Engine. For example,
MSC.Software has implemented an interface to Sun Grid Engine in its MSC.BatchSubmit
offering. This tool offers a web-based interface between users of CAE software
and Grid Engine, so that engineers can easily access CAE applications in
a compute farm from any web browser on any workstation platform, independent
of the workstation architecture. In addition, MSC.Software developed Grid
Engine administration modules for the popular Webmin environment, so that
system administrators can easily set up and configure a Grid Engine installation
from a web browser. Based on this development work, MSC.Software has implemented
Sun Grid Engine based solutions both in multiple customer service projects
and at MSC.Software's European headquarters in Munich, where 200 MSC engineers
have access to a compute farm of more than 100 CPUs.
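Aspects of a Processing Grid
By Søren Sørensen, University College London
Scheduling and Job Management using Grid Engine on a Multi-Teraflop HPC
By Craig Tierney, High Performance Technologies Inc.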
Confessions of a Grid Engine Addict
By Shannon Davidson, Raytheon
Abstract:
Grid Engine Enterprise Edition (GEEE) is used at
several HPC sites managed by Raytheon. An overview of how GEEE is used at
these sites will be presented. A Myrinet / MPICH GEEE integration is presented
and a Grid demo for SC2003 is described.
Sun Grid Engine at OSC
By Fred Youhanaie, Oxford Supercomputing Centre
Abstract:
Sun Grid Engine was first installed at Oxford Supercomputing Centre in March
2002. Since then the installation has evolved from the default configuration
to one that handles multiple clusters with multiple Parallel Environments,
with the ability to accept jobs submitted through the Globus middleware.
This talk will chart our experiences with the Grid Engine software over the past 18 months.
Interactive Batch Jobs and AFS Token Handling Mechanism
By Bogdan Lobodzinski, DESY Zeuthen
Abstract:
After an introduction to the usage of the SGEEE batch system at DESY Zeuthen, we present the realization of interactive jobs managed by SGEEE on hosts without direct login access. In addition, the AFS token handling mechanism is covered: the current implementation, further developments and problems are discussed.
Moving to a Secure Grid Portal
By Kenneth Edgecombe, High Performance Computing Virtual Laboratory, Kingston, Canada
DRMAA SIG
Discussion points were:
General facilitation of Grid adoption
Particular facilitation for ISVs
One API for many DRMs
Offer of self contained solutions
DRM adoption
ISV adoption
Ease of use
Eventually a certification (ISV, DRM)
SGE
PBS
IBM Loadleveler
United Devices
Entropia
Cadence (ISV)
GridIron
Grid Solutions
Generic integration to different types of functions (Condor, LSF, SGE, ...)
Finding alpha testers for SGE implementation
Promotion of standards within Sun
Promotion of choices
Grid Engine Portal SIG
The integration of the Grid Engine Portal (GEP) into a security infrastructure
like Entrust was discussed; this is a project underway at HPCVL.
The problem here is that the different components of the whole system deploy
security at different layers, which is rather complicated, so only some aspects
could be discussed.
The second topic covered was the need for an easier way to add applications and to describe them in a generic way.
One possible solution would be to define an application definition language and a corresponding tool that allows any application to be integrated into a portal and automatically generates the needed configuration dialogues.
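The idea of a generic application description can be illustrated with a minimal sketch. Everything in it is hypothetical: the `ApplicationDefinition` and `Parameter` classes, the field names, the widget names and the example application are assumptions made for illustration, and none of them is taken from GEP or from the SIG discussion. The sketch only shows how a single declarative description could drive both the generated input dialogue and the resulting command line.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Parameter:
    """One user-visible input of an application (hypothetical schema)."""
    name: str
    label: str
    kind: str = "text"       # one of "text", "int", "file"
    default: str = ""
    required: bool = True


@dataclass
class ApplicationDefinition:
    """Declarative description of an application (hypothetical schema)."""
    name: str
    command: str
    parameters: List[Parameter] = field(default_factory=list)


def dialog_fields(app: ApplicationDefinition) -> List[Dict[str, object]]:
    """Derive a list of form fields that a portal could render."""
    widgets = {"text": "textbox", "int": "number", "file": "upload"}
    return [
        {
            "label": p.label,
            "widget": widgets[p.kind],
            "default": p.default,
            "required": p.required,
        }
        for p in app.parameters
    ]


def command_line(app: ApplicationDefinition, values: Dict[str, str]) -> List[str]:
    """Build the argument list for a job from the values a user entered."""
    args = [app.command]
    for p in app.parameters:
        value = values.get(p.name, p.default)
        if p.required and value == "":
            raise ValueError(f"missing required parameter: {p.name}")
        if value != "":
            args.append(f"--{p.name}={value}")
    return args


# A made-up application, used only to exercise the sketch.
EXAMPLE_APP = ApplicationDefinition(
    name="example-solver",
    command="run_solver",
    parameters=[
        Parameter("input", "Input deck", kind="file"),
        Parameter("cpus", "Number of CPUs", kind="int", default="4"),
        Parameter("comment", "Comment", required=False),
    ],
)
```

With such a description, adding a new application to a portal would mean writing one more definition rather than a new set of dialogues. A real definition language would also have to cover resource requests, output handling and access rights.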
Three kinds of users would be useful:
Users who administer GEP, add new applications and define which users can access them.
Expert users who tune particular applications and find out which parameter sets are best suited; this can be a longer-running effort.
Users who only run the preconfigured applications, supplying a minimal set of input and data.
OGSA / OGSI SIG
Discussion points were:
What are OGSA/OGSI?
Why are they useful?
Should SGE support OGSA?
Should SGE support OGSA?
OGSA/OGSI are nice, but functionality is more useful than standards.
If SGE had remote monitoring/controlling features, most people would be happy.
XML output from SGE command line utils would be a huge step in the right direction.
An OGSI interface would be more useful than OGSA support.
OGSI binding for DRMAA a start, but need broader functionality.
There are two basic needs:
Hierarchical grid site structure
Two-tier grid site structure with central meta-scheduler
It is possible to do both without OGSA/OGSI
Look into XML output for command line utilities
Work towards OGSI binding for DRMAA
Goals
Enable meta-scheduling in SGE
Scheduler SIG
The meeting was divided into three parts:
Usually a few long-running, more medium, and mostly short-running jobs
Short running: 10 to 15 min (usually not parallel jobs)
Long running: over 8 hours (usually massively parallel jobs)
Each combined with different sets of resource requests and deadlines
In some environments, jobs have a maximum runtime of 24 hours. They have to restart themselves if they need longer. This is used to prevent starvation of jobs.
Most grids are built with SGE and not with SGEEE.
General feedback was positive. Most sites have some additional scripts to handle specific requirements and things SGE cannot do (such as a pre-scheduler, ...)
Some wishes for enhancements were:
Better license management for parallel jobs with consumables
A way to limit the number of jobs per host.
A better way to specify job dependencies
A short summary of Lev Markov's and Andreas Haas' presentations with the focus on job starvation, resources and backfilling.
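For readers unfamiliar with the terms, the relation between job starvation and backfilling can be sketched as follows. This is a generic illustration of the textbook "EASY" backfilling idea and not the algorithm presented at the workshop or implemented in Grid Engine; the `Job` class, the function name and the single "slots" resource are assumptions made for the example. The highest-priority waiting job receives a reservation at the earliest time enough slots become free, and lower-priority jobs may start now only if they cannot delay that reservation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    slots: int      # number of slots requested
    runtime: int    # estimated runtime in minutes


def easy_backfill(free_slots: int,
                  running: List[Tuple[int, int]],
                  queue: List[Job]) -> List[str]:
    """Return the names of the jobs to start now (time 0).

    running: (end_time, slots) for each running job.
    queue:   waiting jobs, highest priority first.
    """
    started = []
    running = list(running)
    queue = list(queue)

    # 1. Start jobs strictly in priority order while they fit.
    while queue and queue[0].slots <= free_slots:
        job = queue.pop(0)
        free_slots -= job.slots
        running.append((job.runtime, job.slots))
        started.append(job.name)
    if not queue:
        return started

    # 2. Reserve for the blocked head job: find the earliest time
    #    ("shadow time") at which enough slots will be free.
    head = queue[0]
    available = free_slots
    shadow_time = None
    for end_time, slots in sorted(running):
        available += slots
        if available >= head.slots:
            shadow_time = end_time
            break
    if shadow_time is None:
        # Assumption: if the head job can never fit, do not backfill.
        return started
    extra = available - head.slots  # slots the head job will not need

    # 3. Backfill: a lower-priority job may start now if it ends before
    #    the shadow time or only uses slots the head job leaves over.
    for job in queue[1:]:
        if job.slots > free_slots:
            continue
        if job.runtime <= shadow_time:
            free_slots -= job.slots
            started.append(job.name)
        elif job.slots <= extra:
            free_slots -= job.slots
            extra -= job.slots
            started.append(job.name)
    return started
```

Without the reservation in step 2, a steady stream of small jobs could keep a large parallel job waiting indefinitely, which is the starvation problem. With it, small jobs still fill idle slots, but only where the runtime estimates say they will not push back the large job's start.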
Dear Conference Participants, Dear Colleagues,
I want to thank everybody who participated and contributed to our Second International Workshop on Grid Engine Technologies, September 22-24, 2003, in Regensburg, and made it a real success.
With 50 participants, 30 expert presentations, and several Workshops on the last day, the Conference was certainly packed, interesting, and very interactive. Besides presentations about new Grid Engine core technology, there was also a great collection of work around Grid Engine, e.g. Globus, OGSA, Data Grids, Cluster, Campus and Global Grids, Advanced Scheduling, Security, and more.
I would like to especially thank our esteemed technology partners who provided
a lot of important input to Grid Engine, thus helping us to develop an advanced
and competitive next-generation DRM technology.
Finally, special thanks to Monika Grobecker and Fritz Ferstl, who again organized
the Workshop and contributed to a very successful event.
I hope to meet you all at the next Regensburg Workshop :-)
Kind Regards
Wolfgang Gentzsch
Director Grid Computing, Sun Microsystems