GRID ENGINE WORKSHOP
April 22 - 24, 2002
Sun Microsystems Gridware GmbH
Dr.-Leo-Ritter-Straße 7, D-93049 Regensburg, Germany
The first SGE Development Workshop brought together Grid Engine users, developers and development partners from sixteen organizations, spanning research and industry in six different countries. The attendees took part in introductory sessions on the Grid Engine project and a code walkthrough, presented overviews of their own Grid Engine projects, and learned about the Grid interests, foci and results of others. Breakout sessions on several topics further stimulated discussion of common interests. Additional Grid Engine Development Workshops will be proposed on the project mailing lists.
CONTENTS
Sun Grid Computing Strategy and Projects
Sun Grid Computing Strategy
The Grid Engine Project
Working in the Grid Engine Project Part I
Working in the Grid Engine Project Part II
Partner Grid Engine Projects
Special Interest Group Meetings
Workstation-Cluster at Ford of Europe
Integration, Support and Development Projects Related to Grid Engine
Using Sun Grid Engine and Globus to Schedule Across a Combination of Local and Remote Machines
GridLab and Progress as the Examples of Cooperation between SGE and Grid Research at PSNC
Secure Grid and Portal Computing
Sun TCP Bioinformatics Application Integration on Heterogeneous Server Platforms
Portals and Resource Scheduling at Imperial College
The White Rose Computational Grid
Scheduling in an HPC environment
Covering the Spectrum - Grid Activities @ UCL
Overview of Sun Center of Excellence in Geosciences at the University of Houston
ATTENDEES
Anjomshoaa, Ali - University of Edinburgh - Tel: +44 (0)131 650 6717
Alefeld, André - Sun Microsystems Gridware - Tel: +49 (0)941 3075 255
Barr, John - Sun Microsystems Ltd. - Tel: +44 (0)1252 421157, Fax: +44 (0)1252 420105
Bablick, Ernst - Sun Microsystems Gridware - Tel: +49 (0)941 3075 135
Cafaro, Prof. Massimo - Università degli Studi di Lecce - Tel: +39 0832 320 284
Cawood, Geoff - University of Edinburgh - Tel: +44 (0)131 650 5818
Dew, Prof. Peter M. - University of Leeds - Tel: +44 (0)113 233
Edgecombe, Dr. Kenneth - Queen's University - 141 Collingwood Street, Kingston, ON K7L 3X6, Canada
Ferstl, Fritz - Sun Microsystems Gridware - Tel: +49 (0)941 3075 110
Furmento, Dr. Nathalie - London E-Science Centre - Tel: +44 (0)207 594 8310
Gabler, Joachim - Sun Microsystems Gridware - Tel: +49 (0)941 3075 233
Gentzsch, Dr. Wolfgang - Sun Microsystems Inc. - Tel: +1 650 786 2032
Haas, Andreas - Sun Microsystems Gridware - Tel: +49 (0)941 3075 131
Ismail, Dr. Mathew - University of Warwick - Tel: +44 (0)2476 574100
Kirstein, Prof. Dr. Peter - University College London - Tel: +44 (0)171 380 7285
Kloock, Martin - cards Engineering GmbH & Co. KG - Tel: +49 (0)221 179520
Kupczyk, Miroslaw, M.Sc. - Poznan Supercomputing and Networking Center - Tel: +48 61 858 20 52
Kurowski, Krzysztof, M.Sc. - Poznan Supercomputing and Networking Center - Tel: +48 61 858 20 72
Lippert, Lothar - Sun Microsystems Gridware - Tel: +49 (0)941 3075 123
Lorenz, Andrea - Aachen University of Technology - Tel: +49 (0)241 80-29791
Newhouse, Dr. Steven - Imperial College - Tel: +44 (0)20 7594 8316
Novotny, Jason - Max-Planck-Institut für Gravitationsphysik - Tel: +49 (0)331 567 7203
Piontek, Tomasz - Poznan Supercomputing and Networking Center - Tel: +48 61 858 20 72
Piwowarek, Paweł - Poznan Supercomputing and Networking Center - Tel: +48 61 858 20 72
Reissmann, Christian - Sun Microsystems Gridware - Tel: +49 (0)941 3075 112
Saleem, Asif - Imperial College - Tel: +44 (0)20 7594 8316
Schmidt, Egon - cards Engineering GmbH & Co. KG - Tel: +49 (0)221 179520
Schmidt, Dr. Joanna - University of Leeds - Tel: +44 (0)113 233 5375
Seed, Thomas - University of Edinburgh - Tel: +44 (0)131 650 5818
Sloan, Terry - University of Edinburgh - Tel: +44 (0)131 650 5818
Sørensen, Dr. Søren-Aksel - University College London - Tel: +44 (0)171 380 7285
Spormann, Annett - cards Engineering GmbH & Co. KG - Tel: +49 (0)221 179520
Sundaram, Babu - University of Houston
Stair, Craig R. - Raytheon - Tel: +1 972 205 7677
Stahlberg, Eric, Ph.D. - Ohio Supercomputer Center - Tel: +1 614 292 2696
Tollefsrud, John - Sun Microsystems Inc. - Tel: +1 650 786 2037
Wehrens, Oliver - Max-Planck-Institut für Gravitationsphysik - Tel: +49 (0)331 567 7203
Day 1 - Monday, April 22
09:00 Welcome (Fritz Ferstl)
09:15 Sun Grid Computing Strategy (Wolfgang Gentzsch)
10:15 Grid Engine, Overview and Roadmap (Fritz Ferstl)
11:00 Break
11:30 The Grid Engine Project (Fritz Ferstl)
12:00 Lunch
13:15 Working in the Grid Engine Project - Part I (André Alefeld)
14:45 Break
15:15 Working in the Grid Engine Project - Part II (Andreas Haas)
17:00 Adjourn
19:00 Dinner Invitation ("Leerer Beutel" - Bertold Straße - see map)
Day 2 - Tuesday, April 23
09:00 Partner Grid Engine Projects
  Integration and Support for Grid Engine:
  - Martin Kloock, Cards Engineering: Workstation Cluster at Ford of Europe
  - Craig Stair, Raytheon: Integration, Support and Development Projects Related to Grid Engine
  Grid Computing (Globus etc.) & Grid Engine:
  - Geoff Cawood, EPCC: Using Sun Grid Engine and Globus to Schedule Across a Combination of Local and Remote Machines
  - Krzysztof Kurowski, PSNC: "GridLab" and "Progress" as the Examples of Cooperation Between SGE and Grid Research at PSNC
10:30 Break
11:00 Partner Grid Engine Projects cont'd
  Grid Engine and Portals:
  - Kenneth Edgecombe, HPCVL: Secure Grid and Portal Computing
  - Eric Stahlberg, OSC: Sun TCP Bioinformatics Application Integration on Heterogeneous Server Platforms
  - Steven Newhouse, Imperial College: Portals and Resource Scheduling at Imperial College
  - P. M. Dew, Univ. of Leeds: The White Rose Computational Grid
12:30 Lunch
13:30 Partner Grid Engine Projects cont'd
  Scheduler Enhancements for Grid Engine and Other Topics:
  - Andrea Lorenz, RWTH Aachen: Scheduling in an HPC Environment
  - Søren-Aksel Sørensen, UCL-CS: Covering the Spectrum - UCL-CS Grid Related Activities
  - Babu Sundaram, University of Houston: Overview of the Setup of the Campus Grid and EZ-Grid System at UH
14:30 Adjourn
16:00 Walk through Regensburg
17:30 Free time
Day 3 - Wednesday, April 24
09:00 Project Wrap Up; Build Special Interest Groups
09:30 Special Interest Group Meetings
11:00 Break
11:30 Special Interest Group Meetings cont'd
12:30 Lunch
14:00 Presentation of SIGs
15:00 Roundtable discussion
15:45 Closing Remarks (Fritz Ferstl)
16:00 End
Sun Grid Computing Strategy
By Wolfgang Gentzsch, Sun Microsystems Inc.
The Grid Engine Project
By Fritz Ferstl, Sun Microsystems Gridware
Working in the Grid Engine Project: Part I
By André Alefeld, Sun Microsystems Gridware
Working in the Grid Engine Project: Part II
By Andreas Haas, Sun Microsystems Gridware
Workstation-Cluster at Ford of Europe
By Martin Kloock, Cards Engineering
Abstract:
In August 1998 cards Engineering won an on-site support contract at Ford Motor Company in Cologne/Merkenich. I started there maintaining the CAE (Computer Aided Engineering) application server, a Sun Ultra Enterprise 450 hosting about 50 CAE applications. My work covered installing the applications, providing 1st- and 2nd-level helpdesk support, and maintaining the licenses for all of these products. In addition, I collaborated closely with Ford colleagues in Dunton/UK and Dearborn/USA. During this time, people at Ford began thinking about making better use of their workstations (mainly HPs, about 800 machines in total) and about reducing the costs incurred by their Cray supercomputer.
At about the same time, Sun acquired Gridware, and we saw a good opportunity to help Ford introduce a resource management system. Through our close connections to Sun we were able to introduce SGE at Ford and to create a workstation cluster in two of Ford's development departments. After one of my colleagues from the Cologne cards PIT team took over the job of maintaining the CAE server, I was able to work on the workstation cluster under a new contract.
In the months from summer 2001 until April 2002, SGE was installed on about 80 workstations, and 8 CAE applications (e.g. Radioss, Nastran, Fluent) were adapted for the workstation cluster so they could be run in serial (1 CPU or workstation) and/or parallel mode. This was done by changing startup scripts and GUI configurations and by creating submit scripts. In addition, I configured the cluster according to users' wishes.
With the help of some Sun colleagues, we were able to convince the responsible people at Ford of Europe that SGE is a good product which does exactly what they requested. Unfortunately, we were not able to fully convince Ford in the US, where two competing products, LSF and PBS, are very strong and were already in use before we started with SGE. Nevertheless, it was possible to install a production workstation cluster in Cologne and Dunton/UK and to satisfy the users' needs.
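The submit scripts mentioned above hide the queueing details from the end user. No script is included in this abstract, but a minimal hypothetical sketch, in which the job name, the parallel environment name "cae_pe" and the wrapper "run_solver.sh" are illustrative only, could look like this:

    #!/bin/sh
    # Hypothetical SGE submit wrapper for a CAE solver.
    # Usage: submit_solver.sh <input file> [<number of CPUs>]
    INPUT=$1
    NSLOTS=${2:-1}
    if [ "$NSLOTS" -gt 1 ]; then
        # Parallel mode: request slots from a parallel environment.
        qsub -N cae_job -cwd -pe cae_pe "$NSLOTS" run_solver.sh "$INPUT"
    else
        # Serial mode: a plain batch job on a single CPU or workstation.
        qsub -N cae_job -cwd run_solver.sh "$INPUT"
    fi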
Integration, Support and Development Projects Related to Grid Engine
By Craig Stair, Raytheon
Abstract:
Raytheon's High Performance Computing (HPC) team has been engaged in large-scale supercomputer-based systems development, deployment, operation, and maintenance for 30 years. The team works continuously to maintain its comprehensive knowledge of IBM, SGI, Sun, Compaq, HP and Cray/NEC systems as well as end-to-end storage solutions and file systems. Raytheon's in-house technical strengths for high-end systems development include systems architecture engineering, system performance engineering, modeling, applications performance engineering, and facilities engineering and implementation.
Raytheon's presentation at the Grid Engine Developers' Workshop will give an overview of Raytheon's capabilities in HPC as well as our involvement with Grid Engine Enterprise Edition. Raytheon has been involved in the development of this software since its inception and has provided many of the key code components. Raytheon possesses unique capability in deploying and providing support for Grid Engine Enterprise Edition.
Presentation currently not available.
Using Sun Grid Engine and Globus to Schedule Across a Combination of Local and Remote Machines
By Geoff Cawood and Paul Graham, Edinburgh Parallel Computing Centre (EPCC)
Abstract:
The aim of this collaboration between EPCC and Sun is to produce a job scheduler (based on SGE) which can submit jobs to both local and remote machines. Globus will provide a secure means of running jobs on remote sites.
The project is still at an early fact-finding stage, so this talk will compare some potential solution strategies and invite comments from the floor on the technical issues arising.
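Since the project is at a fact-finding stage, no implementation exists yet; the following sketch merely illustrates what the combination might look like from a user's point of view, assuming a Globus 2 installation and a remote gatekeeper with an SGE-backed jobmanager (the contact string and file names are invented):

    # Obtain a Globus proxy; this is the secure channel to remote sites.
    grid-proxy-init
    # Local machine: submit directly to the local SGE cluster.
    qsub -N local_run -cwd analysis.sh
    # Remote machine: hand the job to a remote gatekeeper via GRAM,
    # which dispatches it into that site's own resource manager.
    globusrun -o -r remote.site.example.org/jobmanager-sge \
        '&(executable=/home/user/analysis.sh)(count=1)'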
GridLab and Progress as the Examples of Cooperation between SGE and Grid Research at PSNC
By Krzysztof Kurowski, Poznan Supercomputing and Networking Center (PSNC)
Abstract:
Grid research and development at the Poznan Supercomputing and Networking Center (PSNC) will be presented during the SGE Workshop. Among PSNC's projects, the presentation will focus on GridLab and Progress and their relationships to SGE.
Two important aspects of Grid technology that have been largely ignored form the basis of the GridLab project: the co-development of infrastructure and applications, and dynamic Grid computing. GridLab aims to build components for Grid applications, together with realistic testbeds for their development.
In the Progress project, a scientific portal and Grid environment for Sun servers will be developed based on Sun technologies. The main goal of this project is to provide reliable and easy access to the Grid and to support the research community with many useful tools and services.
Secure Grid and Portal Computing
By Kenneth Edgecombe, HPCVL
Abstract:
HPCVL was formed by a consortium of four universities (Carleton University, Queen's University, the Royal Military College of Canada, and the University of Ottawa) to provide a secure innovative High Performance Computing (HPC) environment for researchers. This environment is now being built with contributions from the four member institutions, the Canada Foundation for Innovation, the Ontario Innovation Trust, the Ontario Research and Development Fund, Sun Microsystems, IBM, and Entrust.
The progress being made, the roadmap for future developments, and the acceptance by researchers will be reviewed. The partnership with Sun Microsystems will be highlighted and progress on the Centre for Secure Grid and Portal Computing discussed.
Sun TCP Bioinformatics Application Integration on Heterogeneous Server Platforms
By Eric Stahlberg, Ohio Supercomputer Center (OSC)
Abstract:
A brief overview will be presented of efforts to integrate applications running on a variety of server architectures without using SGE. Proof-of-concept approaches for integrating the Technical Compute Portal with Time Logic DeCypher bioinformatics algorithm accelerators and with Cray and SGI Origin systems will be discussed. Issues and potential problems associated with the implementations will be highlighted.
Portals and Resource Scheduling at Imperial College
By Steven Newhouse, Imperial College, London
Abstract:
The London e-Science Centre at Imperial College, one of the eight Regional Centres within the UK e-Science program, is involved in the construction of Grid infrastructures within the College and in the UK through the use of SGE, Globus and its own grid middleware - ICENI.
As part of the Centre's activities, and as a Sun Centre of Excellence, we have been developing middleware to support an e-Science Portal at the College (based on uPortal), the use of the Technical Compute Portal to access Sun Grid Engine through the uPortal infrastructure, and the integration of scheduling infrastructures (such as SGE) through a web-service-enabled DRMAA interface.
The presentation will comprise an overview of this work and its future directions.
The White Rose Computational Grid
By P. M. Dew, University of Leeds
Abstract:
This talk will describe how the Sun Grid Engine (SGE) and the Technical Computing Portal (TCP) are to be used in support of collaborative e-Science projects at the White Rose universities. As an example, our presentation will include the UK e-Science project DAME (Distributed Aircraft Maintenance Environment), which aims to deliver a grid test-bed for distributed diagnostics. The project builds on the White Rose Computational Grid, based on the Sun Grid Engine, and on the Technical Computing Portal, which will provide a Web-based interface to technical applications executed on any of the White Rose Computational Grid nodes.
Scheduling in an HPC environment
By Andrea Lorenz, RWTH Aachen
Abstract:
With the purchase of RWTH's Sun Fire equipment, the old queueing system (GNQS) was replaced by SGE.
The operational part works right out of the box. However, for our job load, which spans a broad range of job sizes with respect to memory and required CPUs, the standard scheduler yields only poor utilization.
The talk presents these specific requirements and our approaches to meeting them.
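The abstract does not name the parameters concerned, but as a rough illustration of the kind of tuning the standard SGE scheduler offers, the following hypothetical excerpt shows settings editable with "qconf -msconf" (the values are examples, not RWTH's actual configuration):

    # Excerpt from an SGE scheduler configuration ("qconf -msconf").
    # Sort queues by measured load rather than by fixed sequence number.
    queue_sort_method           load
    # np_load_avg normalizes the load average by the number of CPUs.
    load_formula                np_load_avg
    # Anticipate the load a freshly dispatched job adds to a host, so
    # that several jobs are not piled onto the same machine at once.
    job_load_adjustments        np_load_avg=0.50
    load_adjustment_decay_time  0:7:30
    # Let qstat report why pending jobs are not being dispatched.
    schedd_job_info             true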
Covering the Spectrum - Grid Activities @ UCL
By Søren-Aksel Sørensen, University College London (UCL-CS)
Overview of Sun Center of Excellence in Geosciences at the University of Houston
By Babu Sundaram, University of Houston
Abstract:
The University of Houston (UH) is a Sun Center of Excellence in Geosciences. Recently, the University of Houston obtained a hardware grant and is setting up a campus grid to aggregate resources across the campus. Major participants include the HPCTools research group led by Dr. Barbara Chapman, the High Performance Computing Center, the Advanced Geosciences Lab and Mechanical Engineering. At this workshop, we will present an overview of the hardware and software setup of this campus grid, which is under construction. We will also explain the EZ-Grid system being developed at UH for easier setup, management and efficient usage of grid environments. The major components of the system include an integrated resource broker, usage policy management frameworks and information services. The interfaces to the Globus toolkit and Sun Grid Engine will be presented. We also hope to receive feedback from the workshop participants on improving our EZ-Grid system and integrating it tightly with the Sun Grid Engine software.
Special Interest Group Meetings
- EPCC: Integrate Globus and Grid Engine with a broker on top. The project has started; a prototype is to be available by the end of 2002, and the project is to be finalized in two years.
- Imperial College: Integrate Grid Engine with Globus via web services and Jini. Brokerage is needed for the site's own operation, better sooner than later.
- Univ. of Houston: Build a broker for Globus and integrate it with Grid Engine, plus GRAM enhancements. A prototype already exists.
- PSNC: A Grid architecture based on Globus, integrating Grid Engine and a self-developed broker. A special requirement is that Grid service requests can appear within jobs that are already executing. The project is in the architecture definition phase.
- Raytheon: Has concrete customers who are looking for Grid architectures and solutions today.
- Sun: Looking for the best technology to take to the market and to solve the problems of existing and future customers with Grid requirements.
Some SIG participants have already collected user/customer requirements for grid infrastructures. They will investigate making them available to the SIG via an appropriate e-mail alias.
- Set up a SIG mailing list (F. Ferstl).
- Each participant to submit requirements, use cases and objectives as available.
- Each participant to submit designs, concept papers, etc. as available.
The current state of the TCP was discussed, and missing features and enhancement requests for a future feature set were identified:
- iPlanet Portal server is currently required
- installation is currently sometimes difficult, at least not straightforward; needs cleanup
- installation runs under root; root access should be isolated in a special, controlled component, and for security reasons the rest should not run as root
- adding new applications needs a lot of admin interaction and adaptation, so it is currently not easy
- Mac browsers cannot work with TCP (netlets)
- NT support not available
- configurability not too easy
- missing documentation, buglist, FAQ
- shared filesystem needed in the current implementation
- Unix account necessary
- file organization of TCP generated files not structured enough
- hook to an existing security framework/environment missing
- the prototype runs quite stably and reliably
- setup changes require a fair amount of admin work; this should be easier
- accessing password-protected web content from a channel was not possible
- Javascript is required
- navigation within forms should be enhanced (too many windows popping up/down)
- customizable views for different users dependent on their authentication/authorization record should be possible
- workflow integration would be nice, macro functionality to replay a set of actions
- netlet technology is available but has restrictions (with VNC, for example, OpenGL does not work); it should be possible to get visualization from a host where the data reside or which has the corresponding capabilities
- utility functions should be supported by the portal (e.g. converters, visualizers etc.)
- better output handling should be available (e.g. visualization from any machine through VNC or whatever, output file handling should be more sophisticated to suppress unnecessary file transfers)
- independence from iPlanet
- On what standards is TCP based, and who is defining them?
- ease of installation/administration/application integration
- user profiles should exist and be connected to a security infrastructure, logging and trace facilities should be available
- user space customization and application adding should in certain limits be possible
- better job monitoring facilities should be available
- web service support
- EJB support for persistence
- create a TCP Open Source project (the product shall be renamed) with a mailing list and source code repository (the project will be hosted on the GE Open Source site)
- around the end of May/beginning of June, the current version of TCP with some smaller changes will be available (provided the legal issues concerning the O'Reilly Servlet Book code have been clarified)
- project guidelines (Java programming guidelines, I18N support, etc.) shall be made available
- legal issues shall be clarified
What concepts does SGE provide for parallel scheduling?
See the Sun Grid Engine 5.3 Administration and User's Guide (chapter "Managing Parallel Environments"; see page 271, "The Allocation Rule" input) and sge_pe(5).
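As a rough sketch of where the allocation rule fits, the following hypothetical parallel environment (the PE name, queue list and script paths are illustrative; see sge_pe(5) for the authoritative field descriptions) shows a complete configuration; tight integration, addressed in the next question, would additionally set control_slaves to TRUE:

    # Hypothetical parallel environment, created with "qconf -ap mpi_pe".
    pe_name            mpi_pe
    queue_list         all
    slots              32
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /usr/sge/mpi/startmpi.sh $pe_hostfile
    stop_proc_args     /usr/sge/mpi/stopmpi.sh
    # $fill_up, $round_robin, $pe_slots or a fixed slot count per host.
    allocation_rule    $round_robin
    # FALSE = loose integration; TRUE lets SGE control the slave tasks.
    control_slaves     FALSE
    job_is_first_task  TRUE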
What is the difference between so-called loose and tight integration for parallel jobs?
See the Sun Grid Engine 5.3 Administration and User's Guide (chapter "Tight Integration of PEs and Sun Grid Engine Software", page 277).
Which requirements must parallel jobs fulfill to run under SGE?
None.
What are the issues when designing a general interface for writing customized schedulers?
See the scheduler documentation.