Upd: Linux Cluster: Upgrade of the parallel computing capacity

LRZ aktuell publish at lrz.de
Wed Aug 17 17:08:43 CEST 2011


 Changes to this message: MPP cluster user operation delayed
 LRZ intends to take two new systems into user operation in its Linux
 Cluster infrastructure within the next few weeks; these new systems are
 focused on efficient execution of large-scale parallel programs.
 
 The first system is an Infiniband-connected cluster of 16-way shared
 memory systems with more than 2500 cores delivered by MEGWare, and the
 second one consists of SGI Ultraviolet (UV) large-scale shared memory
 systems with more than 2000 cores in total.
 
 It is expected that the Infiniband cluster will become available for
 regular user operation first, followed by the UV system somewhat later.
 However, the existing UV will be retired from user operation soon
 because the space it presently occupies will be needed for the new
 system. Furthermore, the Infiniband cluster will need to be moved into
 the new LRZ infrastructure later in summer, so there will be a
 prolonged interruption of operation some weeks after its initial user
 operation.
 
 The altix2 system will be retired from user operation once the new
 systems are fully available.
 
 Users are advised that all new systems, which are targeted at running
 large parallel jobs, will use SLURM as their batch scheduler. Existing SGE
 scripts will not work on the new clusters - they will require rewriting
 of the control section, and will also need modification of the program
 startup procedure in the script section. Documentation for the new
 batch system will be available via the Linux Cluster page
 (http://www.lrz.de/services/compute/linux-cluster) once user operation starts.
 The usage instructions for startup of MPI programs will also be
 updated.
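
 As a rough illustration of what rewriting the control section involves,
 the following Python sketch maps a few commonly used SGE "#$" directives
 to SLURM "#SBATCH" directives. It is not an LRZ tool: the helper names
 (SIMPLE_OPTIONS, translate_directive, translate_script) are hypothetical,
 the mapping covers only generic options, and site-specific settings such
 as queues, partitions and parallel environments are assumptions that must
 be checked against the LRZ documentation once it is published. Lines of
 the script section are passed through unchanged, because the program
 startup procedure has to be adapted by hand.

```python
import re

# Illustrative mapping of generic SGE options to SLURM options.
# Assumption: site-specific options (queues, parallel environments,
# resource names) are NOT covered and need a manual rewrite.
SIMPLE_OPTIONS = {
    "-N": "--job-name",
    "-o": "--output",
    "-e": "--error",
    "-M": "--mail-user",
}


def translate_directive(line: str) -> str:
    """Translate one SGE '#$' control line; pass all other lines through."""
    stripped = line.strip()
    if not stripped.startswith("#$"):
        # Script section: left untouched, program startup needs manual review.
        return line
    parts = stripped[2:].split()
    if not parts:
        return line
    opt, args = parts[0], parts[1:]
    if opt in SIMPLE_OPTIONS and len(args) == 1:
        return f"#SBATCH {SIMPLE_OPTIONS[opt]}={args[0]}"
    if opt == "-pe" and len(args) == 2 and args[1].isdigit():
        # SGE slot count of a parallel environment -> number of tasks.
        return f"#SBATCH --ntasks={args[1]}"
    if opt == "-l" and len(args) == 1:
        match = re.fullmatch(r"h_rt=(\d+:\d{2}:\d{2})", args[0])
        if match:
            # Hard run-time limit -> wall-clock time limit.
            return f"#SBATCH --time={match.group(1)}"
    # Anything not recognised is flagged instead of being guessed.
    return f"# TODO (manual rewrite needed): {stripped}"


def translate_script(text: str) -> str:
    """Apply translate_directive to every line of a job script."""
    return "\n".join(translate_directive(line) for line in text.splitlines())
```

 Directives the sketch does not recognise are turned into TODO comments
 rather than translated, so that nothing is silently dropped or guessed.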
 
 We hope that this capacity upgrade will reduce the rather long waiting
 times currently observed for parallel jobs. We intend to keep
 you informed about the status of these new systems via updates to this
 document.
 -----------------------------------------------------------------------
 
 Timeline for operational changes
 
 Friday, August 12
 
 Service on the Ultraviolet system uv1 will be interrupted for 2 hours,
 starting at 10:00. All running jobs will be removed from the system.
 The reason for this is that the machine must be physically moved to
 make room for the UV upgrade.
 
 Thursday, August 18
 
 The Infiniband cluster is opened for general user operation.
 
 Update: General user operation is delayed from the original target
 date, probably until Thursday, because an unforeseen problem with the
 batch system surfaced when accounting was switched on.
 
 Monday, September 12
 
 The Altix 4700 is retired from user operation.


 This information is also available on our web server
 http://www.lrz-muenchen.de/services/compute/aktuell/ali4065/

 Reinhold Bader


