Upd: Preparing for the HLRB-II successor SuperMUC

LRZ aktuell publish at lrz.de
Wed Apr 6 11:11:39 CEST 2011


 Changes to this message: Update
Dear users of HLRB II at LRZ,
 
 After more than 5 years of operation, the presently deployed Altix 4700
 system is approaching the end of its lifetime. It is presently intended
 to retire the Altix 4700 system from user operation toward the end of
 October 2011.
 
 Therefore, we would like to inform you of the steps LRZ will take to move
 its supercomputing operations to the next-generation infrastructure,
 which will be delivered by IBM, and will be based on standard x86-64
 processors with an Infiniband-based fat tree interconnect. LRZ intends
 to incrementally move production to the new system throughout summer
 and autumn of 2011.
 
   * Early in August 2011, a migration system containing 205 nodes, each
     containing 40 Westmere-EX cores and 160 GBytes of shared memory,
     will be made available to selected HLRB II users to enable porting
     and tuning of programs. By the end of August 2011, the system
     will then be made generally available to all users.
   * Migration of the LRZ software stack will start in July 2011, and
     may take until September to be completed. The development
     environment will be based on Intel's compilers (version 12.0), and
     an MPI implementation delivered by IBM. Pure OpenMP programs can
     only be run with up to 40 threads.
   * With a peak performance of 78 TFlop/s, the above-mentioned system
     will be able to take over the job load from the Altix 4700, while
     needing only 25% of the electrical energy consumed by HLRB II.
   * IBM's LoadLeveler will be deployed as a batch queuing system;
     therefore you will need to make changes to your existing PBS job
     scripts.
   * During the intermediate operation phase (from August 2011 to summer
     2012), the I/O bandwidth to the scratch disk and project subsystem
     will be lower than presently available on the Altix, and programs
     making use of parallel I/O via MPI-IO or HDF5 will require special
     configuration.
   * In late summer or early autumn 2011, LRZ - in collaboration with
     Intel and IBM - is planning a training workshop on handling the
     system and its development environment, to which all users of the
     HLRB II will be invited.
   * In summer 2012, the much bigger system (SuperMUC) with more than
     6250 nodes (each with 16 Sandy Bridge-EP cores and 32 GBytes of
     memory) will become available for user operation. SuperMUC will
     have a peak performance of 3 PetaFlop/s (3,000 TFlop/s), which is
     roughly 50 times that of the present system. To fully exploit the
     capabilities of the new SIMD (AVX) units on the Sandy Bridge CPUs,
     further tuning measures may be appropriate, but programs compiled
     for the Westmere-based migration system should run without
     problems.
   * The system will also provide GPFS ("General Parallel File System")
     based disk storage with a total capacity of 10,000 TBytes and an
     aggregate bandwidth of 200 GBytes/s, which will fully support
     MPI-IO and HDF5-based parallel I/O. Furthermore, SuperMUC will have
     provisions in place to run jobs in an energy-efficient manner, e.g.,
     by running processors at a lower frequency if the program's
     performance does not deteriorate.
   * Some months after SuperMUC's start of user operation, the nodes of
     the migration system will be integrated with the main system,
     forming a "fat node" island for programs which require a large
     shared memory.
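 
 The node counts quoted above can be turned into aggregate figures with
 simple arithmetic. The following is a minimal sketch for illustration
 only: the dictionary layout and the function name summarize are
 assumptions made here, and SuperMUC's "more than 6250 nodes" is treated
 as exactly 6250, so the SuperMUC totals are lower bounds.

```python
# Figures taken from the announcement text above.
MIGRATION = {
    "nodes": 205,
    "cores_per_node": 40,
    "mem_per_node_gb": 160,
    "peak_tflops": 78,
}
# "More than 6250 nodes": 6250 is used as a lower bound (assumption).
SUPERMUC = {
    "nodes": 6250,
    "cores_per_node": 16,
    "mem_per_node_gb": 32,
    "peak_tflops": 3000,
}


def summarize(system):
    """Derive aggregate and per-core figures from per-node figures."""
    cores = system["nodes"] * system["cores_per_node"]
    mem_gb = system["nodes"] * system["mem_per_node_gb"]
    return {
        "cores": cores,
        "memory_gb": mem_gb,
        "mem_per_core_gb": system["mem_per_node_gb"] / system["cores_per_node"],
        # Peak performance per core in GFlop/s (1 TFlop/s = 1000 GFlop/s).
        "gflops_per_core": system["peak_tflops"] * 1000 / cores,
    }


if __name__ == "__main__":
    for name, system in (("migration system", MIGRATION), ("SuperMUC", SUPERMUC)):
        print(name, summarize(system))
```

 Under these assumptions the migration system comes to 8200 cores with
 4 GBytes of memory per core, while SuperMUC comes to at least 100,000
 cores with 2 GBytes per core. This difference is why the migration nodes
 are the natural home for programs that need a large shared memory.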
 
 We expect that, due to security considerations, some policies on system
 usage (especially how to access the system) may change. We will keep
 you informed about these changes through the documentation of the new
 system once details have been determined.
 
 We wish all of you the best possible success for your scientific
 simulation endeavors on our new Petaflop-class successor to HLRB II. If
 you have further questions, please forward them to us via the LRZ
 Servicedesk.
 
 Some additional information is also available via
 http://www.lrz.de/services/compute/supermuc/01_supermuc_desc/.


 This information is also available on our web server
 http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali3984/

 Reinhold Bader


