Update: News on the HLRB-II successor SuperMUC

LRZ aktuell publish at lrz.de
Fri Jul 29 09:36:15 CEST 2011


 Changes to this message: Corrections to Altix retirement target
Dear users of HLRB II at LRZ,
 
 The initial installation of the next-generation IBM system with 205
 nodes, each containing 40 Westmere-EX cores and 256 GBytes of shared
 memory, will soon be ready for user operation. This enables us to
 provide more details about the timetable for migrating users from the
 Altix 4700 to the new system:
 
   * We intend to start by granting access to large-scale users (job
     sizes of 1000 cores and more) early in August, and to admit
     smaller-scale users incrementally throughout August, so that all
     users will be able to access the system by early September.
   * The target is that all HLRB-II users should be able to run their
     codes on the new system by the end of September; in case of
     difficulties, please contact the LRZ Service Desk.
   * IBM's LoadLeveler will be deployed as the batch queuing system;
     you will therefore need to adapt your existing PBS job scripts (a
     sketch of a LoadLeveler script follows after this list).
   * During the intermediate operation phase (from August 2011 to summer
     2012), the I/O bandwidth to the scratch disk and project subsystem
     will be lower than what is presently available on the Altix, and
     programs making use of parallel I/O via MPI-IO or HDF5 will require
     special configuration.
   * Migration of the LRZ software stack will start in July 2011 and
     may take until September to complete. The development environment
     will be based on Intel's compilers (version 12.0), and either IBM
     MPI or Intel MPI can be used. Pure OpenMP programs can only be run
     efficiently with up to 40 threads, i.e. within a single node (see
     the compile and run sketch after this list).
   * The Altix 4700 will be retired from user operation in early October
     2011.
   * Since scratch and project data produced on HLRB-II ($OPT_TMP or
     $PROJECT) will not be accessible from the new system, LRZ urgently
     asks you to delete unneeded data from the scratch and project
     areas and to write all important data to tape on HLRB-II within
     the next few weeks (see the TSM archiving sketch after this
     list). A procedure for restoring TSM tape archives written on
     HLRB-II to the new system will be provided.
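
 For orientation, here is a minimal sketch of what a LoadLeveler job
 script might look like. The class name, node and task counts, and the
 launcher command are placeholders; the settings valid on the new
 system will be specified in the documentation linked below.

    #!/bin/bash
    # LoadLeveler directives start with #@ ; other # lines are comments
    #@ job_type = parallel
    #@ class = general
    #@ node = 4
    #@ total_tasks = 160
    #@ wall_clock_limit = 01:00:00
    #@ output = job.$(jobid).out
    #@ error  = job.$(jobid).err
    #@ queue
    # placeholder launcher; depends on whether IBM MPI or Intel MPI is used
    mpiexec -n 160 ./my_program

 Such a script would be submitted with "llsubmit job.cmd" rather than
 the PBS command "qsub job.cmd".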
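
 Likewise, a short sketch of compiling and running under the new
 development environment. The wrapper and flag names below are those
 of the Intel 12.0 tool chain; the exact environment setup (e.g.
 module names) will be described in the documentation.

    # MPI program with the Intel compiler and the Intel MPI wrappers
    mpiicc -O2 -o my_mpi_program my_mpi_program.c
    # pure OpenMP program; at most 40 threads (one node) run efficiently
    icc -O2 -openmp -o my_omp_program my_omp_program.c
    export OMP_NUM_THREADS=40
    ./my_omp_program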
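
 Finally, a sketch of writing data to tape with the TSM client on
 HLRB-II. The path and description below are examples only; please
 check the LRZ archive documentation for the invocation appropriate
 to your data.

    # archive a result tree to TSM tape, descending into subdirectories
    dsmc archive "$OPT_TMP/results/" -subdir=yes \
         -description="HLRB-II results, pre-migration"
    # list what has been archived
    dsmc query archive "$OPT_TMP/results/" -subdir=yes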
 
 Documentation on the new system will be accessible via
 http://www.lrz.de/services/compute/supermuc/. Note that as of
 July 27, 2011, this documentation is still work in progress.
 
 Further down the road, the following steps will be taken:
 
   * In early December 2011, LRZ - in collaboration with Intel and IBM -
     is planning a training workshop on handling the system and its
     development environment, to which all users of the HLRB II will be
     invited.
   * In summer 2012, the much bigger system (SuperMUC), with more than
     6250 nodes (each with 16 Sandy Bridge-EP cores and 32 GBytes of
     memory), will become available for user operation. SuperMUC will
     have a peak performance of 3 PetaFlop/s (3,000 TFlop/s), roughly
     50 times that of the present system. To fully exploit the new
     SIMD (AVX) units of the Sandy Bridge CPUs, further tuning measures
     may be appropriate, but programs compiled for the Westmere-based
     migration system should run without problems (see the
     recompilation sketch after this list).
   * The system will also provide GPFS ("General Parallel File System")
     based disk storage with a total capacity of 10,000 TBytes and an
     aggregate bandwidth of 200 GBytes/s, which will fully support
     MPI-IO and HDF5-based parallel I/O. Furthermore, SuperMUC will
     have provisions in place to run jobs in an energy-efficient
     manner, e.g. by running processors at a lower frequency when the
     program's performance does not deteriorate.
   * Some months after SuperMUC's start of user operation, the nodes of
     the migration system will be integrated with the main system,
     forming a "fat node" island for programs which require a large
     shared memory.
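
 As an illustration of the recompilation mentioned above: with the
 Intel 12.0 compilers, the target instruction set is selected via the
 -x option. The flags below exist in that compiler version; the full
 set of recommended tuning options for SuperMUC will only be
 established later.

    # binary for the Westmere-EX based migration system (SSE 4.2)
    icc -O3 -xSSE4.2 -o my_program my_program.c
    # retuned binary for the Sandy Bridge-EP based SuperMUC (AVX)
    icc -O3 -xAVX -o my_program.avx my_program.c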
 
 We expect that, due to security considerations, some policies on system
 usage (especially how to access the system) may change. We will keep
 you informed about these changes through the documentation of the new
 system once details have been determined.
 
 We wish all of you the best possible success in your scientific
 simulation endeavors on our new Petaflop-class successor to HLRB II.
 If you have further questions, please forward them to us via the LRZ
 Service Desk.


 This information is also available on our web server
 http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali4069/

 Reinhold Bader



More information about the mailing list aktuell