Update: News on the HLRB-II successor SuperMUC
LRZ aktuell
publish at lrz.de
Fri Jul 29 09:36:15 CEST 2011
Changes to this message: Corrections to Altix retirement target
Dear users of HLRB II at LRZ,
The initial installation of the next-generation IBM system with 205
nodes, each containing 40 Westmere-EX cores and 256 GBytes of shared
memory, will soon be ready for user operation. This enables us to
provide more details about the timetable for migrating users from the
Altix 4700 to the new system:
* We intend to start by granting access to selected large-scale users
  (job sizes of 1000 cores and more) in early August, and to
  incrementally open the system to smaller-scale users throughout
  August, until all users can access the system in early September.
* The target is that all HLRB-II users should be able to run their
  codes on the new system by the end of September; in case of
  difficulties please contact the LRZ Service Desk.
* IBM's LoadLeveler will be deployed as the batch queuing system; you
  will therefore need to convert your existing PBS job scripts (a
  sketch of a LoadLeveler script is given after this list).
* During the intermediate operation phase (from August 2011 to summer
  2012), the I/O bandwidth to the scratch disk and project subsystem
  will be lower than what is presently available on the Altix, and
  programs that make use of parallel I/O via MPI-IO or HDF5 will
  require special configuration.
* Migration of the LRZ software stack will start in July 2011 and may
  take until September to be completed. The development environment
  will be based on Intel's compilers (version 12.0), and either IBM
  MPI or Intel MPI can be used. Pure OpenMP programs can only be run
  efficiently with up to 40 threads, i.e. the cores of a single node
  (see the compilation sketch after this list).
* The Altix 4700 will be retired from user operation in early October
2011.
* Since the scratch and project data produced on HLRB-II ($OPT_TMP or
  $PROJECT) will not be accessible from the new system, LRZ urgently
  asks you to delete unneeded data from the scratch and project areas
  and to write all important data to tape on HLRB-II within the next
  few weeks (an example archive command is shown after this list). A
  procedure for restoring TSM tape archives written on HLRB-II to the
  new system will be provided.
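As an illustration of the required script changes, a minimal
LoadLeveler job script for a 1000-core run might look like the sketch
below. The class name, node count, and the use of mpiexec as startup
command are assumptions; the actual settings for the new system will
be given in the documentation.

    #!/bin/bash
    # LoadLeveler uses "# @" keyword directives instead of "#PBS" options.
    # @ job_name         = my_simulation
    # @ job_type         = parallel
    # @ class            = general              # assumed class name
    # @ node             = 25                   # 25 nodes x 40 tasks = 1000 MPI tasks
    # @ tasks_per_node   = 40
    # @ wall_clock_limit = 02:00:00
    # @ output           = $(job_name).$(jobid).out
    # @ error            = $(job_name).$(jobid).err
    # @ queue
    # Start the MPI program; the exact launcher depends on the MPI library used.
    mpiexec -n 1000 ./my_simulation

Jobs are submitted with llsubmit, monitored with llq, and cancelled
with llcancel, which take the roles of qsub, qstat, and qdel in PBS.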
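The following sketch shows how building and running a code in the new
development environment might look. The module names are assumptions
(check "module avail" once you have access); mpiifort and the -openmp
flag are the standard Intel MPI wrapper and Intel 12.0 OpenMP option.

    # Module names are assumptions; check "module avail" on the new system.
    module load intel mpi.intel
    # Build a hybrid MPI/OpenMP program with the Intel 12.0 compiler and Intel MPI.
    mpiifort -O2 -openmp my_code.f90 -o my_code
    # Pure OpenMP runs should use at most 40 threads (the cores of one node).
    export OMP_NUM_THREADS=40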
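As an example of writing data to tape, the TSM command-line client
dsmc can archive whole directory trees; the path and description below
are placeholders for your own data.

    # Archive a results directory (including subdirectories) to tape,
    # with a description that makes the data easy to locate later.
    dsmc archive "$PROJECT/my_results/" -subdir=yes \
         -description="HLRB-II results, July 2011"
    # Check what has been archived under this path.
    dsmc query archive "$PROJECT/my_results/" -subdir=yes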
Documentation on the new system will be accessible via
http://www.lrz.de/services/compute/supermuc/. Note that as of
July 27, 2011, this is still work in progress.
Further down the road, the following steps will be taken:
* In early December 2011, LRZ - in collaboration with Intel and IBM -
is planning a training workshop on handling the system and its
development environment, to which all users of the HLRB II will be
invited.
* In summer 2012, the much bigger system (SuperMUC) with more than
  6250 nodes (each with 16 Sandy Bridge-EP cores and 32 GBytes of
  memory) will become available for user operation. SuperMUC will
  have a peak performance of 3 PetaFlop/s (3,000 TFlop/s), roughly 50
  times that of the present system. To fully exploit the capabilities
  of the new SIMD (AVX) units on the Sandy Bridge CPUs, further tuning
  measures may be appropriate (see the compiler flag sketch after this
  list), but programs compiled for the Westmere-based migration system
  should run without problems.
* The system will also provide GPFS ("General Parallel File System")
  based disk storage with a total capacity of 10,000 TBytes and an
  aggregate bandwidth of 200 GBytes/s, which will fully support MPI-IO
  and HDF5-based parallel I/O. Furthermore, SuperMUC will have
  provisions in place to run jobs in an energy-efficient manner, e.g.
  by running processors at a lower frequency if the program's
  performance does not deteriorate.
* Some months after SuperMUC's start of user operation, the nodes of
the migration system will be integrated with the main system,
forming a "fat node" island for programs which require a large
shared memory.
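As an example of such tuning, the target instruction set can be
selected at compile time with the Intel compilers; the flags below are
standard Intel options, while the settings actually recommended for
SuperMUC will be given in the documentation.

    # Westmere-EX migration system: SSE4.2 is the newest supported instruction set.
    ifort -O3 -xSSE4.2 my_code.f90 -o my_code
    # SuperMUC (Sandy Bridge-EP): generate code for the AVX SIMD units.
    ifort -O3 -xAVX my_code.f90 -o my_code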
We expect that, due to security considerations, some policies on system
usage (especially how to access the system) may change. We will keep
you informed about these changes through the documentation of the new
system once details have been determined.
We wish all of you the best possible success with your scientific
simulation endeavors on our new Petaflop-class successor to HLRB II.
If you have further questions, please forward them to us via the LRZ
Service Desk.
This information is also available on our web server
http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali4069/
Reinhold Bader