Upd: SuperMUC: scheduled maintenance Jan 28-31
LRZ aktuell
publish at lrz.de
Mon Feb 4 21:43:28 CET 2013
Changes to this message: Returned to user operation
Update (Feb 4, 2013, 21:40) Somewhat later than the announced afternoon,
we are now returning the system to user operation and keeping our
fingers crossed that the updates deliver the promised stability
improvements.
Update (Feb 4, 2013, 14:40)
Dear users of the SuperMUC system,
IBM and LRZ have worked hard over the weekend to resolve all system
stability issues caused by some of last week's maintenance activities.
We believe that almost all problems reported by users over the last
few weeks have now been resolved.
During our intensive testing and debugging over the weekend, we have
also been able to pinpoint a previously unknown problem with the
handling of LoadLeveler job STDIN and STDOUT files on GPFS. We
therefore recommend not using GPFS for these I/O streams and writing
them to your HOME directory instead. Please also place LoadLeveler
output and error files in your HOME directory.
We expect the system to go back online this afternoon. We regret any
inconvenience the extended maintenance may have caused and ask for
your understanding.
-----------------------------------------------------------------------
Dear users of the SuperMUC petascale system at LRZ,
Due to combined hardware and software maintenance, SuperMUC will be
unavailable for user operation from January 28, 8:00, until the
afternoon of January 31. Jobs still running when the maintenance
begins will be cancelled.
The following changes are introduced by this maintenance:
* Access to the PRACE network will become available to PRACE users.
* LoadLeveler mail notification will be activated.
* Licence servers will be accessible from the compute nodes.
* The InfiniBand software stack will be updated to a newer release.
* Some packages currently missing on the compute nodes will become
available (e.g., the workaround for the missing libnuma.so can then
be removed).
* Intel MPI: The newest 4.1 bug-fix release will be used by default.
* Intel MKL: The 11.0 release will be used by default. The old
mkl/10.3 module remains available in case you encounter problems
with the new version.
This information is also available on our web server
http://www.lrz-muenchen.de/services/compute/supermuc/aktuell/ali4502/
Reinhold Bader