Update: Unscheduled maintenance on SuperMUC
LRZ aktuell
publish at lrz.de
Mon Oct 1 17:13:56 CEST 2012
Changes to this message: Update
3rd Update (Oct 1, 17:15)
The cause of the GPFS failure on Saturday, Sep 29 has been identified
as a software problem triggered by migrating data to newly installed
additional disk space. Unfortunately, further investigation is still
needed before a return to user operation is possible. We will provide
an update via this URL once we know more.
-----------------------------------------------------------------------
Dear users of SuperMUC,
The cause of the problems with the GPFS file system on SuperMUC appears
to be one or more defective InfiniBand hardware components, which failed
as a consequence of the power failure on Sunday, September 23rd.
Unfortunately, the affected hardware components have not yet been
identified, and we cannot reliably estimate how long the search will
take.
IBM is still working on the problem.
We apologize for the inconvenience caused by this failure. We will
inform you through an update to this announcement as soon as the
problem is fixed.
This information is also available on our web server:
http://www.lrz-muenchen.de/services/compute/supermuc/aktuell/ali4424/
Markus Mueller
More information about the aktuell mailing list