Upd:SuperMUC: Back in User Operation
LRZ aktuell
publish at lrz.de
Fri Dec 21 15:10:32 CET 2012
Changes to this message: Update
Update (Dec 21, 15:00) SuperMUC is back in user operation, but not all
issues with GPFS have been fixed yet. Please be aware that we still
expect problems and instabilities with the parallel file system. If you
experience any problems, please submit a ticket.
-----------------------------------------------------------------------
Update (Dec 20, 8:50) The two main issues (and a couple of minor
problems) that caused the I/O problems have been isolated and,
hopefully, fixed; we intend to repeat the acceptance benchmark as soon
as possible. Once we consider the fixes fully verified, we intend to
return to regular user operation - please watch this document for
further updates.
-----------------------------------------------------------------------
Update: Please note that user access to the system will be blocked on
Friday, Dec 14 at 18:00. LoadLeveler will be stopped on the morning of
Dec. 15.
-----------------------------------------------------------------------
Dear users of the SuperMUC petaflop system,
Strong performance variations and hangs of I/O, which have also been
observed in regular production programs on the system, have caused the
acceptance step for GPFS performed on December 11 to fail.
For this reason, LRZ has decided to turn over the complete system to
IBM for an open-ended analysis phase beginning on Saturday, December
15, at 12:00.
LRZ is of the opinion that there is no alternative to this procedure in
order to isolate and remove the cause of the observed problems on the
system and obtain more stable user operation in the long term. We will
inform you via this document once a date for returning to user
operation has been fixed.
This information is also available on our web server:
http://www.lrz-muenchen.de/services/compute/supermuc/aktuell/ali4488/
Reinhold Bader