Upd:SuperMUC: Progress in resolving I/O problems

LRZ aktuell publish at lrz.de
Thu Dec 20 08:49:37 CET 2012


 Changes to this message: Progress
 Update (Dec 20, 8:50): The two main issues (and a couple of minor
 problems) that caused the I/O problems have been isolated and,
 hopefully, fixed; we intend to repeat the acceptance benchmark as soon
 as possible. Once we consider the fixes fully verified, we intend to
 return to regular user operation - please watch this document for
 further updates.
 -----------------------------------------------------------------------
 Update: Please note that user access to the system will be blocked on
 Friday, Dec 14, at 18:00. LoadLeveler will be stopped on the morning of
 Dec 15.
 -----------------------------------------------------------------------
 Dear users of the SuperMUC petaflop system,
 
 Strong I/O performance variations and I/O hangs, which have also been
 observed in regular production programs on the system, caused the
 GPFS acceptance step performed on December 11 to fail.
 
 For this reason, LRZ has decided to hand over the complete system to
 IBM for an open-ended analysis phase beginning on Saturday, December
 15, at 12:00.
 
 LRZ is of the opinion that there is no alternative to this procedure in
 order to isolate and remove the cause of the observed problems on the
 system and obtain more stable user operation in the long term. We will
 inform you via this document once a date for returning to user
 operation has been set.


 This information is also available on our web server
 http://www.lrz-muenchen.de/services/compute/supermuc/aktuell/ali4488/

 Reinhold Bader


