Upd: Linux Cluster: Returning to user operation

LRZ aktuell publish at lrz.de
Tue Mar 1 15:47:40 CET 2011


 Changes to this message: Slowly returning to user operation
Update:
 
 As of March 1, 15:45, a large part of the cluster systems has been
 returned to regular user operation. There are some exceptions:
 
   * due to hardware problems, the Altix 4700 will remain unavailable
     for a few more days
   * some other (mostly serial) nodes have also suffered hardware
     failures due to the power cutoff. These will be returned to user
     operation as repairs are completed.
 
 -----------------------------------------------------------------------
 As indicated in ALI 3942, an interruption of electrical power is
 scheduled for the weekend Feb 26-27. On the Linux cluster, additional
 maintenance measures will also be required. Therefore, the cluster will
 be unavailable for user operation between
 
         Friday, February 25, 16:00 and Tuesday, March 1, 18:00         
 
 All running batch jobs will be removed from the systems. We will leave
 the queues open until Friday 15:00, but recommend setting a user hold
 on any queued jobs that will not start execution in time to complete
 by Friday afternoon.
 
 Note: A job can be put into user hold by issuing the command

     qalter -h u <job id>

 Once the cluster has been returned to operation, you need to explicitly
 remove the hold again with

     qrls -h u <job id>
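
 For users with several queued jobs, the two commands above could be
 wrapped in a small shell function. This is only a sketch: the function
 name hold_jobs and the DRY_RUN variable are illustrative and not part
 of the batch system; only qalter -h u and qrls -h u are taken from the
 note above.

```shell
# Hypothetical helper: set or release a user hold on several jobs.
# Usage: hold_jobs hold|release <job id>...
# With DRY_RUN=1 the commands are printed instead of executed.
hold_jobs() {
    action=$1
    shift
    case "$action" in
        hold)    cmd="qalter -h u" ;;   # put job into user hold
        release) cmd="qrls -h u" ;;     # remove the user hold again
        *)
            echo "usage: hold_jobs hold|release <job id>..." >&2
            return 2
            ;;
    esac
    for job_id in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd $job_id"
        else
            # $cmd is deliberately unquoted so it splits into command and options
            $cmd "$job_id"
        fi
    done
}
```

 Running it first with DRY_RUN=1 shows which commands would be issued,
 for example "hold_jobs hold <job id> <job id>" before the shutdown and
 "hold_jobs release <job id> <job id>" once the cluster is back.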


 This information is also available on our web server
 http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali3950/

 Reinhold Bader



More information about the mailing list aktuell