Upd: Linux Cluster: Returning to user operation
LRZ aktuell
publish at lrz.de
Tue Mar 1 15:47:40 CET 2011
Changes to this message: Slowly returning to user operation
Update:
As of March 1, 15:45, a large part of the cluster systems has been
returned to regular user operation. There are some exceptions:
* due to hardware problems, the Altix 4700 will remain unavailable
for a few more days
* some other (mostly serial) nodes have also suffered hardware
failures due to the power cutoff. These will be returned to user
operation as repairs are completed.
-----------------------------------------------------------------------
As indicated in ALI 3942, an interruption of electrical power is
scheduled for the weekend Feb 26-27. On the Linux cluster, additional
maintenance measures will also be required. Therefore, the cluster will
be unavailable for user operation between
Friday, February 25, 16:00 and Tuesday, March 1, 18:00
All running batch jobs will be removed from the systems; we will leave
the queues open until Friday 15:00, but recommend setting a user hold on
any queued jobs which will not start execution in time to complete by
Friday afternoon.
Note: A job can be put into user hold by issuing the command
qalter -h u <job id>
Once the cluster has been returned to operation, you need to explicitly
remove the hold again with
qrls -h u <job id>
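If several jobs are affected, the two commands above can be wrapped in a small shell function. The following is only a sketch: the function name hold_jobs and the DRY_RUN variable are hypothetical and not part of the batch system, and the job IDs must be supplied by hand. It uses nothing beyond the qalter and qrls invocations quoted in this message.

```shell
#!/bin/sh
# Sketch: set or release a user hold on several jobs in one call.
# hold_jobs and DRY_RUN are illustrative names, not batch system features.
# Only the two commands quoted in this message are used.

hold_jobs() {
    action=$1          # "hold" or "release"
    shift
    case "$action" in
        hold)    cmd="qalter -h u" ;;
        release) cmd="qrls -h u" ;;
        *)
            echo "usage: hold_jobs hold|release <job id>..." >&2
            return 2
            ;;
    esac
    for job_id in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: only print the command that would be issued.
            echo "$cmd $job_id"
        else
            # $cmd is left unquoted on purpose so it splits into words.
            $cmd "$job_id"
        fi
    done
}
```

With DRY_RUN set to a non-empty value, for example `DRY_RUN=1 hold_jobs hold <job id> <job id>`, the function only prints the commands, which allows a check before anything is changed. After the cluster is back in operation, `hold_jobs release <job id> <job id>` removes the holds again.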
This information is also available on our web server
http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali3950/
Reinhold Bader