Notification on Loadleveler queue waiting times

LRZ aktuell, published on
Wed Apr 17 13:08:46 CEST 2013

Dear users of the SuperMUC Petaflop system,
 Several compute nodes are showing hardware problems with their
 InfiniBand cables and have been taken out of user operation. As a
 result, several islands currently have fewer than 512 available compute
 nodes. Therefore, jobs requesting 512 compute nodes on a single island
 may wait significantly longer in the queue.
 IBM and Mellanox have started to replace the faulty cables; however,
 due to the number of cables, this will take some time. LRZ will notify
 you when this task has been completed.
 Until then, to avoid long queue waiting times and increase the
 throughput of your jobs, you may specify a higher value for the maximum
 island count. Given the syntax #@ island_count = min,max you could set
 the max count to the value of the min count plus one.
 For example, for a two island job you would specify in your job script
 #@ island_count = 2,3
 instead of
 #@ island_count = 2
 or, in the case of a one island job, you would specify
 #@ island_count = 1,2
 instead of
 #@ island_count = 1
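 As an illustration, the header of a job script for a 1024-node job
 (two full islands of 512 nodes) might look like the sketch below. Only
 the island_count line follows this notice; the other keywords and all
 values (job_type, node, wall_clock_limit) are placeholder assumptions
 that need to be adapted to your own job and checked against the LRZ
 documentation.

```shell
#!/bin/bash
# Sketch of a LoadLeveler job script header.
# Assumption: every value except island_count is a placeholder.
#@ job_type = parallel
#@ node = 1024
# min,max: allow one island more than the minimum of two
#@ island_count = 2,3
#@ wall_clock_limit = 01:00:00
#@ queue
```

 With island_count = 2,3 the scheduler may place the 1024 nodes on up to
 three islands instead of waiting for two complete ones to become free.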
 For further details please see the LRZ Loadleveler documentation.

 This information is also available on our web server

 Nicolay Hammer
