Notification on Loadleveler queue waiting times

LRZ aktuell publish at lrz.de
Wed Apr 17 13:08:46 CEST 2013


Dear users of the SuperMUC Petaflop system,
 
 Several compute nodes have hardware problems with their InfiniBand
 cables and have been taken out of user operation. As a result, several
 islands currently have fewer than 512 available compute nodes.
 Therefore, jobs requesting 512 compute nodes on a single island may
 experience significantly longer waiting times in the queue.
 
 IBM and Mellanox have started to replace the faulty cables. However,
 due to the number of cables, this will take some time. LRZ will notify
 you when this task has been completed.
 
 Until then, to avoid long queue waiting times and increase the
 throughput of your jobs, you may specify a higher value for the maximum
 island count. The syntax is #@ island_count = min,max and you could set
 the max island count to the value of the min count plus one.
 
 For example, for a two-island job you would specify in your job script
 
 #@ island_count = 2,3
 
 instead of
 
 #@ island_count = 2
 
 or, for a one-island job, you would specify
 
 #@ island_count = 1,2
 
 instead of
 
 #@ island_count = 1
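
 If you maintain many job scripts, the change described above could be
 applied automatically. The following is only a minimal sketch, not an
 LRZ-provided tool: the function name relax_island_count is hypothetical,
 and it assumes the directive is written on its own line in the exact
 form "#@ island_count = N". Lines that already give "min,max" are left
 unchanged.

```python
import re

# Assumption: the directive appears on its own line as "#@ island_count = N"
# with a single integer value. Lines already using "min,max" do not match.
_SINGLE_COUNT = re.compile(
    r"^([ \t]*#[ \t]*@[ \t]*island_count[ \t]*=[ \t]*)(\d+)[ \t]*$",
    re.MULTILINE,
)


def relax_island_count(script_text: str) -> str:
    """Rewrite '#@ island_count = N' as '#@ island_count = N,N+1'."""

    def bump(match: re.Match) -> str:
        n = int(match.group(2))
        # Keep the original prefix and append max = min + 1.
        return f"{match.group(1)}{n},{n + 1}"

    return _SINGLE_COUNT.sub(bump, script_text)


if __name__ == "__main__":
    print(relax_island_count("#@ island_count = 2"))
```

 Please check the rewritten script by hand before submitting it.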
 
 
 For further details please see the LRZ Loadleveler documentation at
 
 http://www.lrz.de/services/compute/supermuc/loadleveler/
 


 This information is also available on our web server
 http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali4559/

 Nicolay Hammer


