Notification on LoadLeveler queue waiting times
Dear users of the SuperMUC Petaflop system,

Several compute nodes have developed hardware problems with their InfiniBand cables and have been taken out of user operation. As a result, several islands currently have fewer than 512 available compute nodes, so jobs that request 512 compute nodes on a single island may experience significantly longer waiting times in the queue.

IBM and Mellanox have started to replace the faulty cables; however, because of the large number of cables involved, this will take some time. LRZ will notify you when this work has been completed.

Until then, to avoid long queue waiting times and increase the throughput of your jobs, you may specify a higher maximum island count. Given the syntax

#@ island_count = min, max

you could set the maximum island count to the minimum count plus one. For example, for a two-island job you would specify in your job script

#@ island_count = 2,3

instead of

#@ island_count = 2

and for a one-island job you would specify

#@ island_count = 1,2

instead of

#@ island_count = 1

For further details, please see the LRZ LoadLeveler documentation at http://www.lrz.de/services/compute/supermuc/loadleveler/ .

This information is also available on our web server: http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali4559/

Nicolay Hammer
LRZ aktuell