Notification on LoadLeveler queue waiting times
LRZ aktuell
publish at lrz.de
Wed Apr 17 13:08:46 CEST 2013
Dear users of the SuperMUC Petaflop system,
Several compute nodes are showing hardware problems with their InfiniBand
cables and have been taken out of user operation. As a result, several
islands currently have fewer than 512 compute nodes available. Jobs
requesting 512 compute nodes on a single island may therefore experience
significantly longer waiting times in the queue.
IBM and Mellanox have started to replace the faulty cables; however, due
to the number of cables involved, this will take some time. LRZ will
notify you when this task has been completed.
Until then, to avoid long queue waiting times and to increase the
throughput of your jobs, you may specify a higher maximum island count.
Given the syntax #@ island_count = min,max, you could, e.g., set the
maximum island count to the minimum count plus one.
For example, for a two-island job you would specify in your job script
#@ island_count = 2,3
instead of
#@ island_count = 2
or, for a one-island job, you would specify
#@ island_count = 1,2
instead of
#@ island_count = 1
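If you have many job scripts to update, the change can be automated. The
following is a minimal Python sketch, assuming your job scripts use the
plain "#@ island_count = N" form shown above. The function name
relax_island_count is hypothetical and is not an LRZ or LoadLeveler tool.
It rewrites each fixed island count N to N,N+1 and leaves lines that
already give a min,max pair untouched.

```python
import re

# Matches a LoadLeveler directive that requests one fixed island count,
# e.g. "#@ island_count = 2". Lines that already give "min,max" do not match.
# Assumption: directives are written in the simple form used in this notice.
_FIXED_ISLANDS = re.compile(r"^(\s*#\s*@\s*island_count\s*=\s*)(\d+)\s*$")


def relax_island_count(script_text: str) -> str:
    """Return script_text with every fixed island_count N set to 'N,N+1'."""
    out_lines = []
    for line in script_text.splitlines():
        match = _FIXED_ISLANDS.match(line)
        if match:
            prefix, min_count = match.group(1), int(match.group(2))
            # The maximum island count is the minimum count plus one.
            line = f"{prefix}{min_count},{min_count + 1}"
        out_lines.append(line)
    result = "\n".join(out_lines)
    # Preserve a trailing newline if the original script had one.
    return result + "\n" if script_text.endswith("\n") else result


if __name__ == "__main__":
    example = "#@ island_count = 1\n#@ queue"
    print(relax_island_count(example))
```

Running it prints the example script with "#@ island_count = 1,2". Please
check the result by hand before submitting; this sketch does not validate
any other LoadLeveler keywords in the script.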
For further details, please see the LRZ LoadLeveler documentation at
http://www.lrz.de/services/compute/supermuc/loadleveler/ .
This information is also available on our web server
http://www.lrz-muenchen.de/services/compute/hlrb/aktuell/ali4559/
Nicolay Hammer