SuperMUC: Changes in processing of special jobs
LRZ aktuell
publish at lrz.de
Thu Dec 19 15:00:28 CET 2013
Dear users of the SuperMUC Petaflop system at LRZ,
In response to user requests as well as internal constraints, we intend
to change how very large jobs (i.e. those submitted to the special job
class) are processed. The new procedure is as follows:
1. Any user wishing to run a job in the special class must open a
service request at the LRZ service desk
(https://servicedesk.lrz.de/?service=Hochleistungsrechnen%20und%20Grid-Supercomputing%20%28SuperMUC%29&lang=en),
stating (a) the job run time, (b) the number of nodes requested per
job, and (c) the number of jobs with these resources. Some
justification / rationale for running jobs of this size should be
provided; contact data (phone numbers, Skype address) is also
considered helpful.
2. After the user's account has been validated for the special
class, the user can then submit jobs into that class. All such jobs
should be put into the user hold state via llhold. If there are
dependencies between jobs, these must be accounted for by the
submitting user (e.g. via the @dependency keyword). The
submissions should be consistent with the initial specification.
3. Once a sufficient number of jobs is in the queue and the
operational conditions are considered suitable, LRZ will schedule a
block operation phase; this will last a limited amount of time
(usually at most 2 days). No guarantees can be given as to which
of the jobs queued in the special class will be processed. The block
operation phase is initiated by flushing the machine; this means
that all running jobs will be terminated. If rerunnable, these jobs
will be rerun as soon as regular operation is resumed, and no
accounting is performed for terminated jobs.
4. After completion of the processing, users whose jobs ran
successfully are obliged to provide feedback on the scientific and
performance results of their runs (e.g. a draft paper) via the
original service request. For unsuccessful runs, feedback on
observed behaviours and error signatures must be provided.
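
The bookkeeping implied by steps 1 and 2 can be sketched in Python as
follows. This is only an illustration under stated assumptions: the
names SpecialRequest, PlannedJob and inconsistencies are hypothetical
and not part of any LRZ or LoadLeveler tooling. The check merely
compares planned submissions with the figures (a), (b) and (c) given in
the service request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecialRequest:
    """Figures stated in the service request (hypothetical structure)."""
    run_time_hours: float  # (a) run time per job
    nodes: int             # (b) nodes requested per job
    job_count: int         # (c) number of jobs with these resources


@dataclass(frozen=True)
class PlannedJob:
    """One job the user intends to submit into the special class."""
    name: str
    run_time_hours: float
    nodes: int


def inconsistencies(request, jobs):
    """Return human-readable deviations from the initial specification."""
    problems = []
    if len(jobs) > request.job_count:
        problems.append(
            f"{len(jobs)} jobs planned, but only {request.job_count} requested"
        )
    for job in jobs:
        if job.nodes > request.nodes:
            problems.append(
                f"{job.name}: {job.nodes} nodes exceed the requested {request.nodes}"
            )
        if job.run_time_hours > request.run_time_hours:
            problems.append(
                f"{job.name}: {job.run_time_hours} h exceed the requested "
                f"{request.run_time_hours} h"
            )
    return problems


if __name__ == "__main__":
    # Illustrative numbers only; they do not come from the announcement.
    request = SpecialRequest(run_time_hours=6.0, nodes=4096, job_count=2)
    jobs = [
        PlannedJob("run_a", run_time_hours=6.0, nodes=4096),
        PlannedJob("run_b", run_time_hours=8.0, nodes=4096),
    ]
    for problem in inconsistencies(request, jobs):
        print(problem)
```

An empty result means the planned submissions stay within the figures
stated in the service request; whether LRZ applies further criteria is
not covered by this sketch.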
The next block operation has been tentatively scheduled for Monday, Jan
27, 2014, 9:00 and will terminate on Wednesday, Jan 29, 12:00 at the
latest. Please submit any service requests for this block operation as
soon as possible; before specifying any resources please also check
with your project manager whether your processing budget is sufficient
to cover the expense of CPU time required by your jobs.
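
As a rough aid for the budget check, the CPU time of a request can be
estimated from the three figures given in the service request. The
Python sketch below is an assumption-laden illustration, not LRZ's
accounting method: it assumes that whole nodes are charged at 16 cores
per node (please verify the value for the nodes you actually use), and
the function names and example numbers are hypothetical. It also
computes the maximum length of the announced block operation window.

```python
from datetime import datetime

# Assumption: 16 cores per node; verify for the nodes you actually use.
CORES_PER_NODE = 16


def core_hours(nodes, run_time_hours, job_count, cores_per_node=CORES_PER_NODE):
    """Estimate the core-hours consumed by job_count identical jobs."""
    return nodes * cores_per_node * run_time_hours * job_count


def window_hours(start, end):
    """Length of a time window in hours."""
    return (end - start).total_seconds() / 3600.0


if __name__ == "__main__":
    # Window from the announcement: Mon Jan 27, 2014, 9:00 to
    # Wed Jan 29, 12:00 at the latest.
    window = window_hours(datetime(2014, 1, 27, 9, 0), datetime(2014, 1, 29, 12, 0))
    print(f"block operation window: at most {window:.0f} h")

    # Illustrative request: 2 jobs of 4096 nodes, 6 hours each.
    estimate = core_hours(nodes=4096, run_time_hours=6, job_count=2)
    print(f"estimated CPU time: {estimate} core-hours")
```

The estimate should be compared with the remaining project budget
before the service request is filed; the authoritative figures are
those of your project manager and LRZ.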
This information is also available on our web server:
http://www.lrz-muenchen.de/services/compute/supermuc/aktuell/ali4738/
Reinhold Bader