SuperMUC: Changes in processing of special jobs

LRZ aktuell publish at lrz.de
Thu Dec 19 15:00:28 CET 2013


Dear users of the SuperMUC Petaflop system at LRZ,
 
 Based on user requests as well as internal constraints, we intend to
 change the way very large jobs (i.e. those submitted into the
 "special" job class) are processed. The new procedure is defined as
 follows:
 
  1. Any user wishing to run a job in the special class must open a
     service request at the LRZ service desk
     (https://servicedesk.lrz.de/?service=Hochleistungsrechnen%20und%20Grid-Supercomputing%20%28SuperMUC%29&lang=en),
     stating (a) the job run time, (b) the number of nodes requested
     per job, and (c) the number of jobs with these resources. Some
     justification / rationale for running jobs of this size should be
     provided; contact data (phone numbers, Skype address) is also
     considered helpful.
  2. After the user's account has been validated for the "special"
     class, the user can submit jobs into that class. All such jobs
     should be put into the user hold state via "llhold". If there are
     dependencies between jobs, these must be accounted for by the
     submitting user (e.g. via the "@dependency" keyword). The
     submissions should be consistent with the initial specification.
  3. Once a sufficient number of jobs is in the queue and the
     operational conditions are considered suitable, LRZ will schedule a
     block operation phase; this will typically last a limited amount of
     time (usually at most 2 days). No guarantees can be given on which
     of the jobs queued in "special" will be processed. The block
     operation phase is initiated by flushing the machine; this means
     that all running jobs will be terminated. If rerunnable, these jobs
     will be rerun as soon as regular operation is resumed, and no
     accounting is performed for terminated jobs.
  4. After completion of the processing, users whose jobs ran
     successfully are obliged to provide feedback on the scientific and
     performance results of their runs (e.g. a draft paper) via the
     original service request. For unsuccessful runs, feedback on
     observed behaviours and error signatures must be provided.
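
 Step 2 asks that submissions be consistent with the initial
 specification from step 1. A minimal sketch of such a self-check is
 given below. It is not an LRZ tool: the class and function names are
 hypothetical, and it assumes that "consistent" means no more jobs, no
 more nodes per job and no longer run time than stated in the service
 request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecialRequest:
    """Resources stated in the service request (items a-c of step 1)."""
    run_time_hours: float  # (a) job run time
    nodes_per_job: int     # (b) number of nodes requested per job
    num_jobs: int          # (c) number of jobs with these resources


@dataclass(frozen=True)
class SubmittedJob:
    """Hypothetical record of one job submitted into the special class."""
    name: str
    wall_clock_hours: float
    nodes: int


def check_consistency(request, jobs):
    """Return a list of problems; an empty list means the jobs fit the request.

    Assumption: a submission is consistent if it does not exceed the
    requested job count, node count, or run time.
    """
    problems = []
    if len(jobs) > request.num_jobs:
        problems.append(
            f"{len(jobs)} jobs submitted, but only {request.num_jobs} requested"
        )
    for job in jobs:
        if job.nodes > request.nodes_per_job:
            problems.append(
                f"{job.name}: {job.nodes} nodes exceeds {request.nodes_per_job}"
            )
        if job.wall_clock_hours > request.run_time_hours:
            problems.append(
                f"{job.name}: {job.wall_clock_hours} h exceeds "
                f"{request.run_time_hours} h"
            )
    return problems


if __name__ == "__main__":
    # Illustrative numbers only.
    request = SpecialRequest(run_time_hours=24, nodes_per_job=4096, num_jobs=2)
    jobs = [SubmittedJob("step1", 24, 4096), SubmittedJob("step2", 30, 4096)]
    for problem in check_consistency(request, jobs):
        print(problem)
```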
 
 The next block operation has been tentatively scheduled for Monday, Jan
 27, 2014, 9:00 and will terminate on Wednesday, Jan 29, 12:00 at the
 latest. Please submit any service requests for this block operation as
 soon as possible; before specifying any resources please also check
 with your project manager whether your processing budget is sufficient
 to cover the expense of CPU time required by your jobs.
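
 To judge whether a budget is sufficient, the CPU time of a request
 can be estimated from the three quantities stated in the service
 request. The sketch below assumes that CPU time is counted in
 core-hours (nodes x cores per node x run time x number of jobs) and
 that a node has 16 cores; both are assumptions to be checked against
 the actual accounting rules and node type. The function names and the
 example numbers are illustrative only.

```python
CORES_PER_NODE = 16  # assumption: adjust to the node type actually used


def core_hours(nodes, run_time_hours, num_jobs, cores_per_node=CORES_PER_NODE):
    """Estimate the core-hours consumed by num_jobs identical jobs."""
    return nodes * cores_per_node * run_time_hours * num_jobs


def fits_budget(required_core_hours, remaining_core_hours):
    """True if the remaining project budget covers the estimate."""
    return required_core_hours <= remaining_core_hours


# Example: two jobs of 4096 nodes each, running 12 hours.
needed = core_hours(nodes=4096, run_time_hours=12, num_jobs=2)
print(needed)  # 1572864
print(fits_budget(needed, remaining_core_hours=2_000_000))  # True
```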


 This information is also available on our web server:
 http://www.lrz-muenchen.de/services/compute/supermuc/aktuell/ali4738/

 Reinhold Bader


