Changes between Version 82 and Version 83 of Tutorial/JobSubm


Timestamp:
Sep 26, 2010, 6:08:34 PM
Author:
/C=FR/O=CNRS/OU=UMR8607/CN=Charles Loomis/emailAddress=loomis@…
Comment:

--

  • Tutorial/JobSubm

    v82 v83  
    137137=== Controlling Retries done by WMS ===
    138138
    139 After the job has been submitted to the WMS, the WMS goes through several phases until final submission of the job to the CE and its execution. For various reasons, errors can occur at each stage, and a user can control the number of retries the WMS performs through 2 JDL attributes:
     139After the job has been submitted to the WMS, the WMS goes through several phases until final submission of the job to the CE and its execution. For various reasons, errors can occur at each stage, and a user can control the number of retries the WMS performs through two JDL attributes:
    140140 * `ShallowRetryCount`: defines the maximum number of times the WMS tries to submit a job to the selected CE when an error occurs at submission time, before the job has actually started. This is called a ''shallow resubmission'': at each attempt, a different CE is selected. The default is site specific and there is a maximum defined on the WMS itself.
    141  * `RetryCount`: defines the maximum number of times the WMS tries to resubmit a job when an error occurs after the job has started to run on the CE. This is called a ''deep resubmission'': specific actions may be required to clean up files left by the previous run attempt... Default is site specific and there is a maximum defined on the WMS itself.
     141 * `RetryCount`: defines the maximum number of times the WMS tries to resubmit a job when an error occurs after the job has started to run on the CE. This is called a ''deep resubmission'': specific actions may be required to clean up files left by the previous run attempt. The default is site specific and there is a maximum defined on the WMS itself.
    142142
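As an illustration of the two attributes described above, here is a minimal JDL sketch. The executable, the file names and the retry values are assumptions chosen for the example, not site defaults; whatever values are requested remain capped by the maximum configured on the WMS.

```
[
  // Hypothetical job: run a trivial command and retrieve its output
  Executable = "/bin/hostname";
  StdOutput = "std.out";
  StdError = "std.err";
  OutputSandbox = {"std.out", "std.err"};

  // Allow up to 3 shallow resubmissions (errors before the job started)
  ShallowRetryCount = 3;

  // Disable deep resubmission (errors after the job started on the CE)
  RetryCount = 0;
]
```

Setting `RetryCount` to 0 is one possible choice for jobs that leave files behind and cannot safely be run twice; jobs that are safe to rerun could use a higher value.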
    143143In addition to resubmission, the WMS can retry the first phase of job processing, called ''match making'', which is responsible for selecting a CE. If an error occurs during this phase, the ''match making'' is retried at an interval defined by the site, for a maximum period also defined by the site. The user has no control over this. During this period the job status is `Waiting`. If the ''match making'' still fails after the maximum period allowed, the job fails with a status reason that is either `No compatible resource` (the match making process failed to find a resource matching the job requirements) or `Request expired` (the match making failed because the maximum period allowed was reached before `ShallowRetryCount`).
     
    148148
    149149JDL files accept many [http://server11.infn.it/workload-grid/docs/DataGrid-01-NOT-0101-0_6-Note.pdf different attributes]. In addition to those already described, a few other useful attributes are:
    150  * `ShortDeadlineJob`: if `true`, the WMS adds requirements to ensure the job is queued on a CE accepting such jobs (a Short Deadline Job is a job that must be run immediatly if accepted by the CE but generally has a very short CPU time limit).
     150 * `ShortDeadlineJob`: if `true`, the WMS adds requirements to ensure the job is queued on a CE accepting such jobs (a Short Deadline Job is a job that must be run immediately if accepted by the CE but generally has a very short CPU time limit).
    151151
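A short deadline job could be requested with a JDL fragment along these lines. This is only a sketch: the executable name is a hypothetical placeholder, and whether any CE accepts such jobs depends on the sites available to the user.

```
[
  // Hypothetical short task that should start immediately or not at all
  Executable = "quick-check.sh";
  InputSandbox = {"quick-check.sh"};

  // Ask the WMS to add the requirements for a CE accepting short deadline jobs
  ShortDeadlineJob = true;
]
```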
    152152== Execution environment on the Worker Node ==