     1= Configuration of gLite Services =
     2[[TracNav]]
     3
     4[[TOC(inline)]]
     5
     6
     7This section contains information about how to configure individual gLite services. Refer to the [/wiki/Doc/gLite/TemplateCustomization/General dedicated page] for the description of service-independent gLite configuration parameters.
     8 
     9
     10== CE Configuration ==
     11
     12__Base template__ : `machine-types/ce`.
     13
QWG templates can handle configuration of the LCG (gLite 3.1 only) or the CREAM CE and its associated batch system (LRMS). Most of the configuration description is common to both types of CE. In gLite 3.1, the CE type defaults to LCG for backward compatibility, whereas in gLite 3.2 it defaults to CREAM, the only CE available. CE type selection is done with the variable `CE_TYPE`, which must be `lcg` or `cream`. This variable is ignored in gLite 3.2.
     15
     16LRMS selection is done with variable `CE_BATCH_NAME`. '''There is no default'''. The supported LRMS and associated values are:
     17 * Torque/MAUI: `torque2`
     18 * Condor: `condor`
     19
     20''Note: the value of `CE_BATCH_NAME` must match a directory in `common` directory of gLite templates.''
     21
     22''Note: previous versions of QWG templates used to require definition of `CE_BATCH_SYS`. This is deprecated : this variable is now computed from `CE_BATCH_NAME`.''
     23
Site-specific gLite parameters must declare the host name of the CEs that share the same worker nodes. All the CEs declared in one set of gLite parameters (one gLite parameter template) will share the same WNs. To configure several CEs with distinct worker nodes, you must create separate clusters. The host name of the CEs can be declared with one of the following two variables (see the sketch after this list):
     25 * `CE_HOSTS`: a list of host names corresponding to the different CEs sharing the same WNs.
     26 * `CE_HOST`: for backward compatibility, when there is only one CE, this variable can be defined to its name, instead of using `CE_HOSTS`.
     27
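For illustration, a minimal sketch of the two alternative forms, using hypothetical host names:
{{{
# Single CE (backward-compatible form)
variable CE_HOST ?= 'ce1.example.org';

# Or: several CEs sharing the same WNs
variable CE_HOSTS ?= list('ce1.example.org', 'ce2.example.org');
}}}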
     28In addition, 2 other variables independent of the LRMS are available:
 * `CE_PRIV_HOST`: alternate name of the CE host. Used in configurations where WNs are in a private network and the CE has 2 network names/addresses. This variable is not (yet) supported with multiple CEs.
     30 * `CE_WN_ARCH`: OS architecture on CE worker nodes. Due to limitation in the way this information is published now, this is a CE-wide value. If you have both 64-bit and 32-bit WNs, you must publish 32-bit (`i386`). Default value is based on CE architecture.
     31
     32=== Sharing WNs between several CEs === #CESharingWNs
     33
QWG templates allow configuring several CEs sharing the same WNs. They must share the same gLite parameters and the variable `CE_HOSTS` must contain all the CE host names. They can be LCG CEs and/or CREAM CEs. If you want to mix LCG and CREAM CEs, it is recommended to maintain a separate list for each CE type and build `CE_HOSTS` by merging them, as in the following example:
     35{{{
     36variable CE_HOSTS_CREAM ?= list('cream1.example.org','cream2.example.org');
     37variable CE_HOSTS_LCG ?= list('lcg1.example.org','lcg2.example.org');
     38variable CE_HOSTS ?= merge(CE_HOSTS_LCG,CE_HOSTS_CREAM);
     39}}}
     40
     41In addition, when using several CEs with the same WNs, it is necessary to configure a [#SharedGridmapdir shared gridmapdir]. This is '''required''' to ensure consistency of DN/userid mapping across CEs.
     42
     43=== CREAM CE Specific Configuration === #CREAMConfig
     44
     45CREAM CE has some unique features and requirements, not available in LCG CE, that can be easily customized with QWG templates. To identify CREAM CEs among all defined CEs, they must belong to the list `CE_HOSTS_CREAM`, as suggested [#CESharingWNs above].
     46
CREAM CE uses a MySQL database internally. The database connection can be configured with the following variables:
     48 * `CREAM_MYSQL_ADMINUSER` (optional): MySQL user with administrative privileges. Default: `root`.
     49 * `CREAM_MYSQL_ADMINPWD` (required): password of MySQL administrative account. No default.
     50 * `CREAM_DB_USER` (optional): MySQL user used by CREAM CE components. Default: `creamdba`.
     51 * `CREAM_DB_PASSWORD` (required): password of MySQL user used by CREAM CE components. No default.
     52 * `CREAM_MYSQL_SERVER` (optional): host name running the MySQL server used by the CE. Default: CE host name.
     53
In particular, the CREAM CE has a WMS-like management of user input and output sandboxes: they are all stored in a dedicated area, outside user home directories. In a configuration where home directories are shared through NFS (or another distributed file system), this requires an additional configuration to share this sandbox area too. It is also possible to share the sandbox area between the CE and the WNs, even though the home directories are not. Variables related to sandbox management are (see the sketch after this list):
 * `CREAM_SANDBOX_MPOINTS`: a nlist defining the CEs whose sandbox area must be shared. Only the CEs with an entry in this nlist will have their sandbox area shared with WNs. The key is the CE host name and the value is the mount point to use on the WN. There is no need for the mount point on the WN to be the same as on the CE. There is no default for this variable.
 * `CREAM_SANDBOX_DIRS`: a nlist defining where the sandbox area is located on each CE. There may be one entry per CE and one default entry (key=`DEFAULT`). If no entry applies to a CE, the standard default, `/var/cream_sandbox`, is used.
 * `CREAM_SANDBOX_SHARED_FS`: a nlist defining the protocol to use for sharing the sandbox area. There may be one entry per CE and one default entry (key=`DEFAULT`). If undefined, `nfs` is assumed. If defined but no entry applies to the current CE (and there is no default entry), something other than NFS is assumed.
     58 
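As an illustration (hypothetical host name and paths), the sandbox location and sharing protocol could be declared as follows:
{{{
variable CREAM_SANDBOX_DIRS ?= nlist(
  'DEFAULT',            '/var/cream_sandbox',
  'cream1.example.org', '/data/cream_sandbox'
);
variable CREAM_SANDBOX_SHARED_FS ?= nlist(
  'DEFAULT', 'nfs'
);
}}}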
When NFS is used to share the sandbox area, the [Shared File Systems usual NFS variables] apply to define the NFS version to use, mount options, etc.
     60
''Note: sandbox area sharing is configured independently of other file systems specified in `WN_SHARED_AREAS`. Sandbox areas are normally not specified in `WN_SHARED_AREAS`, but if they are, this takes precedence over the specific configuration done with `CREAM_SANDBOX_MPOINTS`.''
     62
     63When using sandbox sharing with several CEs (specified in the same `CE_HOSTS` variable), it is important to define a distinct mount point for each CE.  Below is an example showing how to define `CREAM_SANDBOX_MPOINTS` based on `CE_HOSTS_CREAM`:
     64{{{
     65variable CREAM_SANDBOX_MPOINTS ?= {
     66  foreach (i;ce;CE_HOSTS_CREAM) {
     67    SELF[ce] = '/cream_sandbox/'+ce;
     68  };
     69  SELF;
     70};
     71}}}
     72
     73A few other variables specific to CREAM CE are available, in particular to define log locations:
     74 * `CREAM_LOG_DIR`: location of CREAM CE log. Default: `/var/log/glite`.
     75 * `BLPARSER_LOG_DIR`: location of BLParser log file. Default: `/var/log/glite`.
     76 * `GLEXEC_LOG_DESTINATION`: must be `syslog` or `file`. Default is `syslog` for CREAM 1.5 and `file` for later versions.
 * `GLEXEC_LOG_DIR`: location of glexec log files. '''This must be different from the 2 other log locations''' because the permissions are not compatible (none belong to `root`). It is ignored if `GLEXEC_LOG_DESTINATION` is set to `syslog`. Default: `/var/log/glexec`.
     78 * `CEMON_ENABLED`: if `true`, CEMonitor is configured and started. Default: `false` in CREAM 1.6 and later. ''Note: CEMonitor is not used by any standard gLite components/services.''
     79 
     80CREAM CE relies on ''BLParser'' to interact with the batch system and get status back about submitted jobs. The BLParser must run on a machine with access to the batch system logs. The default is to run it on the LRMS master, which can be defined explicitly with variable `LRMS_SERVER_HOST` and defaults to the first CE in `CE_HOSTS`. For specific needs, it is possible to define explicitly the BLParser host with variable `BLPARSER_HOST`.
     81
     82CREAM CE implements a job purger to clean database entries and sandboxes related to completed jobs (aborted, canceled or done). Default configuration should be appropriate but for specific needs, the following variables can be used to customize the job purger policy:
     83 * `CREAM_JOB_PURGE_RATE`:  interval between 2 runs of the purger in minutes. Default: `720`.
     84 * `CREAM_JOB_PURGE_POLICY_ABORTED`: for jobs in `ABORTED` state, job age in days before purging it. Default: `10`.
     85 * `CREAM_JOB_PURGE_POLICY_CANCELED`: for jobs in `CANCELED` state, job age in days before purging it. Default: `10`.
     86 * `CREAM_JOB_PURGE_POLICY_DONEOK`: for jobs in `DONE-OK` state, job age in days before purging it. Default: `15`.
     87 * `CREAM_JOB_PURGE_POLICY_DONEFAILED`: for jobs in `DONE-FAILED` state, job age in days before purging it. Default: `10`.
     88 * `CREAM_JOB_PURGE_POLICY_REGISTERED`: for jobs in `REGISTERED` state, job age in days before purging it. Default: `2`.
     89
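For example, a site that wants to purge completed jobs more aggressively could override a few of these defaults (values below are purely illustrative):
{{{
variable CREAM_JOB_PURGE_RATE ?= 360;
variable CREAM_JOB_PURGE_POLICY_DONEOK ?= 7;
variable CREAM_JOB_PURGE_POLICY_DONEFAILED ?= 7;
}}}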
     90For more information on CREAM CE configuration and troubleshooting, refer to the CREAM CE [http://grid.pd.infn.it/cream/ official web site].
     91
     92Most of these variables are usually defined in [source:templates/trunk/sites/example/site/glite/config.tpl gLite parameters]. Look at the [/changeset?new=5066%40templates%2Ftrunk%2Fsites%2Fexample%2Fsite%2Fglite%2Fconfig.tpl&old=4604%40templates%2Ftrunk%2Fsites%2Fexample%2Fsite%2Fglite%2Fconfig.tpl example changes] to illustrate modifications typically required to an existing gLite parameter template to support CREAM CEs.
     93
     94If experiencing difficulties during the initial installation, be sure to read [/wiki/ReleaseNotes/gLite-3.2#KI3203CREAMCE release notes].
     95
     96=== Home Directories of VO Accounts ===
     97
QWG templates support both shared and non-shared home directories for VO accounts. See the section on the NFS Server for more information on how to configure shared home directories. Shared home directories are the recommended configuration and are required to support MPI.
     99
     100Independently of the shared/non shared configuration, the following variable is used to configure home directories for VO accounts:
 * `VO_HOMES`: a nlist defining the parent of home directories for all the VO accounts. For each entry, the key is the VO name as defined in the variable `VOS` (it may be a VO alias name) and the value is the parent directory for the corresponding accounts (pool accounts and other accounts associated with roles). A special entry, `DEFAULT`, may be used to define the home directory parent for all the VOs without an explicit entry.
     102
     103When supporting multiple VOs, the number of accounts can be very large (several thousands). This may lead to performance problems if they all share a common parent. In the value defining the parent directory, it is possible to use the following keywords to create a per-VO parent under a common root (in a common file system):
     104 * `@VONAME@` : will be expanded to the VO full name
     105 * `@VOALIAS@` : will be expanded to the VO alias name locally defined. When possible, it is better to use the VO full name which is unique and will not change.
     106
For example, the following variable creates one directory per VO under `/home`, and accounts for each VO will be created in the VO-specific directory:
{{{
variable VO_HOMES ?= nlist(
  'DEFAULT', '/home/@VONAME@'
);
}}}
     113
When modifying an existing configuration, careful planning is needed. This cannot be done on the fly. To avoid a long reconfiguration of `ncm-accounts`, this generally involves:
 * On the NFS server, moving existing home directories to the appropriate location
 * Deleting accounts using `/home` (except `ident`) from `/etc/passwd`. This can be done with a script deployed and executed with `ncm-filecopy`
 * Updating your site parameters and deploying the changes, defining `ncm-useraccess` as a post-dependency for `ncm-accounts` if it is used in the configuration. This will ensure that during deployment all accounts are recreated and the ssh, Kerberos... configuration for the users is done.
     118
     119
     120=== Defining Queues ===
     121
Definition of queues is done independently of the LRMS used. The following variables are used to define queues (see the sketch after this list):
 * `CE_QUEUES_SITE` : a nlist defining, for each queue, the list of VOs allowed to access the queue and optionally the specific attributes of the queue. The access list for a queue is defined under the `vos` key, attributes under the `attlist` key. The value for each key is a nlist where the key is the queue name. For the access list, the value is a list of VOs allowed or denied access to the queue (to deny access, prefix the VO name with a `-`). For queue attributes, the value is a nlist where the key is a Torque attribute and the value is the attribute value. By default one queue is created for each VO. Look at the [source:templates/trunk/sites/example/site/site/glite/config.tpl example] for more information on how to customize the default configuration. To undefine a standard queue, define its `attlist` to `undef`.
 * `CE_LOCAL_QUEUES` : a list of Torque queues to define that will not be available for grid usage (accessible only with standard Torque commands). This list has a format very similar to `CE_QUEUES`, except that the key containing the queue name is called `names` instead of `vos` and that its value is unused.
     125
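A minimal sketch following the structure described above, assuming a hypothetical `verylong` queue restricted to one VO and with a site-specific walltime limit:
{{{
variable CE_QUEUES_SITE ?= nlist(
  'vos',     nlist('verylong', list('atlas', '-dteam')),
  'attlist', nlist('verylong', nlist('resources_default.walltime', '72:00:00'))
);
}}}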
''Note: in previous versions of the templates, customization of the queue list was done by defining the `CE_QUEUES` variable in site parameters. In this case the creation of the queue for each VO had to be done in site templates. This has been changed and sites must now use `CE_QUEUES_SITE` to define site-specific queues or redefine attributes of standard queues.''
     127
     128
     129=== PBS/Torque ===
     130
     131PBS/Torque related templates support the following variables :
     132
     133 * `TORQUE_SERVER_HOST`: name of Torque server. Defaults to `CE_HOST`.
     134 * `TORQUE_SERVER_PRIV_HOST`: alternate name of Torque server on the private network if any. Defaults to `CE_PRIV_HOST`.
 * `TORQUE_SUBMIT_FILTER` : this variable allows redefining the script used as a Torque submit filter. A default filter is provided in standard templates.
 * `TORQUE_TMPDIR` : normally defined to refer to the working area created by Torque for each job, on a local filesystem. Define it as `null` if you don't want the job current directory to be redefined to this directory.
     137 * `TORQUE_SERVER_ATTRS` : nlist allowing to customize all server-related Torque parameters. For the complete list of supported parameters and default values, look at [source:templates/trunk/grid/glite-3.1/common/torque2/server/config.tpl common/torque2/server/config.tpl]. To undefine an attribute defined by default, define it to `undef`.
 * `WN_ATTRS` : this variable is a nlist with one entry per worker node (key is the node full name). Each value is a nlist consisting of a set of PBS/Torque attributes to set on the node. Values are any `key=value` supported by the `qmgr set server` command. One useful value is `state=offline` to cause a specific node to drain, or `state=online` to re-enable the node (suppressing `state=offline` is not enough to re-enable the node). One specific entry in `WN_ATTRS` is `DEFAULT` : this entry is applied to any node that doesn't have a specific entry. If you want to avoid re-enabling a node explicitly, you can have the `DEFAULT` entry defined with the `state=free` arguments. For instance, you could define :
     139{{{
     140variable WN_ATTRS ?= nlist(
     141    "DEFAULT", nlist("state","free"),
     142    "mynode.mydomain.com", nlist("state","offline")
     143);
     144}}}
 * `WN_CPUS_DEF` : default number of CPUs per worker node (see the sketch below).
 * `WN_CPUS` : a nlist with one entry per worker node (key is the node full name) having a number of CPUs different from the default.
 * `WN_CPU_SLOTS` : number of job slots (Torque processors) to create per physical CPU. Default is 2, to allow both a normal job slot and a standing reservation reserved for short deadline jobs.
     148
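As an illustration, a site with 8-core WNs and one larger node could use the following sketch (hypothetical host name, same key convention as the `WN_ATTRS` example above):
{{{
variable WN_CPUS_DEF ?= 8;
variable WN_CPUS ?= nlist(
  'bignode.example.org', 16
);
}}}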
     149For more details about all of these variables, their format and their default values, look at template defining [source:templates/trunk/grid/glite-3.1/defaults/glite/config.tpl default values] for gLite related variables.
     150
     151=== MAUI ===
     152
     153MAUI is configured using the following variables :
     154
 * `MAUI_SERVER_CONFIG` : nlist defining site-specific values for MAUI server base parameters. Keys are MAUI configuration parameters and values are parameter values. Defaults should be appropriate. Look at [source:templates/trunk/grid/glite-3.1/common/maui/server/config.tpl common/maui/server/config.tpl] for a list of supported parameters.
 * `MAUI_SERVER_POLICY` : nlist defining site-specific values for MAUI server scheduling policy parameters. Keys are MAUI configuration parameters and values are parameter values. Defaults should be appropriate. Look at [source:templates/trunk/grid/glite-3.1/common/maui/server/config.tpl common/maui/server/config.tpl] for a list of supported parameters.
 * `MAUI_SERVER_RMCFG` : nlist defining site-specific values for MAUI server resource manager configuration parameters. Keys are MAUI configuration parameters and values are parameter values. Defaults should be appropriate. Look at [source:templates/trunk/grid/glite-3.1/common/maui/server/config.tpl common/maui/server/config.tpl] for a list of supported parameters.
     158 * `MAUI_GROUP_PARAMS` : nlist defining group-specific parameters. Valid values are anything accepted by MAUI configuration directive `GROUPCFG`. Key is either a group (VO) name or `DEFAULT`. Default entry is applied to all groups (VOs) defined but without an explicit entry.
     159 * `MAUI_USER_PARAMS` : nlist defining user-specific parameters. Valid values are anything accepted by MAUI configuration directive `USERCFG`. Key must be a user name.
     160
     161 * `MAUI_CLASS_PARAMS` : nlist defining class-specific parameters. Valid values are anything accepted by MAUI configuration directive `CLASSCFG`. Key is either a class name or `DEFAULT`. Default entry is applied to all classes defined but without an explicit entry.
     162 * `MAUI_ACCOUNT_PARAMS` : nlist defining account-specific parameters. Valid values are anything accepted by MAUI configuration directive `ACCOUNTCFG`. Key must be a account name.
 * `MAUI_NODE_PARAMS` : nlist defining node-specific parameters. Keys must match worker node names or be `DEFAULT`. Values must be a nlist where keys are any valid keywords accepted by MAUI configuration directive `NODECFG` and values are the value for the corresponding keyword.
 * `MAUI_STANDING_RESERVATION_ENABLED` : boolean value defining whether creation of 1 standing reservation per node is enabled or not. Default : true. Note that use of this feature requires a proper setting of variable `WN_CPU_SLOTS` (normally 2).
     165 * `MAUI_STANDING_RESERVATION_CLASSES` : nlist defining classes which may access standing reservations. Key must be either a WN name or `DEFAULT`. Default entry is applied to all WNs without an explicit entry. Value must be a comma-separated list of classes.
     166 * `MAUI_WN_PART_DEF` : default node partition to use with worker nodes.
     167 * `MAUI_WN_PART` : a nlist with one entry per worker node (key is node fullname). The value is the name of the MAUI partition where to place the specific worker node.
 * `MAUI_GROUP_PART` : nlist defining partitions whose access is allowed on a per-group (VO) basis. Key is either a group (VO) name or `DEFAULT`. Default entry is applied to all groups (VOs) defined but without an explicit entry. If not defined, defaults to `MAUI_WN_PART_DEF`.
     169 * `MAUI_SERVER_CONFIG_SITE`: string containing literal MAUI configuration that must be included into final MAUI configuration, in addition to configuration provided by other variables.
     170 * `MAUI_MONITORING_TEMPLATE`: template name to use to configure MAUI monitoring script (normally a cron job). The default value should be appropriate. To disable it set the value to `null` but be aware of possible effects on CE publication (if using cache mode).
     171 * `MAUI_MONITORING_FREQUENCY` : frequency of checks that MAUI daemon is running and responding. Default is 15 minutes. In case of MAUI instability, can be lowered to limit impact on CE behaviour. Format is cron frequency format.
     172
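A sketch of node-level tuning only, following the nlist-of-`NODECFG`-keywords layout described above (hypothetical node name and illustrative values):
{{{
variable MAUI_NODE_PARAMS ?= nlist(
  'DEFAULT',             nlist('MAXJOB', '8'),
  'bignode.example.org', nlist('PARTITION', 'bigmem')
);
}}}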
''Note: if the `MAUI_CONFIG` variable is defined, the content of this variable must contain the full content of the `maui.cfg` file, and the variables `MAUI_SERVER_CONFIG`, `MAUI_SERVER_POLICY` and `MAUI_SERVER_RMCFG` are ignored.''
     174
In addition to the variables to configure MAUI itself, there is one variable related to resource publishing into the BDII. See the [#CEBDII specific section].
     176
     177=== RSH and SSH Configuration ===
     178
By default, Quattor doesn't configure any RSH or SSH trust relationship between the CE and WNs if home directories are on a shared filesystem declared in the variable `WN_SHARED_AREAS`; otherwise it configures SSH with host-based authentication. By default RSH is always configured with an empty `hosts.equiv` file.
     180
     181If this doesn't fit your needs, you can explicitly control RSH and SSH configuration with the following variables :
 * `CE_USE_SSH` : if `undef` (default), the configuration is based on the use of a shared filesystem for home directories. Otherwise it explicitly sets whether to configure SSH host-based authentication (`true`) or not (`false`).
 * `SSH_HOSTBASED_AUTH_LOCAL` : when this variable is true and `CE_USE_SSH` is false, configure SSH host-based authentication on each WN restricted to the current WN (ability to use SSH without entering a password only when connecting to the current WN). This is sometimes required by some specific software.
 * `RSH_HOSTS_EQUIV` : if true, `/etc/hosts.equiv` is created with an entry for the CE and each WN. If false, an empty `/etc/hosts.equiv` is created. If `undef`, nothing is done. Default is `undef`.
 * `SSH_DAEMON_SITE_CONFIG`: if defined, must be a nlist containing a valid list of `sshd` options. If `null`, the default configuration for `sshd` is not defined and the site must build the configuration with a site-specific method.
     186
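For instance, a site that wants host-based SSH authentication even with shared home directories, and a populated `hosts.equiv`, could set (a sketch only):
{{{
variable CE_USE_SSH ?= true;
variable RSH_HOSTS_EQUIV ?= true;
}}}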
     187=== CE Publishing into BDII === #CEBDII
     188
When using Torque/MAUI, the default plugin provided with gLite to retrieve the number of configured job slots and the number of free slots uses Torque. This doesn't correctly reflect a configuration where advanced MAUI features like ''standing reservations'' are used. An alternative plugin, based on MAUI, is available and distributed with QWG templates (even though it is totally independent). To use this MAUI-based plugin instead of the Torque-based one, define the following variable in your [source:templates/trunk/sites/example/site/glite/config.tpl gLite parameters] (this variable is ignored if the LRMS used is not Torque):
     190{{{
     191variable GIP_CE_USE_MAUI ?= true;
     192}}}
     193
     194This variable is `true` by default in QWG templates. Set it to `false` if you want to use the standard plugin.
     195
Another specific feature provided by QWG templates with respect to CE publishing into the BDII is the ability to run the plugins in charge of updating CE dynamic information as a cron job on the LRMS host and to cache their output for later use by GIP itself. This is generally necessary in a multiple-CE configuration, and it is mandatory with the MAUI-based plugins when using Torque/MAUI, as MAUI commands can be executed only on the MAUI server. This ''cache mode'' also lowers the polling rate on the batch system and protects against temporary failures of the LRMS to respond to the inquiry command (quite usual with MAUI when it is overloaded). To activate this feature, you need to define the following variable in your [source:templates/trunk/sites/example/site/glite/config.tpl gLite parameters]:
     197{{{
     198variable GIP_CE_USE_CACHE ?= true;
     199}}}
     200
The default of this variable depends on the number of CEs configured. When there is only one CE, it is `false` for backward compatibility, otherwise it is `true`. It is recommended to set it to `true` unconditionally.
     202
''Note: cache mode, even though it is essentially independent of the LRMS, is currently implemented only for MAUI. Defining this variable for an unsupported LRMS has no effect.''
     204
     205
     206
     207
     208=== CE Status ===
     209
CE related templates use the variable `CE_STATUS` to control the CE state. Supported values are :
 * `Production` : this is the normal state. The CE receives and processes jobs.
 * `Draining` : the CE doesn't accept new jobs but continues to execute queued jobs (as long as there are WNs available to execute them).
 * `Closed` : the CE doesn't accept new jobs and jobs already queued are not executed. Only running jobs can complete.
 * `Queuing` : the CE accepts new jobs but will not execute them.
     215
`CE_STATUS` indicates the desired status of the CE. All the necessary actions are taken to set the CE in the requested status. The default status (if the variable is not specified) is `Production`. This variable can be used in conjunction with [wiki:Doc/LCG2/TemplateLayout#PBSTorque WN_ATTRS] to drain queues and/or nodes.
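For example, to drain a CE before a maintenance (a sketch only):
{{{
variable CE_STATUS ?= 'Draining';
}}}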
     217
     218
     219=== Restarting LRMS Client ===
     220
It is possible to force a restart of the LRMS (batch system) client on all WNs by defining the variable `LRMS_CLIENT_RESTART`. This variable, if present, must be a nlist with one entry per WN to restart (key is the WN name) or `DEFAULT` for all WNs without a specific entry. When the value is changed (or first defined), this triggers a LRMS client restart. The value itself is not relevant but it is advised to use a timestamp for better tracking of forced restarts.
     222
     223For example to force a restart on all WNs, you can add the following definition :
     224{{{
     225variable LRMS_CLIENT_RESTART = nlist(
     226  'DEFAULT', '2007-03-24:18:33',
     227);
     228}}}
     229
     230A good place to define this variable is template `pro_site_cluster_info` in cluster `site` directory.
     231
     232'''Note : this feature is currently implemented only for Torque v2 client.'''
     233
     234=== Run-Time Environment ===
     235
     236gLite 3.0 templates introduce a new way to define `GlueHostApplicationSoftwareRunTimeEnvironment`. Previously it was necessary to define a list of all tags in the site configuration template. As most of these tags are standard tags attached to a release of the middleware, there is now a default list of tags defined in the default configuration site template, [source:templates/trunk/grid/glite-3.0.0/defaults/site.tpl defaults/site.tpl]. To supplement this list with tags specific to the site (e.g. `LCG_SC3`), define a variable `CE_RUNTIMEENV_SITE` instead of defining `CE_RUNTIMEENV` :
     237{{{
     238variable CE_RUNTIMEENV_SITE = list("LCG_SC3");
     239}}}
     240
     241''Note: if `CE_RUNTIMEENV` is defined in the site configuration template, this value will be used and must supply all the standard tags in addition to site-specific ones.''
     242
     243
     244=== Working Area on Torque WNs ===
     245
     246By default, QWG templates configure Torque client on WNs to define environment variable `TMPDIR` and location of `stdin`, `stdout` and `stderr` to a directory local to the worker node (`/var/spool/pbs/tmpdir`) and define environment variable `EDG_WL_SCRATCH` to `TMPDIR` (except for jobs requiring several WNs, e.g. MPI). This configuration is particularly adapted to shared home directories but works well with non shared home directories too.
     247
The main requirement is to appropriately size `/var` on the WNs, as jobs sometimes require a large scratch area. On the other hand, `/home` is not required to be very large, as it should not store very large files for a long period. It is strongly recommended to use shared home directories, served through NFS or another distributed file system, as this optimizes `/home` usage and allows dedicating local disk space on WNs to `/var`.
     249
If your configuration cannot be set up as recommended, or if your current configuration has a large space in `/home` and a limited space in `/var`, you can define the following property in your WN profiles before including `machine-types/wn` :
{{{
variable TORQUE_TMPDIR = "/home/pbs/tmpdir";
}}}
     254
     255=== Restricting Access to CEs ===
     256
It is possible to ban some users or restrict the time slots when the CEs are open for grid usage using the LCAS middleware component. QWG templates allow you to easily [#LCAS-LCMAPS configure them].
     258
     259=== Home Directory Purging ===
     260
A cron job is responsible for purging directories created for each job under the user home directory. By default, this job runs twice a week (on Sunday and Wednesday) and removes any files and directories older than 15 days in the home directory. This can be tuned with the following variables (see the sketch after this list):
     262 * `CE_CLEANUP_ACCOUNTS_IDLE`: minimum age of a file (in days) for the file to be purged (Default: 15).
     263 * `CE_CLEANUP_ACCOUNTS_DAYS`: a comma-separated list of days in cron format (day number or first three letters of the day name) when to run the cron job (Default: Sunday, Wednesday).
     264
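For instance, to purge only on Sundays and keep files for a month (illustrative values, using the cron day format described above):
{{{
variable CE_CLEANUP_ACCOUNTS_IDLE ?= 30;
variable CE_CLEANUP_ACCOUNTS_DAYS ?= 'sun';
}}}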
     265== WN Configuration ==
     266
__Base template__ : `machine-types/wn`.
     269
WN configuration is derived from the CE and batch system configuration. To configure your WNs for specific local requirements, use the variable `WN_CONFIG_SITE`, which must be a template with all the specific actions required on your local nodes, as in the sketch below.
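A minimal sketch, assuming a hypothetical site template `site/wn/config` containing your local customizations:
{{{
# In the WN profile or site parameters
variable WN_CONFIG_SITE ?= 'site/wn/config';
}}}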
     271
     272=== WN Profile Cloning ===
     273
QWG templates support profile cloning, a feature known as ''dummy WN'', that dramatically speeds up compilation of WNs. It is based on the fact that all WNs generally share the same configuration information, except for the hardware description and some parameters like network configuration. With profile cloning, instead of compiling all WNs separately (and generally rebuilding all of them when one dependency is modified), only one profile, called the ''exact node'' and used as a reference profile, is really compiled. On the other WNs, even though the source profile looks the same, the compilation is not done; instead the reference profile is included in the WN profile and a very small part of the configuration is replayed to do the actual node customization.
     275
     276''Note: disk partitioning and file system configuration are not replayed on each node. This means that the reference profile and the other nodes configured to use cloning must have a similar disk configuration.''
     277
The main variable to enable profile cloning (currently supported only for gLite WNs) is `USE_DUMMY`. It is `false` by default and must be set to `true` to enable profile cloning. This variable must be defined to `true` on all WNs, including the reference one (the ''exact node'').
     279
To use profile cloning, in addition to enabling it, you need to define a set of variables, generally in a common profile included by all WNs. This is generally done by creating a site-specific machine type for the WN (typically in `sites/xxx/machine-types/xxx/wn`; be sure not to overload the standard `machine-types/wn`) that will do all the necessary initializations and include the standard `machine-types/wn`.
     281
     282The variables you need to define for profile cloning to work are:
     283{{{
     284# Prefix of template names for all profiles. When using the old naming convention `profile_xxx`, define to `profile_`
     285variable PROFILE_PREFIX ?= '';
     286# Name of the reference profile (string after PROFILE_PREFIX in its profile name)
     287variable EXACT_NODE ?= 'grid100';
     288# Regexp (perl compatible) matching the node name part of profile names of eligible nodes
     289variable NODE_REGEXP ?= 'grid.*';
     290# Variable pointing to some site-specific templates. Customize to match your configuration.
     291variable SITE_DATABASES ?= 'site/databases';
     292variable GLOBAL_VARIABLES ?= 'site/global_variables';
     293variable SITE_FUNCTION ?= 'site/functions';
     294variable SITE_CONFIG ?= 'site/config';
     295variable FILESYSTEM_CONFIG_SITE ?= "filesystem/config";
     296variable GLITE_SITE_PARAMS ?= "site/glite/config";
     297}}}
     298
In addition to these variable definitions, another variable, `WN_DUMMY_DISABLED`, is available. This is a nlist where the key is an escaped node name and the value must be `true` to disable the use of profile cloning on a specific node. This allows adding the `USE_DUMMY` variable to the site-specific machine type definition for a WN, with a default value of `true`, and then, by editing just one template (rather than editing each profile template individually), controlling the specific nodes where you want to disable the use of profile cloning. `WN_DUMMY_DISABLED` is typically defined in a site-specific template like `site/wn-cloning-config`, that is included in the site-specific definition of a WN.
     300
     301
     302== SE Configuration ==
     303
     304__Base template__ :
     305 * DPM : `machine-types/se_dpm`.
     306 * dCache : `machine-types/se_dCache`.
     307
     308''Note : This section covers the generic SE configuration, not a specific implementation.''
     309
     310=== List of site SEs ===
     311
     312The list of SEs available at your site must be defined in variable `SE_HOSTS`. This variable is a nlist with one entry for each local SE. The key is the SE host name and the value is a nlist defining SE parameters.
     313
Supported parameters for each SE are (see the sketch after this list) :
     315 * `type` : define SE implementation. Must be `SE_Classic`, `SE_dCache` or `SE_DPM`. This parameter is required and has no default. Note that SE Classic is deprecated.
     316 * `accessPoint` : define the root path of any VO-specific area on the SE.  This parameter is required with Classic SE and dCache. It is optional with DPM where it defaults to `/dpm/dom.ain.name/homes`.
     317 * `arch` : used to define `GlueSEArchitecture` for the SE. This parameter is optional and defaults to `multidisk` that should be appropriate for standard configurations.
     318
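A minimal sketch for a site with a single DPM SE (hypothetical host name; `accessPoint` is shown explicitly even though it is optional for DPM):
{{{
variable SE_HOSTS ?= nlist(
  'dpm.example.org', nlist(
    'type', 'SE_DPM',
    'accessPoint', '/dpm/example.org/home'
  )
);
}}}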
     319For more details, look at [source:templates/trunk/sites/example/site/site/glite/config.tpl example] and comments in [source:/templates/trunk/grid/glite-3.0.0/defaults/glite.tpl gLite defaults].
     320
     321''Note : Format of `SE_HOSTS` has been changed in gLite-3.0.2-11 release of QWG templates. Look at [wiki:ReleaseNotes/gLite-3.0#gLite-3.0.2-11:SE_HOSTSformatchange release notes] to know how to migrate from previous format.''
     322
     323
     324=== CE Close SEs ===
     325
     326Variable `CE_CLOSE_SE_LIST` defines the SEs that must be registered in BDII as a close SE for the current CE. It can be either a value used for every VO or a nlist with a default value (key is `DEFAULT`) and one entry per VO with a different close SE (key is the VO name). Each value may be a string if there is only one close SE or a list of SEs.
     327
`CE_CLOSE_SE_LIST` defaults to the deprecated `SE_HOST_DEFAULT` if defined, else to all the SEs defined in the `SE_HOSTS` variable.
     329
     330It is valid to have no close SE defined. To remove default definition, you need to do :
     331{{{
     332variable CE_CLOSE_SE_LIST = nlist('DEFAULT', undef);
     333}}}
     334
     335It is valid for the close SE to be outside your site but this is probably not recommended for standard configurations.
     336
     337=== Default SE ===
     338
     339Variable `CE_DEFAULT_SE` is used to define the default SE for the site. It can be either a SE name or a nlist with a default entry (key is `DEFAULT`) and one entry per VO with a different default SE (key is the VO name).
     340
By default, if not explicitly defined, it defaults to the first SE in the appropriate `CE_CLOSE_SE_LIST` entry. The default SE can be outside your site (probably not recommended for standard configurations).
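For example, to use a single default SE for all VOs (hypothetical host name):
{{{
variable CE_DEFAULT_SE ?= 'dpm.example.org';
}}}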
     342
     343== DPM Configuration ==
     344
DPM-related standard templates require a site template describing the site/SE configuration for DPM. The variable `DPM_CONFIG_SITE` must contain the name of this template. This template defines the whole DPM configuration, including all the disk servers used, and is used to configure all the machines that are part of the DPM configuration.
     346
On the DPM head node (in the node profile), the variable `SEDPM_SRM_SERVER` must be defined to `true`. This variable is `false` by default (DPM disk servers).
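A minimal sketch of a DPM head node profile, assuming a hypothetical site template `site/glite/dpm` holding the DPM configuration:
{{{
variable DPM_CONFIG_SITE ?= 'site/glite/dpm';
variable SEDPM_SRM_SERVER = true;
}}}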
     348
If you want to use the Oracle version of the DPM server, define the following variable in your machine profile :
     350{{{
     351variable DPM_SERVER_MYSQL = false;
     352}}}
     353
     354
     355=== DPM site parameters ===
     356
     357There is no default template provided for DPM configuration. To build your own template, you can look at template [source:templates/trunk/sites/example/site/pro_se_dpm_config.tpl pro_se_dpm_config.tpl] in examples provided with QWG templates.
     358
     359Starting with QWG Templates release gLite-3.0.2-9, there is no default password value provided for account used by DPM daemons and for the DB accounts used to access the DPM database. You '''must''' provide one in your site configuration. If you forget to do it, you'll get a not very explicit panc error :
     360{{{
     361[pan-compile] *** wrong argument: operator + operand 1: not a property: element
     362}}}
     363
     364If you want to use a specific VO list on your DPM server and you have several nodes in your DPM configuration (DPM head node + disk servers), you need to write a template defining `VOS` variable (with a non default value) and define variable `NODE_VO_CONFIG` to this template in the profile of DPM nodes (both head node and disk servers).
     365
     366
     367=== Using non-standard port numbers ===
     368
It is possible to use non-standard port numbers for the DPM daemons `dpm`, `dpns` and all the SRM daemons. To do this, you need to define the `XXX_PORT` variable corresponding to the service in your gLite site parameters. Look at the gLite [source:templates/trunk/grid/glite-3.0.0/defaults/glite.tpl default parameters] to find the exact name of the variable.
     370
''Note: it is not recommended to change the port numbers used by DPM services in normal circumstances.''
     372
     373=== Using a non-standard account name for dpmmgr ===
     374
     375If you want to use an account name different from `dpmmgr` to run DPM daemons, you need to define variable `DPM_DAEMON_USER` in your site configuration template and provide a template to create this account, based on [source:templates/trunk/grid/gLite-3.0.0/users/dpmmgr.tpl users/dpmmgr.tpl].
     376
     377
     378== LFC Configuration ==
     379
     380__Base template__ : `machine-types/lfc`.
     381
     382LFC related standard templates require a site template to describe the service site configuration. The variable `LFC_CONFIG_SITE` must contain the name of this template.
     383
     384If you want to use Oracle version of LFC server define the following variable in your machine profile :
     385{{{
     386variable LFC_SERVER_MYSQL = false;
     387}}}
     388
LFC templates allow an LFC server to act as a central LFC server (registered in the BDII) for some VOs and as a local LFC server for the others. There are 2 variables controlling what is registered in the BDII :
 * `LFC_CENTRAL_VOS` : list of VOs for which the LFC server must be registered in the BDII as a central server. Default is an empty list.
 * `LFC_LOCAL_VOS` : list of all VOs for which the server must be registered in the BDII as a local server. Defaults to all supported VOs (`VOS` variable). If a VO is in both lists, it is removed from `LFC_LOCAL_VOS`. If you don't want this server to be registered as a local server for any VO, even if configured on this node (present in the `VOS` list), you must define this variable as an empty list :
     392{{{
     393variable LFC_LOCAL_VOS = list();
     394}}}
     395
     396VOs listed in both lists must be present in `VOS` variable. These 2 variables have no impact on GSI (security) configuration and don't control access to the server. If you want to have `VOS` variable (controlling access to the server) matching the list of VOs supported by the LFC server (either as central or local catalogues), you can add the following definition to your LFC server profile :
     397{{{
     398variable VOS = merge(LFC_CENTRAL_VOS, LFC_LOCAL_VOS);
     399}}}
     400
     401=== LFC site parameters ===
     402
Normally the only things really required in this site-specific template are the password for the LFC user (by default `lfc`) and the DB accounts. Look at the standard LFC [source:templates/trunk/glite-3.0.0/glite/lfc/config] configuration template for the syntax.
     404
Starting with QWG Templates release gLite-3.0.2-9, there is no default password value provided for the account used by the LFC daemons and for the DB accounts used to access the LFC database. You '''MUST''' provide one in your site configuration. If you forget to do it, you'll get a not very explicit panc error :
     406{{{
     407[pan-compile] *** wrong argument: operator + operand 1: not a property: element
     408}}}
     409
     410
     411=== LFC Alias ===
     412
It is possible to configure an LFC server to register itself into the BDII using a DNS alias rather than the host name. To achieve this, you need to define in your site parameters a variable `LFC_HOSTS` (replacement for the former `LFC_HOST`) which must be a nlist where keys are LFC server names and values are nlists accepting the following parameters (see the sketch after this list) :
     414
     415 * `alias` : DNS alias to use to register this LFC server into the BDII
     416 
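A minimal sketch, with hypothetical server and alias names:
{{{
variable LFC_HOSTS ?= nlist(
  'lfc.example.org', nlist('alias', 'grid-lfc.example.org')
);
}}}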
     417=== Using non-standard port numbers ===
     418
     419It is possible to use non-standard port numbers for LFC daemons. To do this, you only need to define the `XXX_PORT` variable corresponding to the service. Look at gLite [source:templates/trunk/grid/glite-3.0.0/defaults/glite.tpl default parameters] to find the exact name of the variable.
     420
''Note: it is not recommended to change the port numbers used by LFC services in normal circumstances.''
     422
     423=== Using a non-standard account name for lfcmgr ===
     424
     425If you want to use an account name different from `lfcmgr` to run LFC daemons, you need to define variable `DPM_USER` in your site configuration template and provide a template to create this account, based on [source:templates/trunk/grid/gLite-3.0.0/users/lfcmgr.tpl users/lfcmgr.tpl].
     426
     427
     428== WMS and LB ==
     429
     430__Base templates__ :
     431 * `machine-types/wms` : a WMS only node.
     432 * `machine-types/lb` : a LB only node.
     433 * `machine-types/wmslb` : a combined WMS/LB node (not recommended).
     434 
WMS and LB are 2 inter-related services : a complete WMS service is made of at least one WMS and one LB. For scalability reasons, it is recommended to run WMS and LB on separate machines : one LB should scale to 1M+ jobs per day, whereas one WMS scales only to about 20 Kjobs per day. Several WMS can share the same LB. Don't expect a combined WMS/LB to scale to more than 10 Kjobs/day. And be aware that a WMS needs a lot of memory: 4 GB is the required minimum.
     436
     437WMS and LB site-specific configuration is normally kept in one template, even if they run on several machines, to maintain consistency. Variable `WMS_CONFIG_SITE` must be defined to the name of this template, even for a LB. If you want to use a separate template to configure LB (not recommended), you can also use LB-specific variable, `LB_CONFIG_SITE`.
     438
     439List of VOs supported by WMS, if not your default list as defined in your [source:templates/trunk/sites/example/site/site/glite/config.tpl site-specific parameters], must be defined in another template that will be included very early in the configuration. Variable `NODE_VO_CONFIG` must be defined to the name of this template. This template generally contains only variable `VOS` definition.
     440
The main variables that need to be customized according to your WMS and LB configuration are (see the sketch after this list) :
 * `LB_MYSQL_ADMINPWD` : password of the MySQL administrator account. There is no default; be sure to define it to a non-empty string.
 * `LB_TRUSTED_WMS` : a list of DNs matching the host DN of all the WMSes allowed to use this LB. May remain empty (default) on a combined WMS/LB.
 * `WMS_LB_SERVER_HOST` : defines the LB used by this WMS. Keep the default value on a combined WMS/LB.
     445
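A minimal sketch for a standalone WMS pointing at a separate LB (hypothetical host names, DN and password):
{{{
variable LB_MYSQL_ADMINPWD ?= 'changeme';
variable LB_TRUSTED_WMS ?= list('/C=FR/O=EXAMPLE/CN=wms.example.org');
variable WMS_LB_SERVER_HOST ?= 'lb.example.org';
}}}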
In addition to these variables, there are several variables to tune the performance of the WMS, in particular its load monitor subsystem. Look at [source:templates/trunk/grid/glite-3.0.0/glite/wms/config.tpl glite/wms/config.tpl] and the templates provided with the `ncm-wmslb` component for a list of all available variables. The defaults should be appropriate; avoid modifying these variables without a clear reason to do so. In particular, avoid setting thresholds too high, as this may lead to the WMS machine becoming heavily overloaded and the service response time degrading badly. Most of the variables are related to the WM component of the WMS. The main ones are:
     447 * `WMS_WM_EXPIRY_PERIOD` : maximum time in seconds to retry match making in case of failure to find a resource compatible with requirements. Default: 2 hours.
 * `WMS_WM_MATCH_RETRY_PERIOD` : interval in seconds between 2 match making attempts. Must be less than `WMS_WM_EXPIRY_PERIOD`. Default : 30 minutes.
     449 * `WMS_WM_BDII_FILTER_MAX_VOS` : maximum number of VOs configured on the WMS to define a LDAP filter when querying the BDII. Default: 10.
     450 * `WMS_WMPROXY_SDJ_REQUIREMENT` : match making requirement to add when `ShortDeadlineJob=true` in JDL. The same requirement is added negated for non SDJ jobs. Default should be appropriate (every queue whose name ends with `sdj`).
     451
     452=== Load Monitor ===
     453
     454WMS has an integrated feature to monitor load on the machine it runs on and refuse to accept new jobs if the load is higher than defined thresholds. Available variables to define threshold are :
 * `WMS_LOAD_MONITOR_CPU_LOAD1` : maximum CPU load averaged over 1 minute (as defined by `top` or `xload`). Default : 10.
 * `WMS_LOAD_MONITOR_CPU_LOAD5` : maximum CPU load averaged over 5 minutes (as defined by `top` or `xload`). Default : 10.
 * `WMS_LOAD_MONITOR_CPU_LOAD15` : maximum CPU load averaged over 15 minutes (as defined by `top` or `xload`). Default : 10.
     458 * `WMS_LOAD_MONITOR_DISK_USAGE` : maximum usage (in percent) of any file system present on the machine. Default : 95 (%).
     459 * `WMS_LOAD_MONITOR_FD_MIN` :  minimum number of free file descriptors. Default : 500.
     460 * `WMS_LOAD_MONITOR_MEMORY_USAGE` :  maximum usage (in percent) of virtual memory. Default : 95 (%).
     461
     462=== Draining a WMS ===
     463
It is sometimes desirable to drain a WMS. When draining, a WMS doesn't accept any request to submit new jobs but continues to process already submitted jobs and still accepts requests about job status or to cancel a job.
     465
With QWG, a WMS can be drained by defining the variable `WMS_DRAINED` to `true` in its profile, as shown below. Undefining the variable re-enables the WMS. Note that if you drain it manually and then reconfigure the WMS with Quattor, it is re-enabled.
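A sketch of the profile change to drain (and later re-enable) a WMS:
{{{
# Drain this WMS; remove or comment out this line to re-enable it
variable WMS_DRAINED ?= true;
}}}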
     467
     468=== WMS Client Configuration ===
     469
A few variables allow configuring the default settings of WMS clients:
     471 * `WMS_OUTPUT_STORAGE_DEFAULT`: default directory where to put job outputs (one directory per job will be created in this directory). Default: `${HOME}/JobOutput`.
     472 
     473== BDII ==
     474
     475__Base template__ : `machine-types/bdii`.
     476
QWG Templates support configuration of all types of BDII :
 * Top-level BDII (default type) : uses a central location to get its data (all top-level BDIIs use the same source). This central location contains information about all sites registered in the GOC DB. Use of FCR (Freedom of Choice) is enabled by default.
 * Site BDII : BDII in charge of collecting information about site resources. Supports the concept of sub-site BDII (a hierarchy of BDIIs to collect site information).
 * Resource BDII : used as a replacement of Globus MDS to publish resource information into the BDII.
     481
When configuring a BDII on a machine, the following variables can be used (in the machine profile or in a site-specific template) to tune the configuration (see the sketch after this list) :
     483 * `BDII_TYPE` : can be `resource`, `site`, `top`. `top` is the default, except if deprecated variable `SITE_BDII` is true.
     484 * `BDII_SUBSITE` : name of the BDII sub-site. Ignored on any BDII type except `site`. Must be empty for the main site BDII (default) or defined to the sub-site name if this is a subsite BDII.
     485 * `BDII_SUBSITE_ONLY` (gLite 3.1 only) : if false, allow to run both subsite and site BDII on the same machine. Default : true.
     486 * `BDII_USE_FCR` : set to false to disable use of FCR (Freedom of Choice) on top-level BDII or to true to force its use on other BDII types. This value is ignored if BDII type is not `top`. Default is `true`.
     487 * `BDII_FCR_URL` : use a non-standard source for FCR.
     488
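For instance, the profile of a subsite's site BDII could contain (illustrative values):
{{{
variable BDII_TYPE ?= 'site';
variable BDII_SUBSITE ?= 'mysubsite';
}}}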
     489Starting with QWG templates [milestone:gLite-3.0.2-13 gLite-3.0.2-13], all machine types publishing information into BDII (almost all except WN, UI and disk servers) are using a BDII configured as a ''resource BDII'' for this purpose. In addition all these machine types can also be configured as a [#siteBDII site]/[#subsiteBDII subsite] BDII by defining appropriate variable into node profile (`BDII_TYPE='site'` and if applicable `BDII_SUBSITE`).
     490
''Note : a combined BDII used to be the default on the LCG CE for backward compatibility, but this is no longer the case. It is advised to run the site BDII preferably on a dedicated machine. If this is not possible, choose any machine type but the CE, as this machine can be very loaded and the site BDII may become unresponsive, with a lot of side effects.''
     492
     493=== Configuring BDII URLs on a site BDII === #siteBDII
     494
A site BDII aggregates information published by several other BDIIs, typically resource BDIIs or subsite BDIIs. The list of resources to aggregate is specified by the variable `BDII_URLS`. This variable is typically defined in site parameters, [source:templates/trunk/sites/example/site/site/glite/config.tpl site/glite/config.tpl], and is ignored on all nodes except a site (or combined) BDII.
     496
Variable `BDII_URLS` is a nlist of URLs corresponding to the resource BDII endpoints (URLs) aggregated on the site BDII. The key is an arbitrary name (like `CE`, `DPM1`...) but must be unique, and the value is the endpoint, as in the sketch below. See the [source:templates/trunk/sites/example/site/site/glite/config.tpl site configuration] example.
     498
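A sketch with hypothetical host names; the endpoints follow the usual `ldap://host:2170/mds-vo-name=resource,o=grid` form of a gLite resource BDII (check the exact `mds-vo-name` published by each service):
{{{
variable BDII_URLS ?= nlist(
  'CE',  'ldap://ce.example.org:2170/mds-vo-name=resource,o=grid',
  'DPM', 'ldap://dpm.example.org:2170/mds-vo-name=resource,o=grid'
);
}}}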
     499'''Important: site, subsite and top-level BDIIs run a resource BDII that publishes information about themselves. They must be added to the `BDII_URLS` variable.'''
     500
     501''__Restriction__ : each BDII in BDII hierarchy '''must''' use a different `mds-vo-name`. Thus it is not possible to use the `mds-vo-name` of a site BDII in `BDII_URLS` or this will be considered as a loop and the entry will be ignored.''
     502
     503=== Configuring a subsite BDII === #subsiteBDII
     504
     505It is possible to run a hierarchy of site BDII. This is particularly useful for a site made of several autonomous entities as it allows each subsite to export a unique access point to published subsite resources. Each subsite manages the actual configuration of its subsite BDII and all the subsites are then aggregated by the site BDII. GRIF site is an example of such a configuration.
     506
     507A subsite BDII is a site BDII where variable `BDII_SUBSITE` has been defined to a non empty value. This value is appended to site name to form the `mds-vo-name` for the subsite.
     508
When using an internal hierarchy of site and subsite BDIIs, `BDII_URLS` must be used for subsite BDIIs. To define the BDII endpoints that must be collected by the site BDII, you must use `BDII_URLS_SITE`. This allows both to coexist in the same site parameter template (typically [source:templates/trunk/sites/example/site/site/glite/config.tpl site/glite/config.tpl]) and both have the same syntax. `BDII_URLS_SITE` typically contains the endpoint of each resource BDII inside the site.
     510
When co-locating a subsite BDII and a site BDII on the same machine, this may lead to a problem with the `GlueSite` object: several objects could be published with different DNs, depending on the subsite BDII actually publishing it. This is particularly a problem if you run several subsite BDIIs also acting as a site BDII in different subsites, as you will publish to the top BDII several different `GlueSite` objects for your site. To solve this, it is possible to publish the `GlueSite` object in a non-standard branch of the information tree, using the variable `SITE_GLUE_OBJECT_MDS_VO_NAME`. The value of this variable will be used instead of `resource` and thus the `GlueSite` object will be invisible on the resource BDII of the site BDII. To get the `GlueSite` object published by the site BDII, it is necessary to add an entry in `BDII_URLS_SITE` for the ''active'' site BDII (using the DNS alias generally associated with the service) using the same `mds-vo-name` as specified in the variable `SITE_GLUE_OBJECT_MDS_VO_NAME`.
     512
     513=== Defining Top-level BDII ===
     514
It is necessary to define the top-level BDII used by the site. This is done with the variable `TOP_BDII_HOST`. This variable replaces the deprecated `BDII_HOST`. It has no default.
     516
''Note : it is good practice to use a DNS alias as the top-level BDII name. This allows changing the actual top-level BDII without editing the configuration, and the change is taken into account by running jobs (if there is no DNS caching on the WNs).''
     518
     519=== Configuring BDII alias === #BDIIAlias
     520
When several BDIIs are used to provide the same BDII service (either top or site) in order to provide service load balancing and/or failover, they are generally all associated with a DNS alias (`CNAME`). In this configuration, the endpoint published for the BDII should be the alias instead of the BDII host name (the default). This is done with one of the following variables, depending on whether the BDII is a site BDII or a top BDII:
     522 * `BDII_ALIAS_TOP`: DNS name to use in the top BDII endpoint.
     523 * `BDII_ALIAS_SITE`: DNS name to use in the site BDII endpoint.
     524
     525== MPI Support ==
     526
     527To activate MPI support on the CE and WNs, you need to define variable `ENABLE_MPI` to `true` in your site parameters (normally `site/glite/config.tpl`). It is disabled by default.
     528
A default set of RPMs for various flavours of MPI (MPICH, MPICH2, OPENMPI, LAM) will be installed. If you would like to install a custom version of a particular MPI implementation, you can do so by defining the following variables:
 * `MPI_<flavour>_VERSION` : version of the package (e.g. `MPI_MPICH_VERSION = "1.0.4"`)
 * `MPI_<flavour>_RELEASE` : release number of the package (e.g. `MPI_MPICH_RELEASE = "1.sl3.cl.1"`)
 * `MPI_<flavour>_EXTRAVERSION` : patch number of the package (if needed, e.g. `MPI_MPICH_EXTRAVERSION = "p1"`)
     533
     534These variables ensure that the version published is consistent with the installed RPMs.
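For example, a sketch reusing the MPICH values quoted in the list above (the version numbers are purely illustrative):
{{{
# Enable MPI support and pin a specific MPICH version/release.
variable ENABLE_MPI ?= true;
variable MPI_MPICH_VERSION ?= '1.0.4';
variable MPI_MPICH_RELEASE ?= '1.sl3.cl.1';
variable MPI_MPICH_EXTRAVERSION ?= 'p1';
}}}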
     535
     536
     537== FTS Client ==
     538
     539On machine types supporting it (e.g. UI, VOBOX, WN), you can configure an FTS client. Normally, to configure the FTS client you only need to define the variable `FTS_SERVER_HOST` to the name of your preferred FTS server (typically your ''closest T1'').
     540
     541To accommodate specific needs, there are 2 other variables whose default values should be appropriate:
     542 * `FTS_SERVER_PORT`: port number used by the FTS server. Default: 8443.
     543 * `FTS_SERVER_TRANSFER_SERVICE_PATH`: root path of the transfer service on the FTS server, used to build the leftmost part of URLs related to FTS services. Default: `/glite-data-transfer-fts`.
     544
     545''Note: for backward compatibility, it is still possible to define the variable `FTS_SERVER_URL` directly, even though it is recommended to update your site parameters and use the new variables instead.''
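A minimal sketch (the server name is hypothetical; the port and path keep their defaults):
{{{
# Preferred FTS server for this site.
variable FTS_SERVER_HOST ?= 'fts.tier1.example.org';
# FTS_SERVER_PORT and FTS_SERVER_TRANSFER_SERVICE_PATH keep their defaults.
}}}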
     546
     547== MonBox and APEL ==
     548
     549__Base template__ : `machine-types/mon`.
     550
     551MonBox is the service in charge of storing local accounting data. It is used in conjunction with APEL, the framework in charge of collecting accounting data on the CEs and publishing it into the central accounting. APEL is made of 2 parts:
     552 * the parser: in charge of parsing batch system and Globus accounting/log files, producing the normalized grid accounting data and storing it in the MonBox database. It normally runs on the CE; there is one parser per type of batch system.
     553 * the publisher: in charge of publishing the local accounting data stored on the MonBox into the grid central accounting. It normally runs on the MonBox.
     554
     555MonBox requires the following configuration variables:
     556
     557 * `MON_MYSQL_PASSWORD`: password of the MySQL administrator (root) on the MonBox.
     558 * `MON_HOST`: host name of the MonBox.
     559
     560APEL configuration requires the following variables (see the sketch after this list):
     561 * `APEL_ENABLED`: whether to enable APEL. Default: `true`.
     562 * `APEL_DB_NAME`: APEL database name on MonBox. Default: `accounting`.
     563 * `APEL_DB_USER`: MySQL user to access APEL database on MonBox. Default: `accounting`.
     564 * `APEL_DB_PWD`: MySQL password to access APEL database on MonBox.
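A sketch with a hypothetical host name and placeholder passwords (the other APEL variables keep their defaults):
{{{
# MonBox host and MySQL administrator password.
variable MON_HOST ?= 'mon.example.org';
variable MON_MYSQL_PASSWORD ?= 'change-this-root-password';

# Password used by APEL to access the accounting database on the MonBox.
variable APEL_DB_PWD ?= 'change-this-accounting-password';
}}}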
     565
     566By default, the APEL publisher runs on the MonBox. If you'd like to run it on another machine, add the following line to the machine profile:
     567{{{
     568include { 'common/accounting/apel/publisher' };
     569}}}
     570
     571''Note: even when the APEL publisher is not run on the MonBox, it requires access to a MonBox.''
     572
     573After the initial installation of the machine, you need to install a certificate at the usual location (`/etc/grid-security`), unless you use an installation (AII) hook to do it during the installation. After doing so, you need to manually rerun the Quattor configuration module `ncm-rgmaserver` or reboot the machine. To run the configuration module, use the following command:
     574{{{
     575ncm-ncd --configure rgmaserver
     576}}}
     577
     578== MyProxy Server ==
     579
     580__Base template__ : `machine-types/px`.
     581
     582MyProxy server configuration consists of defining policies for access to proxies stored on the server and for their renewal. There are 2 sets of policies: explicitly authorized policies and default policies. For each set, a separate policy can be defined for:
     583 * renewers: list of clients allowed to renew a proxy. The variables to use are `MYPROXY_AUTHORIZED_RENEWERS` and `MYPROXY_DEFAULT_RENEWERS`.
     584 * retrievers: list of clients allowed to retrieve a proxy if they have valid credentials and provide the same username/password as the one used at proxy creation. The variables to use are `MYPROXY_AUTHORIZED_RETRIEVERS` and `MYPROXY_DEFAULT_RETRIEVERS`.
     585 * key retrievers: list of clients allowed to retrieve a proxy, including the private key, if they have valid credentials and provide the same username/password as the one used at proxy creation. The variables to use are `MYPROXY_AUTHORIZED_KEY_RETRIEVERS` and `MYPROXY_DEFAULT_KEY_RETRIEVERS`.
     586 * trusted retrievers: list of clients allowed to retrieve a proxy without providing valid credentials (but still providing the same username/password as the one used at proxy creation, if any). The variables to use are `MYPROXY_AUTHORIZED_TRUSTED_RETRIEVERS` and `MYPROXY_DEFAULT_TRUSTED_RETRIEVERS`. Clients listed in these variables are automatically added to the corresponding retrievers list (`MYPROXY_AUTHORIZED_RETRIEVERS` or `MYPROXY_DEFAULT_RETRIEVERS`).
     587
     588The list values must be client DNs or regexps matching client DNs. Regexps must be used with caution, as they may grant broader access than intended. For more information about the different policies and the regexp syntax, see the man page for the MyProxy server configuration:
     589{{{
     590man myproxy-server.config
     591}}}
     592
     593In addition to the previous variables, it is possible to use the variable `GRID_TRUSTED_BROKERS` to define the WMS hosts allowed to use the MyProxy server. The list provided with this variable is merged with `MYPROXY_AUTHORIZED_RENEWERS`.
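A sketch of these policies with hypothetical DNs (each variable is assumed to be a list of client DNs, or of regexps matching DNs):
{{{
# Clients allowed to renew proxies stored on this server.
variable MYPROXY_AUTHORIZED_RENEWERS ?= list(
  '/O=GRID-FR/C=FR/O=EXAMPLE/CN=wms.example.org'
);

# Clients allowed to retrieve proxies (valid credentials + username/password).
variable MYPROXY_DEFAULT_RETRIEVERS ?= list(
  '/O=GRID-FR/C=FR/O=EXAMPLE/CN=ui.example.org'
);

# WMS hosts allowed to use this MyProxy server (merged with the renewers list).
variable GRID_TRUSTED_BROKERS ?= list(
  '/O=GRID-FR/C=FR/O=EXAMPLE/CN=wms.example.org'
);
}}}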
     594
     595== VOMS Server ==
     596
     597__Base template__ : `machine-types/voms`.
     598
     599VOMS server default configuration can be customized with the following variables (a sketch of typical values follows the list):
     600 * `VOMS_VOS`: this variable describes each VO managed by the VOMS server. It is a nlist where the key is the VO name and the value is a nlist specifying the VO parameters. A typical entry is:
     601{{{
     602  'vo.lal.in2p3.fr',  nlist('port', '20000',
     603                            'host', 'grid12.lal.in2p3.fr',
     604                            'dbName', 'voms_lal',
     605                            'dbUser', 'root',
     606                            'dbPassword', 'clrtxtpwd',
     607                            'adminEmail', 'vomsadmins@example.com',
     608                            'adminCert', '/etc/grid-security/vomsadmin.pem',
     609                           ),
     610}}}
     611 * `VOMS_DB_TYPE`: can be `mysql` or `oracle`.
     612 * `VOMS_MYSQL_ADMINPWD`: password of the MySQL administrator account. Required if the DB type is `mysql` (no default).
     613 * `VOMS_MYSQL_ADMINUSER`: username of the MySQL administrator account. Ignored if the DB type is not `mysql`. Default: `root`.
     614 * `VOMS_ADMIN_SMTP_HOST`: SMTP host used by VOMS admin when sending emails. Default: `localhost`.
     615 * `VOMS_CRON_EMAIL`: user to notify in case of problems during cron jobs. Default: `root@localhost`.
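A sketch for a MySQL-backed server (passwords, host names and email addresses are placeholders):
{{{
variable VOMS_DB_TYPE ?= 'mysql';
variable VOMS_MYSQL_ADMINUSER ?= 'root';          # default
variable VOMS_MYSQL_ADMINPWD ?= 'change-this-password';
variable VOMS_ADMIN_SMTP_HOST ?= 'smtp.example.org';
variable VOMS_CRON_EMAIL ?= 'grid.admin@example.org';
}}}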
     616
     617In addition to configuring the previous variables, it is generally necessary to install the certificate of the initial administrator of the VO. This certificate is passed in the parameter `adminCert` of the VO parameters (`VOMS_VOS`). This is typically done with the Quattor configuration module `filecopy` in the site-specific configuration of the VOMS server. A typical sequence is:
     618{{{
     619include { 'components/filecopy/config' };
     620variable CONTENTS = <<EOF;
     621-----BEGIN CERTIFICATE-----
     622... Copy certificate from the PEM file ...
     623-----END CERTIFICATE-----
     624EOF
     625
     626# Now actually add the file to the configuration.
     627'/software/components/filecopy/services' =
     628  npush(escape('/etc/grid-security/vomsadmin.pem'),
     629        nlist('config',CONTENTS,
     630              'perms','0755'));
     631}}}
     632
     633For more information on VOMS server configuration parameters, you may want to look at the VOMS server [https://edms.cern.ch/file/974094/1/voms-admin-user-guide.pdf administration guide].
     634
     635== VOBOX ==
     636
     637__Base template__ : `machine-types/vobox`.
     638
     639The VOBOX is a machine '''dedicated to one VO''' running VO-specific services. In addition to the VO-specific services, this machine runs a service called ''proxy renewal'' in charge of renewing the grid proxy used by VO-specific services.
     640
     641It is critical for security to restrict the number of people allowed to access the VOBOX. By default, only people with the VO SW manager role can log into the VOBOX. To change this configuration, refer to the section on [#MappingofVOMSgroupsrolesintogrid-mapfile VOMS groups/roles mapping], but be sure you really need to allow other roles, as doing so can give unwanted users access to privileged services.
     642
     643The configuration templates for the VOBOX enforce that there is only one VO configured for access to the VOBOX-specific services. This VO must be declared using the `VOS` variable, as for other machine types. If you want to give other VOs access to the VOBOX for its management and operation, you need to explicitly allow them using the variable `VOBOX_OPERATION_VOS`. This variable is a list of VOs considered as operation VOs; by default, this list contains only the VO `ops`. If the VOs listed in this variable are not listed in `VOS`, they are automatically added.
     644
     645Only the enabled VO has `gsissh` access to the VOBOX by default. If you want the operation VOs to also be enabled for `gsissh` access to the VOBOX, you need to define the variable `VOBOX_OPERATION_VOS_GSISSH` to `true` in the VOBOX profile. Only the FQANs enabled by [#MappingofVOMSgroupsrolesintogrid-mapfile VO_VOMS_FQAN_FILTER] will be enabled for each VO (default: SW manager).
     646
     647''Note: if you add the `dteam` VO to the operation VOs and enable `gsissh` access for operation VOs, be sure to restrict the people allowed interactive access to the VOBOX, as `dteam` is a very large VO with people from every grid site.''
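A sketch of such a VOBOX configuration, with a hypothetical VO name:
{{{
# VO owning the VOBOX (the only VO with access to the VOBOX-specific services).
variable VOS ?= list('vo.example.org');

# Operation VOs (default: 'ops'), also enabled here for gsissh access.
variable VOBOX_OPERATION_VOS ?= list('ops');
variable VOBOX_OPERATION_VOS_GSISSH ?= true;
}}}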
     648
     649There are some other variables available to tune the VOBOX configuration, but the defaults should generally be appropriate. The main ones are:
     650 * `VOBOX_TCP_MAX_BUFFER_SIZE`: the maximum TCP buffer size to use. This is critical to reach good performance on high-speed networks. Default: 8388608.
     651 * `VOBOX_TCP_MAX_BACKLOG`: another critical TCP congestion-control parameter needed to reach high throughput and good performance. Default: 250000.
     652
     653In addition, it is generally necessary to define the [#CustomizingDefaultEnvironment default MyProxy server] (`MYPROXY_DEFAULT_SERVER`).
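For example (the MyProxy host name is hypothetical; the TCP tuning variables are left at their defaults):
{{{
variable MYPROXY_DEFAULT_SERVER ?= 'myproxy.example.org';
# VOBOX_TCP_MAX_BUFFER_SIZE and VOBOX_TCP_MAX_BACKLOG keep their defaults.
}}}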
     654
     655''Note: it is recommended not to define gsissh-related variables, as documented in the [#UI UI] section, as this may interfere with the standard VOBOX configuration. The only exception is `GSISSH_PORT`.''
     656
     657
     658== UI ==
     659
     660__Base template__ : `machine-types/ui`.
     661
     662A UI may be run on a non-grid machine where the proposed base template is not suitable. In this case, if the machine is managed by Quattor, it is possible to reuse parts of the base template on the target machine: mainly the VO configuration, `glite/service/ui` and gLite updates.
     663
     664On a standard UI, user accounts must be created using a method appropriate to the local site. It can be NIS, LDAP or the template provided with QWG to manage [wiki:Doc/OS/UserMgt user creation].
     665
     666It is also possible to configure a UI to be accessed through gsissh. In this configuration, users authenticate on the UI with their grid certificate and are mapped to a pool account of their VO. To configure a UI with gsissh, it is only necessary to define the variable `GSISSH_SERVER_ENABLED` to `true` in the machine profile.
     667
     668When configuring a gsissh-enabled UI, there are a few specific variables available to customize the gsissh server (see the sketch after this list):
     669 * `UI_GSISSH_CONFIG_SITE`: name of a template to execute before configuring the gsissh server. For everything related to VO configuration, be sure to use the VO configuration variables, as VO configuration is done before this template is executed.
     670 * `GSISSH_SERVER_VOS`: subset of configured VOs on the node that must be enabled for gsissh access. Default: all configured VOs (`VOS`).
     671 * `GSISSH_PORT`: port used by gsissh server. Default: 1975.
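A minimal sketch of a gsissh-enabled UI restricted to one VO (the VO name is hypothetical; the port keeps its default):
{{{
variable GSISSH_SERVER_ENABLED ?= true;
variable GSISSH_SERVER_VOS ?= list('vo.example.org');
}}}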
     672
     673''Note: be aware that `gsisshd` is an authenticated grid service and thus requires the UI to have a server certificate, like any other grid service machine.''
     674 
     675=== Customizing Default Environment ===
     676
     677The main variables to customize the environment seen by users on a UI are:
     678 * `MYPROXY_DEFAULT_SERVER`: name of default MyProxy server to use with `myproxy-xxx` commands.
     679 * Variables related to [wiki:Doc/gLite/TemplateCustomization#FTSClient FTS client].
     680