= gLite Template Customization =

[[TracNav]]

[[TOC(inline)]]

Site customization of QWG templates is done through a small set of templates defining variables used as input by the QWG templates. This page doesn't cover basic OS configuration, which is described in the page about the [wiki:Doc/TemplateCustom template framework].

All site parameters related to QWG middleware are supposed to be declared in template `pro_lcg2_config_site.tpl`. To start a new site, import the site parameter template [source:templates/trunk/sites/example/site/pro_lcg2_config_site.tpl example]. The list of all available variables, with their description and default value, can be consulted in template source:templates/trunk/grid/glite-3.0.0/defaults/site.tpl. '''This template is a critical part of the standard templates and should not be modified or duplicated.'''

''Note: information on this page may document features or configuration options not present in the current release. This information relates to changes and improvements that will be available in the next release and are already present in the [source:templates/branches/gLite-3.0.0 current development branch]. If you urgently require these features, [wiki:Download/QWGTemplates use the content] of this branch.''

== Machine types ==

QWG templates provide a template per machine type (CE, SE, RB, ...). They are located in the {{{machine-types}}} directory and are intended to be generic templates: no modification should be needed. To configure a specific machine with gLite middleware, you just need to include the appropriate machine type template into the machine profile, after specifying a template containing the specific configuration for this particular machine with the variable {{{xxx_CONFIG_SITE}}} (look in the template for the exact name of the variable).

Here is an example for configuring a Torque-based CE:
{{{
object template profile_grid10;

# Define specific configuration for a GRIF CE to be added to
# standard configuration
variable CE_TORQUE_CONFIG_SITE = "pro_ce_torque_grif";

# Configure as a CE (Torque) + Site's BDII
include pro_ce_torque;

#
# software repositories (should be last)
#
include repository_common;
}}}

In this example, {{{CE_TORQUE_CONFIG_SITE}}} specifies the name of a template defining the Torque configuration.

All the machine types share a common basic configuration, described in template `machine-types/base.tpl`. This template allows site-specific configuration to be added to the common basic configuration (e.g. configuration of a monitoring agent). This is done by defining the variable {{{GLITE_BASE_CONFIG_SITE}}} to a template containing the site-specific configuration to be added at the end of the common configuration. This variable can be defined, for example, in template {{{pro_site_cluster_info.tpl}}}.

== VO Configuration ==

The list of VOs to configure on a specific node is defined in the variable `VOS`. Generally, a site-wide default value is defined in `pro_lcg2_config_site.tpl` (defined with operator `?=`). This value can be overridden on a specific machine by defining the `VOS` variable in the machine profile, before including the machine type profile (see the node-specific example at the end of this section).

An example of `VOS` definition is:
{{{
variable VOS ?= list('alice',
                     'atlas',
                     'biomed',
                     'calice',
                     'cms',
                     'cppm',
                     'dteam',
                     'dzero',
                     'egeode',
                     'lhcb',
                     'ops',
                     'planck',
                    );
}}}

''Note: `dteam` and `ops` are mandatory VOs.''

For each VO listed in `VOS`, there must be a template defining the VO parameters in `vo/params`. The template name must be the same as the VO name used in `VOS`. If the VO to be added has no template defining its parameters, refer to the section below about adding a new VO.
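For example, to restrict a particular WN to a subset of the site-wide VO list, override `VOS` in its profile before including the machine type template. This is only a sketch: the profile name and VO list are illustrative.
{{{
object template profile_grid20;

# Node-specific VO list, must be defined before the machine type template
variable VOS = list('dteam', 'ops', 'atlas');

# Configure as a Torque WN
include machine-types/wn;

include repository_common;
}}}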
=== Defining Site Specific Defaults for VOs ===

It is possible to define site-specific defaults for VOs that override the standard defaults. This is done by defining the variable `VOS_SITE_PARAMS` as a nlist. This nlist can contain one entry per VO, plus an entry `DEFAULT`. The `DEFAULT` entry defines parameters that apply to all VOs; the other entries apply only to one specific VO. The entry key is the VO name (except for `DEFAULT`), as used in the `VOS` variable. Each entry value must be the name of a structure template or a nlist defining any of these properties:
 * `create_home`: create home directories for VO accounts. Default defined by variable `CREATE_HOME`.
 * `create_keys`: create SSH keys for VO accounts. Default defined by variable `CREATE_KEYS`.
 * `unlock_accounts`: a regexp defining the host names where the VO accounts must be unlocked.
 * `pool_digits`: default number of digits to use when creating pool accounts.
 * `pool_offset`: offset from the VO base uid for the first pool account.
 * `pool_start`: index of the first pool account to create for a VO.
 * `pool_size`: number of pool accounts to create by default for a VO.
 * `sw_mgr_role`: description of the VO software manager role. Avoid changing the default.

For example, to define a site-specific RB for VO alice, create a template `vo/site/alice.tpl` in your site directory like:
{{{
structure template vo/site/alice;

'lbhost' = 'myrb.example.org:9000';
'nshost' = 'myrb.example.org:7772';
}}}

and add the following entry to `VOS_SITE_PARAMS` in your `pro_lcg2_config_site.tpl`:
{{{
variable VOS_SITE_PARAMS = nlist('alice', 'vo/site/alice',
                                );
}}}

Alternatively, you can define these parameters directly in `VOS_SITE_PARAMS`:
{{{
variable VOS_SITE_PARAMS = nlist('alice', nlist('lbhost', 'myrb.example.org:9000',
                                                'nshost', 'myrb.example.org:7772',
                                               ),
                                );
}}}
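The `DEFAULT` entry uses the same syntax. The following sketch, with illustrative values, disables SSH key creation for all VOs while keeping the site-specific alice parameters defined above:
{{{
variable VOS_SITE_PARAMS = nlist('DEFAULT', nlist('create_keys', false,
                                                 ),
                                 'alice', 'vo/site/alice',
                                );
}}}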
=== Adding a New VO ===

Adding a new VO involves creating a template defining the VO parameters. This template name must be the name you use to refer to the VO in the rest of the configuration; it is not required to be the real VO name (it can be an alias used in the configuration). This template must be located in directory `vo/params`, in one of your cluster- or site-specific hierarchies of templates or in the gLite templates.

''Note: if you create a template for a new VO, be sure to commit it to the QWG repository if you have write access to it, or to send it to the QWG developers. There is normally no reason for a VO definition not to be generally available.''

To create a template describing a new VO, the easiest way is to copy the template of an already configured VO. The main variables supported in this template are:
 * `name`: VO official name. No default.
 * `account_prefix`: prefix to use when creating accounts for the VO. Generally the first 3 letters of the VO name. No default.
 * `voms_servers`: a nlist describing the VOMS server used by the VO, if any. If the VO has several (redundant) VOMS servers, this property can be a list of nlists. For each VOMS server, the supported properties are:
   * `name`: name of the VOMS server. This is a name used internally by the templates. By default, the template defining the VOMS server certificate has the same name. No default.
   * `host`: VOMS server host name. No default.
   * `port`: VOMS server port associated with this VO. No default.
   * `cert`: template name, in `vo/certs`, defining the VOMS server certificate. If not specified, defaults to the VOMS server name.
 * `voms_roles`: list of VOMS roles supported by the VO. This property is optional. For each role, the entry is a nlist with the following possible properties:
   * `description`: description of the VO role. This property is informational, except for the VO software manager, where it must be "SW manager".
   * `name`: VO role name, as defined on the VOMS server.
   * `suffix`: suffix to append to `account_prefix` to build the account name associated with this role.
 * `proxy`: name of the proxy server used by the VO. No default, optional.
 * `nshosts`: name:port of the RB used by the VO (Network Server). No default.
 * `lbhosts`: name:port of the RB used by the VO (Logging and Bookkeeping). No default.
 * `catalog`: catalog type used by the VO. Optional. Must be defined only for a VO still using `RLS` (value must be `rls` or `RLS`).
 * `base_uid`: first uid to use for the VO.
 * `create_home`: create home directories for VO accounts. Default defined by variable `CREATE_HOME`.
 * `create_keys`: create SSH keys for VO accounts. Default defined by variable `CREATE_KEYS`.
 * `gid`: GID associated with VO accounts. Default: first pool account UID.
 * `pool_size`: number of pool accounts to create for the VO. Default: 200.
 * `pool_digits`: number of digits to use for pool accounts. Must be large enough to handle `pool_size`. Default: 3.
 * `pool_offset`: offset from the VO base uid for the first pool account.

In addition to this template, you need another template defining the public key of the VOMS server used by the VO. By default, this template has the name of the VOMS server. It can be explicitly defined with the `cert` property of a VOMS server entry. If the new VO uses an already configured VOMS server, there is no need to add the certificate.

=== VO Specific Areas ===

A couple of variables are available to customize VO-specific areas (software area, VO accounts home directories...); see the sketch after this list:
 * `VO_SW_AREAS`: a nlist with one entry per VO (key is the VO name as used in the `VOS` variable). The value is the directory to use for the VO software area. Be sure to list this directory or its parent in `WN_SHARED_AREAS` if you want to use a shared filesystem for this area (this is highly recommended). Directories listed in this variable will be created with the appropriate permissions (`0755` for the VO group).
 * `VO_HOMES`: a nlist with one entry per VO (key is the VO name as used in the `VOS` variable). The value is the directory prefix to use when creating home directories for accounts. A suffix will be added to this name: the VO role suffix for role accounts, or the account number for pool accounts. By default, VO accounts are created in `/home`.
 * `VO_SWMGR_HOMES`: a nlist with one entry per VO (key is the VO name as used in the `VOS` variable). The value is the directory to use as the home directory of the VO software manager. If there is no entry for a VO, `VO_HOMES` is used. The main purpose of this variable is to define the software manager home directory as the VO software area; this can be achieved easily by assigning `VO_SW_AREAS` to this variable.
 * `CREATE_HOME`: this variable controls the creation of VO accounts home directories. It accepts 3 values: `true`, `false` and `undef`. `undef` is a ''conditional true'': home directories are not created if they reside on an NFS shared file system (i.e. it is listed in `WN_SHARED_AREAS`) and the NFS server is not the current machine.
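As an illustration, here is a sketch of these variables in a site parameter template (the VO name and paths are illustrative):
{{{
# Software area for atlas, served through a shared file system
variable VO_SW_AREAS = nlist('atlas', '/swareas/atlas');

# Home directories for atlas accounts
variable VO_HOMES = nlist('atlas', '/home/atlas');

# Use the VO software area as home directory for the software manager
variable VO_SWMGR_HOMES = VO_SW_AREAS;
}}}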
=== Tuning VO configuration on a specific node ===

Each machine type template defines the VO configuration (pool accounts, gridmap file/dir...) appropriate to the machine type. If you want to change this configuration on a specific node, you can use the following variables:
 * `NODE_VO_ACCOUNTS` (boolean): VO accounts must be created for each VO initialized. Default: `true`.
 * `NODE_VO_GRIDMAPDIR_CONFIG` (boolean): gridmapdir entries must be initialized for pool accounts. Default: `false`.
 * `NODE_VO_WLCONFIG` (boolean): initialize the workload management environment for each VO. Normally enabled only on resource brokers. Default: `false`.
 * `NODE_VO_CREATEHOME` (boolean): create home directories for pool accounts. Default: `true`.

In addition, you can execute actions specific to the local machine by defining the following variable (mainly used to define a node-specific VO list by assigning a non-default value to the `VOS` variable):
 * `NODE_VO_CONFIG` (string): site-specific template that must be included before actually doing the VO initialization. Allows specific modifications to the default VO configuration. Default: none.

'''Note: before modifying the default VO configuration for a specific machine, be sure that what you want to do is valid. Misconfiguring a VO can have dramatic effects on service availability.'''

== Allocation of Service Accounts ==

Some services allow a specific account to be defined to run the service. In this case, there is one template for each of these accounts in `common/users`. The name of the template generally matches the user account created or, when the template is empty, the name of the service.

A site can redefine account names or characteristics (uid, home directory...). To do this, you should not edit the standard templates directly, as the changes would be lost in the next version of the templates (or you would have to redo them by hand). Instead, create a `users` directory somewhere in your site or cluster hierarchy (e.g. under the `site` directory, not directly at the same level, else it will not work without adjusting `cluster.build.properties`) and put your customized version of the template there.

'''Note: don't change the name of the template, even if you change the name of the account used''' (else you'll need to modify the standard templates needing this user).

== Accepted CAs ==

There is one template defining all the accepted CAs. A new one is generally produced each time there is a new release of the list of CAs officially accepted by EGEE. If you need to adjust it, create a site- or cluster-specific copy of `common/security/cas.tpl` in a directory `common/security`.

If you need to update this template, refer to the standard [wiki:Development/AutoTemplates#TrustedCAsTemplate procedure] to do it.

== Shared File Systems ==

It is recommended to use a shared file system mounted (at least) on the CE and WNs for the VO software areas. It is also sometimes convenient to use a shared file system for VO pool accounts (this is more or less a requirement to run MPI jobs). Currently, QWG templates support the use of NFS shared file systems.

Configuration is done by the following variables (see the sketch at the end of this section):
 * `WN_SHARED_AREAS`: a nlist with one entry per file system that must be NFS mounted on worker nodes (key is the escaped file system mount point). The value of each entry is the name of the NFS server, optionally followed by the path on the NFS server if different from the path on the worker node.
 * `NFS_AUTOFS`: when `true`, use `autofs` to mount NFS file systems on NFS clients. This is the recommended setting, as it is the only one that avoids complex inter-dependencies in the startup order. But for backward compatibility, the default value is `false`.

''__Note__: variable `WN_NFS_AREAS` has been deprecated and replaced by `WN_SHARED_AREAS`. If the latter is not defined, `WN_NFS_AREAS` is used if defined.''

File systems listed in `WN_SHARED_AREAS` are mounted on the CE and WNs. The NFS server for these file systems can be any machine type and is not required to be managed by Quattor (but in this case, you probably need to force `CREATE_HOME` to `true` on one machine). If it is managed by Quattor, all required actions are done automatically.
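Here is a sketch of an NFS configuration with a shared `/home` and a software area exported under a different path on the server (the host name and paths are illustrative; the `server:path` form for the second entry is an assumption based on the description above):
{{{
variable WN_SHARED_AREAS = nlist(escape('/home'), 'nfs.example.org',
                                 escape('/swareas'), 'nfs.example.org:/export/swareas',
                                );

# Mount through autofs (recommended)
variable NFS_AUTOFS = true;
}}}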
== LCG CE Configuration ==

QWG templates handle the configuration of the LCG CE and the selected batch system (LRMS). To select the LRMS you want to use, you have to define the variable `CE_BATCH_NAME`. '''There is no default'''. If you want to use Torque/MAUI, the recommended value is `torque2`. The value of `CE_BATCH_NAME` must match a directory in the `common` directory of the gLite 3 templates.

''Note: as of gLite 3.0.2, the supported LRMS are Torque v1 (`torque1`) and Torque v2 (`torque2`), both with the MAUI scheduler.''

Previous versions of QWG templates used to require the definition of `CE_BATCH_SYS`. This is deprecated: this variable is now computed from `CE_BATCH_NAME`.

=== PBS/Torque ===

PBS/Torque related templates support the following variables (see the sketch after the MAUI section below):
 * `CE_HOST`: name of the PBS/Torque master.
 * `CE_PRIV_HOST`: alternate name of the PBS/Torque server. Used in configurations where the WNs are on a private network and the PBS/Torque master has 2 network names/addresses.
 * `CE_QUEUES`: a nlist with one entry per queue (key is the queue name). For each queue, the value itself is a nlist. One mandatory key is `attr`, which defines the queue parameters (`qmgr set queue` options). Another optional key is `vos`, used to explicitly define the VOs which have access to the queue (by default, only the VO with the same name as the queue has access). Look at the [source:templates/trunk/grid/lcg-2.7.0/site/pro_lcg2_config_site.tpl pro_lcg2_config_site.tpl] example for an example of how to define one queue for each supported VO.
 * `WN_ATTRS`: a nlist with one entry per worker node (key is the escaped node full name). Each value is a set of PBS/Torque attributes to set on the node. Valid values are any `key=value` settings supported by the `qmgr set server` command. One useful value is `status=offline`, to cause a specific node to drain, or `status=online`, to re-enable the node. Just suppressing `status=offline` is not enough to re-enable the node. One specific entry in `WN_ATTRS` is `DEFAULT`: this entry is applied to any node that doesn't have a specific entry.
 * `WN_CPUS_DEF`: default number of CPUs per worker node.
 * `WN_CPUS`: a nlist with one entry per worker node (key is the node full name) having a number of CPUs different from the default.

=== MAUI ===

MAUI related templates support the following variables:
 * `MAUI_CFG`: the content of this variable must be the full content of the `maui.cfg` file. Look at the [source:templates/trunk/grid/lcg-2.7.0/site/pro_lcg2_config_site_maui.tpl pro_lcg2_config_site_maui.tpl] example on how to build this variable from other configuration elements.
 * `MAUI_WN_PART_DEF`: default node partition to use with worker nodes.
 * `MAUI_WN_PART`: a nlist with one entry per worker node (key is the node full name). The value is the name of the MAUI partition where to place that specific worker node.
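As a sketch of the simpler variables above (the host name, CPU count and partition name are illustrative; for `CE_QUEUES` and `MAUI_CFG`, refer to the examples linked above):
{{{
variable CE_HOST = 'ce01.example.org';

# Worker nodes have 2 CPUs unless overridden in WN_CPUS
variable WN_CPUS_DEF = 2;

# Default MAUI partition for worker nodes
variable MAUI_WN_PART_DEF = 'lcgpro';
}}}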
=== CE Status ===

CE related templates use the variable `CE_STATUS` to control the CE state. Supported values are:
 * `Production`: the normal state. The CE receives and processes jobs.
 * `Draining`: the CE doesn't accept new jobs but continues to execute queued jobs (as long as there are WNs available to execute them).
 * `Closed`: the CE doesn't accept new jobs and queued jobs are not executed. Only running jobs can complete.
 * `Queuing`: the CE accepts new jobs but will not execute them.

`CE_STATUS` indicates the desired status of the CE: all the necessary actions are taken to set the CE to the requested status. The default status (if the variable is not specified) is `Production`. This variable can be used in conjunction with [wiki:Doc/LCG2/TemplateLayout#PBSTorque WN_ATTRS] to drain queues and/or nodes.

=== Restarting LRMS Client ===

It is possible to force a restart of the LRMS (batch system) client on all WNs by defining the variable `LRMS_CLIENT_RESTART`. This variable, if present, must be a nlist with one entry per WN to restart (key is the WN name), or `DEFAULT` for all WNs without a specific entry. When the value is changed (or first defined), a LRMS client restart is triggered. The value itself is not relevant, but it is advised to use a timestamp for better tracking of forced restarts.

For example, to force a restart on all WNs, you can add the following definition:
{{{
variable LRMS_CLIENT_RESTART = nlist(
  'DEFAULT', '2007-03-24:18:33',
);
}}}

A good place to define this variable is template `pro_site_cluster_info` in the cluster `site` directory.

'''Note: this feature is currently implemented only for the Torque v2 client.'''

=== Run-Time Environment ===

gLite 3.0 templates introduce a new way to define `GlueHostApplicationSoftwareRunTimeEnvironment`. Previously, it was necessary to define the list of all tags in the site configuration template. As most of these tags are standard tags attached to a release of the middleware, there is now a default list of tags defined in the default configuration site template, [source:templates/trunk/grid/glite-3.0.0/defaults/site.tpl defaults/site.tpl]. To supplement this list with tags specific to the site (e.g. `LCG_SC3`), define a variable `CE_RUNTIMEENV_SITE` instead of defining `CE_RUNTIMEENV`:
{{{
variable CE_RUNTIMEENV_SITE = list("LCG_SC3");
}}}

This change is backward compatible: if `CE_RUNTIMEENV` is defined in the site configuration template, its value is used.

=== Working Area on Torque WNs ===

By default, QWG templates configure the Torque client on WNs to set the environment variable `TMPDIR`, and the location of `stdin`, `stdout` and `stderr`, to a directory local to the worker node (`/var/spool/pbs/tmpdir`), and to set the environment variable `EDG_WL_SCRATCH` to `TMPDIR` (except for jobs requiring several WNs, e.g. MPI). This configuration is particularly suited to shared home directories, but works well with non-shared home directories too. The main requirement is to size `/var` appropriately on the WNs, as jobs sometimes require a large scratch area. On the other hand, `/home` is not required to be very large, as it should not store very large files for long periods.

It is strongly recommended to use shared home directories, served through NFS or another distributed file system, as this optimizes `/home` usage and allows local disk space on the WNs to be dedicated to `/var`.
If your configuration cannot be set up as recommended, or if your current configuration has a large space in `/home` and limited space in `/var`, you can define the following property in your WN profiles before including `machine-types/wn`:
{{{
variable TORQUE_TMPDIR = "/home/pbs/tmpdir";
}}}

== SE Configuration ==

''Note: this section covers the generic SE configuration, not a specific implementation.''

=== List of site SEs ===

You can define the list of SEs at your site in the variable (a list) `SE_HOSTS`. For each SE host listed in `SE_HOSTS`, there must be one corresponding entry in the variable `SE_TYPES`, a nlist whose key is the host name as defined in `SE_HOSTS`.

=== CE Close SEs ===

The variable `CE_CLOSE_SE_LIST` defines the SEs that must be registered in the BDII as close SEs for the current CE. It can be either a value used for every VO, or a nlist with a default value (key `DEFAULT`) and one entry per VO with a different close SE (key is the VO name). Each value must be a string if there is only one close SE, or a list of SEs. `CE_CLOSE_SE_LIST` defaults to the deprecated `SE_HOST_DEFAULT` if defined, or else to the first SE in the `SE_HOSTS` variable.

If you want all your SEs to be registered as close SEs, add the following declaration:
{{{
variable CE_CLOSE_SE_LIST = SE_HOSTS;
}}}

It is valid to have no close SE defined. To remove the default definition, you need to do:
{{{
variable CE_CLOSE_SE_LIST = nlist('DEFAULT', undef);
}}}

It is valid for the close SE to be outside your site, but this is probably not recommended for standard configurations.

=== Default SE ===

The variable `CE_DEFAULT_SE` defines the default SE for the site. It can be either a SE name, or a nlist with a default entry (key `DEFAULT`) and one entry per VO with a different default SE (key is the VO name). If not explicitly defined, it defaults to the first SE in the `CE_CLOSE_SE_LIST` entries. The default SE can be outside your site (probably not recommended for standard configurations).

== DPM Configuration ==

DPM related standard templates require a site template describing the service site configuration. The variable `DPM_CONFIG_SITE` must contain the name of this template. This template defines the whole DPM configuration, including all the disk servers used, and is used to configure all the machines that are part of the DPM configuration.

On the DPM head node, the variable `SEDPM_SRM_SERVER` must be defined to `true`. This variable is `false` by default (DPM disk servers).

If you want to use the Oracle version of the DPM server, define the following variable in your machine profile:
{{{
variable DPM_SERVER_MYSQL = false;
}}}

=== DPM site parameters ===

There is no default template provided for the DPM configuration. To build your own template, you can look at template [source:templates/trunk/sites/example/site/pro_se_dpm_config.tpl pro_se_dpm_config.tpl] in the examples provided with QWG templates.

Starting with QWG Templates release gLite-3.0.2-9, there is no default password value provided for the account used by the DPM daemons, nor for the DB accounts used to access the DPM database. You '''MUST''' provide them in your site configuration. If you forget to do so, you'll get a rather unexplicit panc error:
{{{
[pan-compile] *** wrong argument: operator + operand 1: not a property: element
}}}

If you want to use a specific VO list on your DPM server and you have several nodes in your DPM configuration (DPM head node + disk servers), you need to write a template defining the `VOS` variable (with a non-default value) and define the variable `NODE_VO_CONFIG` to this template.
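For example (a sketch, the template name is hypothetical), define the VO list in a small site template:
{{{
template pro_se_dpm_vos;

# Non-default VO list shared by the DPM head node and disk servers
variable VOS = list('dteam', 'ops', 'lhcb');
}}}
and reference it in the profile of each node of the DPM configuration:
{{{
variable NODE_VO_CONFIG = 'pro_se_dpm_vos';
}}}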
=== Using non standard port numbers ===

It is possible to use non-standard port numbers for the DPM daemons `dpm`, `dpns` and all the SRM daemons. To do this, you only need to define the `XXX_PORT` variable corresponding to the service. Look at the gLite [source:templates/trunk/grid/glite-3.0.0/defaults/glite.tpl default parameters] to find the exact name of the variable.

=== Script to publish dynamic information ===

As of DPM 1.5.10, the script used to publish DPM dynamic information into the BDII (space used/free per VO) has not been updated to interact properly with VOMS mapping. As a result, VO-specific pools are not counted in the published values. QWG templates provide a fixed version of the script, which can be installed by adding the following line to the DPM head node profile:
{{{
include glite/se_dpm/server/info_dynamic_voms;
}}}

To work properly, this script requires `/opt/lcg/etc/DPMCONFIG` (or whatever file you defined for the DPNS database connection information) to be world-readable. This can be achieved by adding the following line to your DPM configuration in your site-specific template:
{{{
"/software/components/dpmlfc/options/dpm/db/configmode" = "644";
}}}

== LFC Configuration ==

LFC related standard templates require a site template describing the service site configuration. The variable `LFC_CONFIG_SITE` must contain the name of this template.

If you want to use the Oracle version of the LFC server, define the following variable in your machine profile:
{{{
variable LFC_SERVER_MYSQL = false;
}}}

LFC templates allow a LFC server to act as a central LFC server (registered in the BDII) for some VOs and as a local LFC server for the others. There are 2 variables controlling what is registered in the BDII:
 * `LFC_CENTRAL_VOS`: list of VOs for which the LFC server must be registered in the BDII as a central server. Default is an empty list.
 * `LFC_LOCAL_VOS`: list of all VOs for which the server must be registered in the BDII as a local server. Defaults to all supported VOs (the `VOS` variable). If a VO is in both lists, it is removed from `LFC_LOCAL_VOS`.

If you don't want this server to be registered as a local server for any VO, even if a VO is configured on this node (present in the `VOS` list), you must define this variable as an empty list:
{{{
variable LFC_LOCAL_VOS = list();
}}}

VOs listed in both lists must be present in the `VOS` variable. These 2 variables have no impact on the GSI (security) configuration and don't control access to the server. If you want the `VOS` variable (controlling access to the server) to match the list of VOs supported by the LFC server (either as central or local catalogues), you can add the following definition to your LFC server profile:
{{{
variable VOS = merge(LFC_CENTRAL_VOS, LFC_LOCAL_VOS);
}}}

=== LFC site parameters ===

Normally, the only things really required in this site-specific template are the password for the LFC user (by default `lfc`) and for the DB accounts. Look at the standard LFC [source:templates/trunk/glite-3.0.0/glite/lfc/config configuration template] for the syntax.

Starting with QWG Templates release gLite-3.0.2-9, there is no default password value provided for the account used by the LFC daemons, nor for the DB accounts used to access the LFC database. You '''MUST''' provide them in your site configuration. If you forget to do so, you'll get a rather unexplicit panc error:
{{{
[pan-compile] *** wrong argument: operator + operand 1: not a property: element
}}}

=== Using non standard port numbers ===

It is possible to use non-standard port numbers for the LFC daemons. To do this, you only need to define the `XXX_PORT` variable corresponding to the service. Look at the gLite [source:templates/trunk/grid/glite-3.0.0/defaults/glite.tpl default parameters] to find the exact name of the variable.
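For instance, a sketch of such an override (the variable name below is hypothetical; the real one must be taken from the defaults template linked above):
{{{
# Check defaults/glite.tpl for the exact variable name
variable LFC_PORT = 5010;
}}}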
== LCG RB Configuration ==

After the initial installation of the RB, it is necessary to manually initialize the MySQL database used by the RB, using the MySQL script provided by YAIM, and then rerun the NCM components so that Quattor completes the configuration, using the command:
{{{
ncm-ncd --configure --all
}}}

== MPI Support ==

To activate MPI support on the CE and WNs, you need to define the variable `ENABLE_MPI` to `true` in your site parameters (normally `pro_lcg2_config_site.tpl`). It is disabled by default.
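This is a one-line addition to the site parameter template:
{{{
variable ENABLE_MPI = true;
}}}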