Version 187 (modified by /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=Michel Jouvin, 13 years ago)

Customization of gLite Configuration

Site customization of QWG templates is done through a small set of templates that define the variables used as input by the QWG templates. This page does not cover basic OS configuration, which is described in the page about the template framework.

All site parameters related to QWG middleware are supposed to be declared in one template site/glite/config.tpl (or any other site-specific templates it may include). To start a new site, import the site parameter template example. The list of all available variables with their description and their default value can be consulted in template source:templates/trunk/grid/glite-3.1/defaults/glite.tpl. This template is a critical part of standard templates and should not be modified or duplicated.

Note : Information in this page may document features or configuration options not present in the current release. This information relates to changes and improvements that will be available in the next release and are already present in the current development branch. If you urgently require these features, use the content of this branch.

Documentation in this page is based on QWG templates for gLite 3.1. Most of the documentation also applies to deprecated QWG templates for gLite 3.0, except when explicitly stated or for features supported only by 3.1 series.

Note: this documentation often makes reference to a template called site/glite/config.tpl. This template used to be called pro_lcg2_config_site.tpl in the past. Both names are valid and taken into account by current templates, even though the namespaced name is the recommended one.

Service-Independent Configuration

This section explains how to describe the site configuration and how to configure services shared by several node types, like VO configuration, LCAS/LCMAPS, Globus...

Machine types

QWG templates provide a template per machine type (CE, SE, RB, ...). They are located in machine-types directory and are intended to be generic templates. No modification should be needed.

To configure a specific machine with gLite middleware, you just need to include the appropriate machine type template into the machine profile, after specifying a template containing the specific configuration for this particular machine with the variable xxx_CONFIG_SITE (look in the template for the exact name of the variable).

Here is an example for configuring a Torque-based CE :

object template profile_grid10;

# Define specific configuration for a GRIF CE to be added to
# standard configuration
variable CE_CONFIG_SITE = "pro_ce_torque_grif";

# Configure as a CE (Torque) + Site's BDII
include machine-types/ce;

#
# software repositories (should be last)
#
include repository_common;

In this example, CE_CONFIG_SITE specifies the name of a template defining the Torque configuration.

All the machine types share a common basic configuration, described in template machine-types/base.tpl. This template allows you to add site-specific configuration to this common base configuration (e.g. configuration of a monitoring agent). This is done by defining the variable GLITE_BASE_CONFIG_SITE to a template containing the site-specific configuration to be added to the common configuration (at the end of the common configuration). This variable can be defined, for example, in the template pro_site_cluster_info.tpl.
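As a minimal sketch of the mechanism just described, the following declares a site-specific addition to the common base configuration. The template name site/monitoring/agent is hypothetical and only illustrates the idea; use any template available in your site or cluster hierarchy.

```pan
# For example in pro_site_cluster_info.tpl, read before the machine type template.
# 'site/monitoring/agent' is a hypothetical template name used for illustration.
variable GLITE_BASE_CONFIG_SITE ?= 'site/monitoring/agent';
```

The template named here is included at the end of the common configuration, so it can rely on everything machine-types/base.tpl has already set up.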

The following sections describe specific variables that can be used with each machine type. The machine type template to include is specified at the beginning of the section as Base template. In addition, to get more details, you can look at examples.

Creating a New Machine Type

All gLite machine types use a common base configuration, described in machine-types/base.tpl. This template is responsible in particular for the base OS configuration, VO configuration and NFS configuration.

When creating a new machine type derived from this gLite base machine type, it is necessary, at the very end of the new machine type, to include the gLite update and postconfig templates, using the following PAN statements:

# gLite updates
include { 'update/config' };

# Do any final OS configuration needed
include { return(GLITE_OS_POSTCONFIG) };

Without the gLite OS postconfig template, machine-types/base.tpl is not expected to compile successfully.

Site Information

Every EGEE site must publish some general information about it, mainly:

  • SITE_NAME: the site name
  • SITE_LOC: site geographical location. Format must be "City, Country".
  • SITE_LAT: site latitude (number)
  • SITE_LONG: site longitude (number)
  • SITE_OTHER_INFO : must contain at least the ROC your site is attached to, specified as "EGEE-ROC=xx" with xx the ROC country code. In addition, for WLCG sites, it must define the WLCG role (tier) and the attached T1. See example of site parameters for more details.
  • SITE_EMAIL : sysadmins contact for the site
  • SITE_SECURITY_EMAIL : site email contact for security issues. Default to SITE_EMAIL.
  • SITE_USER_SUPPORT_EMAIL : site email contact for user support. Default to SITE_EMAIL.

See the GOC wiki for more details on site information.
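The variables listed above could be declared in site/glite/config.tpl along the following lines. This is only a sketch: every value is a placeholder, and the numeric form used for latitude and longitude is an assumption based on the "(number)" indication above.

```pan
# All values below are placeholders for illustration only.
variable SITE_NAME ?= 'EXAMPLE-SITE';
variable SITE_LOC ?= 'Paris, France';          # format: "City, Country"
variable SITE_LAT ?= 48.85;
variable SITE_LONG ?= 2.35;
variable SITE_OTHER_INFO ?= 'EGEE-ROC=xx';     # replace xx with your ROC country code
variable SITE_EMAIL ?= 'grid-admins@example.org';
# Optional: both default to SITE_EMAIL when not defined.
variable SITE_SECURITY_EMAIL ?= 'grid-security@example.org';
variable SITE_USER_SUPPORT_EMAIL ?= 'grid-support@example.org';
```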

VO Configuration

The list of VOs to configure on a specific node is defined in the variable VOS. Generally a site-wide default value is defined in site/glite/config.tpl (defined with operator ?=). This value can be overridden on a specific machine by defining VOS variable in the machine profile, before including the machine type profile.

An example VOS definition is :

variable VOS ?= list('alice',
                     'atlas',
                     'biomed',
                     'calice',
                     'cms',
                     'cppm',
                     'dteam',
                     'dzero',
                     'egeode',
                     'lhcb',
                     'ops',
                     'planck',
                     );

Note : dteam and ops are mandatory VOs for EGEE sites.

As an alternative to listing explicitly all the VOs supported on a node, it is possible to define variable VOS as the string ALL (instead of a list). In this case, all VOs with parameters available in the configuration (normally all the VOs registered in the CIC portal) are configured. This specific value should normally be restricted to UIs where there are no VO accounts created. Its main usage is to let a user on a UI act as a member of any VO they may be registered in. On a gsissh-enabled UI, it is advisable to restrict the VOs allowed to connect to the UI with gsissh to a limited number of VOs when VOS='ALL'. See the section on UI configuration for more details.
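A sketch of a UI profile using this special value is shown below. The profile name profile_ui01 is hypothetical, and machine-types/ui is assumed to be the name of the UI machine type template; check the machine-types directory for the exact name.

```pan
object template profile_ui01;

# Configure every VO known to the configuration (intended for UIs only).
# Must be defined before including the machine type template.
variable VOS = 'ALL';

# Assumed name of the UI machine type template.
include { 'machine-types/ui' };

# software repositories (should be last)
include { 'repository_common' };
```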

For each VO listed in VOS, there must be a template defining the VO parameters in vo/params or an entry in vo/site/aliases. The template name in vo/params must be the VO full name even though a VO alias name is used in VOS. If the VO to be added has no template to define its parameters, refer to the next section about adding a new VO.

Note: VO alias names are alternative names for VOs locally defined. Unlike VO names which are guaranteed to be unique, VO aliases may clash with another alias or full name. They must be used mainly to maintain backward compatibility in existing configurations where a name other than the VO full name was used. The use of VO aliases is strongly discouraged for a new configuration or for new VOs added to an existing configuration.

For some specific purposes, it is possible to execute a site-specific template just before starting the VO configuration, after the site parameters have been read and the OS configuration has been done. Use variable NODE_VO_CONFIG to specify the name of the template.

VO accounts

Templates related to VO configuration handle everything related to VO configuration on a specific node, including creation of VO accounts (pool accounts, SW manager...). See below for the parameters related to account creation.

By default, the VO accounts are created locked to prevent their interactive use. There is one exception: if the variable GSISSH_SERVER_ENABLED equals true, these accounts are automatically unlocked. This happens mainly on UI and VOBOX.

Defining a VO alias name

VO names, now based on a DNS-like name, can be quite long and not very convenient to use in the configuration. It is possible to define a local alias for the VO name and use it in the site configuration in place of the VO name.

To define such an alias, a template aliases.tpl must exist in directory vo/site in your site or cluster directory. This template must define the variable VOS_ALIASES as a nlist where the key is the VO alias name and the value the actual VO name.

For example:

variable VOS_ALIASES ?= nlist(
  'agata',  'vo.agata.org',
  'apc',  'vo.apc.univ-paris7.fr',
  'astro',  'astro.vo.eu-egee.org',
  'lal',  'vo.lal.in2p3.fr',
);

Site-Specific Defaults for VO Parameters

It is possible to define site-specific defaults for VOs that override standard default. This must be done by defining entry DEFAULT in nlist variable VOS_SITE_PARAMS. This entry is used to define parameters that will apply to all VOs if they are not defined explicitly in VO parameters.

Each entry value must be the name of a structure template or a nlist defining any of these properties :

  • create_home : Create home directories for VO accounts. Default defined by variable CREATE_HOME.
  • create_keys : Create SSH keys for VO accounts. Default defined by variable CREATE_KEYS.
  • unlock_accounts : a regexp defining host names where the VO accounts must be unlocked
  • pool_digits : default number of digits to use when creating pool accounts
  • pool_offset : offset from VO base uid for the first pool account (normal users)
  • pool_start : index of the first account to create for a VO in its allocated VO range
  • pool_size : number of pool accounts to create by default for a VO (normal users)
  • fqan_pool_size : number of pool accounts to create for specific FQANs
  • sw_mgr_role : description of VO software manager role. Avoid changing the default.
  • Location of standard services. See below.

Note: some properties are invalid in the context of the DEFAULT entry, in particular: account_prefix, base_uid, gid, name, voms_servers, voms_roles.
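For illustration, a DEFAULT entry setting a few of the properties listed above might look like this. The values are arbitrary, and the value types (numbers for sizes, booleans for flags) are assumptions inferred from the property descriptions.

```pan
# Site-wide defaults applied to every VO that does not define
# these properties explicitly in its own parameters.
variable VOS_SITE_PARAMS ?= nlist(
  'DEFAULT', nlist('pool_size', 50,       # pool accounts per VO
                   'pool_digits', 3,      # e.g. xxx001 .. xxx050
                   'create_keys', false,  # no SSH keys for VO accounts
                  ),
);
```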

Overriding default VO Parameters

In addition to defining default values for VO parameters, it is possible to override default VO parameters, as specified in templates located in vo/params, with site-specific values. This can be done on a per-VO basis or for all VOs configured on a machine, using the same variable (nlist) as for default parameters, VOS_SITE_PARAMS. To override default parameters for one specific VO, the key must be the VO name, as used in the VOS variable. To override default parameters for all configured VOs, use the special entry LOCAL.

Note: if a template vo/site/VONAME can be located, it'll be loaded even though there is no explicit entry for the VO into variable VOS_SITE_PARAMS.

The allowed properties are the same as for default parameters.

Note: some properties are invalid in the context of the LOCAL entry (as with DEFAULT), in particular: account_prefix, base_uid, gid, name, voms_servers, voms_roles.

For example, to define a site-specific WMS for VO Alice, create a template vo/site/alice.tpl in your site directory like :

structure template vo/site/alice;

'wms_hosts' = 'wms.example.org';

Alternatively, you can define these parameters directly into VOS_SITE_PARAMS :

variable VOS_SITE_PARAMS = nlist ('alice', nlist('wms_hosts' , 'wms.example.org',
                                                ),
                                 );
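The special LOCAL entry follows the same pattern. The sketch below overrides the pool size for all VOs configured on a machine; the value 10 is arbitrary and only for illustration.

```pan
# Override, for every configured VO, the pool size defined in vo/params.
# Properties such as base_uid or voms_servers are not valid in LOCAL.
variable VOS_SITE_PARAMS = nlist(
  'LOCAL', nlist('pool_size', 10),
);
```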

Site-specific parameters for VOMS role accounts

VOs often define roles in VOMS for specific purposes. For example, the ATLAS VO defines the role production which can only be used by users allowed to run production jobs. The roles defined for a VO are automatically retrieved by the update.vo.config ant task. By default, a single account with an arbitrary suffix is automatically generated for each role found. For example, the following is an extract of the accounts generated for roles in the ATLAS VO:

"voms_roles" ?= list(
     nlist("description", "SW manager",
       "fqan", "/atlas/Role=lcgadmin",
       "suffix", "s"),
     nlist("description", "production",
       "fqan", "/atlas/Role=production",
       "suffix", "p"),
     nlist("description", "pilot",
       "fqan", "/atlas/Role=pilot",
       "suffix", "hs"),
...

A particular site may wish to define its own parameters for a particular VOMS role. This can be done with nlist variable VOMS_ROLE_CONFIG_SITE. In this variable the key is a VO name and the value a nlist where the key is the role. The value of this second nlist has the same format as VOS_SITE_PARAMS.

In this example, the Atlas role production is configured to use pool accounts:

variable
  VOMS_ROLE_CONFIG_SITE =
   nlist("atlas",                            # VO
     nlist(escape("/atlas/Role=production"),             # role FQAN
        nlist("pool_size", 20,
              "suffix", "prd") ));

To use pool accounts with all the specific FQANs declared in VO parameters, using the same number of accounts in the pool for each FQAN, it is possible to define property fqan_pool_size in the VO-specific entry or in the DEFAULT entry of the VOS_SITE_PARAMS variable. For example, to use pool accounts for each specific FQAN of each VO, creating 10 accounts per FQAN, except for Atlas where 20 accounts per FQAN are created:

variable VOS_SITE_PARAMS ?= nlist(
  'DEFAULT', nlist('fqan_pool_size', 10),
  'atlas',   nlist('fqan_pool_size', 20),
);

Adding a New VO

Note: the procedure to create a new VO definition here is for very specific cases. The normal procedure is to register it properly on CIC Portal and generate the configuration information from the portal with ant update.vo.config (when using SCDB).

Adding a new VO involves the creation of a template defining VO parameters. This template name must be the name you use to refer to the VO in rest of the configuration but is not required to be the real VO name (can be an alias used in the configuration). This template must be located in directory vo/params, in one of your cluster- or site-specific hierarchy of templates or in gLite templates.

Note : if you create a template for a new VO, be sure to commit it to the QWG repository if you have write access to it, or to send it to the QWG developers. There is normally no reason for a VO definition not to be generally available.

To create a template to describe a new VO, the easiest is to copy the template for an already configured VO. The main variables supported in this template are :

  • name : VO official name. No default.
  • account_prefix : prefix to use when creating accounts for the VO. Generally the 3 first letters of the VO name. No default.
  • voms_servers : a nlist describing VOMS server used by the VO, if any. If the VO has several (redundant) VOMS servers, this property can be a list of nlist. For each VOMS server, supported properties are :
    • name : name of the VOMS server. This is a name used internally by template. By default, template defining VOMS server certificate has the same name. No default.
    • host : VOMS server host name. No default.
    • port : VOMS server port associated with this VO. No default.
    • cert : template name, in vo/certs , defining VOMS server certificate. If not specified, defaults to the VOMS server name.
  • voms_mappings (replaces deprecated voms_roles) : list of VOMS groups/roles supported by the VO. This property is optional. This is a nlist with one entry per mapping (mapped accounts). The supported properties for each entry are :
    • description : description of the mapping. This property is informational, except for VO software manager where it must be SW manager (with this exact casing).
    • pattern (replace deprecated name) : VO group/role combinations mapped to this account. This can be a string or a list of string (if several group/role combinations are mapped to the same account). Each value can be either a role name (without /ROLE=) or a group/role combination in standard format /group1/group2/.../ROLE=rolename. Note that and /ROLE keywords are required to be upper case, that there may be several groups but only one role and if both are present, role must be the last one. Look at LHCb VO parameters for an example.
    • suffix : suffix to append to account_prefix to build account name associated with this role.
  • base_uid : first uid to use for the VO.
  • create_home : Create home directories for VO accounts. Default defined by variable CREATE_HOME.
  • create_keys : Create SSH keys for VO accounts. Default defined by variable CREATE_KEYS.
  • gid : GID associated with VO accounts. Default : first pool account UID.
  • pool_size : number of pool accounts to create for the VO. Defaults : 200.
  • pool_digits : number of digits to use for pool accounts. Must be large enough to handle pool_size. Default is 3.
  • pool_offset : define offset from VO base uid for the first pool account
  • Location of standard services. See below.

In addition to this template, you need to have another template defining the public key of the VOMS server used by the VO. This template has the name of the VOMS server by default. It can be explicitly defined with the cert property of a VOMS server entry. If the new VO is using an already used VOMS server, there is no need to add the certificate.
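A minimal VO parameter template could look like the sketch below. The VO name myvo, the host voms.example.org, the port and the uid values are all hypothetical, and the numeric type used for port, base_uid and pool_size is an assumption; copying the template of an already configured VO remains the safest starting point.

```pan
structure template vo/params/myvo;

# Hypothetical VO used for illustration only.
'name' ?= 'myvo';
'account_prefix' ?= 'myv';

# A single VOMS server; 'cert' is omitted, so the certificate
# template defaults to the VOMS server name.
'voms_servers' ?= nlist('name', 'voms.example.org',
                        'host', 'voms.example.org',
                        'port', 15000,
                       );

'base_uid' ?= 50000;
'pool_size' ?= 50;
```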

Default Services for a VO

Location of standard services to use with a specific VO can be defined either in the VO parameters or in the site-specific parameters for a VO. Services that can be configured are :

  • proxy : name of the proxy server used by the VO. No default, optional.
  • rb_hosts : LCG RB host name to use by default. Service ports will be set to default values. Can be a list or a single value.
  • wms_hosts : gLite WMS host name to use by default. Service ports will be set to default values. Can be a list or a single value.
  • catalog : define catalog type used by the VO. Optional. Must be defined only for VO still using RLS (value must be rls or RLS).

In addition to variables above, it is possible to use the following variables if you need more control over service location or endpoints :

  • nshosts : name:port of the RB used by the VO (Network Server). No default.
  • lbhosts : name:port of the RB used by the VO (Logging and Bookkeeping). No default.
  • wms_nshosts : name:port of the WMS used by the VO (Network Server). Can be a list or a single value. No default.
  • wms_lbhosts : name:port of the WMS used by the VO (Logging and Bookkeeping). Can be a list or a single value. No default.
  • wms_proxies : endpoint URI of WMProxy used by the VO. Can be a list or a single value. No default.
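As an illustration of the two levels of control described above, the sketch below sets simple host-based defaults for one VO and explicit name:port endpoints for another. All host names and port numbers are placeholders, not recommended values.

```pan
variable VOS_SITE_PARAMS ?= nlist(
  # Host names only: service ports are set to their default values.
  'atlas',  nlist('wms_hosts', list('wms1.example.org', 'wms2.example.org'),
                  'proxy', 'myproxy.example.org',
                 ),
  # Explicit endpoints in name:port form (placeholder ports).
  'biomed', nlist('wms_nshosts', 'wms3.example.org:7772',
                  'wms_lbhosts', 'wms3.example.org:9000',
                 ),
);
```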

VO-Specific Areas

There are a couple of variables available to customize VO-specific areas (software area, VO accounts home directories...) :

  • VO_SW_AREAS : a nlist with one entry per VO (key is the VO name as used in VOS variable). The value is a directory to use for the VO software area. Be sure to list this directory or its parent in WN_SHARED_AREAS if you want to use a shared filesystem for this area (this is highly recommended). Directories listed in this variable will be created with the appropriate permissions (0755 for VO group). In addition to per VO entries, entry DEFAULT may be used to create one SW area for each configured VO on the current node : in this case the value is the parent directory for SW areas and the per VO directory name is the VO name (default) or the SW manager userid if variable VO_SW_AREAS_USE_SWMGR is defined to true.
  • VO_SW_AREAS_USE_SWMGR : when set to true, VO SW manager userid is used as a directory name for the SW area for VOs without an explicit entry in VO_SW_AREAS.
  • VO_HOMES : a nlist with one entry per VO (key is the VO name as used in VOS variable). The value is a directory prefix to use when creating home directories for accounts. A suffix will be added to this name corresponding to the VO role suffix for role accounts or the account number for pool accounts. By default, VO accounts are created in /home. Two keywords make it possible to create a subdirectory per VO under the parent directory to avoid too many entries at the same level. Look at documentation about LCG CE for more information.
  • VO_SWMGR_HOMES : a nlist with one entry per VO (key is the VO name as used in VOS variable). The value is a directory to use as the home directory for the VO software manager. If there is no entry for a VO, VO_HOMES is used. The main purpose of this variable is to define the home directory for the software manager as the VO software area. This can be achieved easily by assigning VO_SW_AREAS to this variable.
  • CREATE_HOME : this variable controls creation of VO accounts home directories. It accepts 3 values : true, false and undef. undef is a conditional true : home directories are not created if they reside on a NFS shared file system (it is listed in WN_SHARED_AREAS) and the NFS server is not the current machine.
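The variables above might be combined as in the following sketch. The directory names are hypothetical; /swareas is assumed to be (or to live under) a file system listed in WN_SHARED_AREAS.

```pan
# One software area per configured VO under /swareas,
# except for atlas which gets a dedicated directory.
variable VO_SW_AREAS ?= nlist(
  'DEFAULT', '/swareas',
  'atlas',   '/atlas_sw',
);

# Use the software area as the home directory of each VO software manager.
variable VO_SWMGR_HOMES ?= VO_SW_AREAS;

# Create home directories unconditionally, even on a shared file system.
variable CREATE_HOME ?= true;
```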

Tuning VO configuration on a specific node

Each machine type template defines the VO configuration (pool accounts, gridmap file/dir...) appropriate to the machine type. If you want to change this configuration on a specific node, you can use the following variables :

  • NODE_VO_ACCOUNTS (boolean) : VO accounts must be created for each VO initialized. Default : true.
  • NODE_VO_GRIDMAPDIR_CONFIG (boolean) : gridmapdir entries must be initialized for pool accounts. Default : false.
  • NODE_VO_WLCONFIG (boolean) : initialize workload management environment for each VO. Normally enabled only on resource brokers. Default : false.
  • NODE_VO_CREATEHOME (boolean) : create home directories for pool accounts. Default : true.

In addition you can execute actions specific to the local machine by defining the following variable (mainly used to define a VO list specific to a node by assigning a non default value to VOS variable) :

  • NODE_VO_CONFIG (string) : site-specific template that must be included before actually doing VO initialization. Allows node-specific modifications to the default VO configuration. Default : none.

Note : before modifying default VO configuration for a specific machine, be sure what you want to do is valid. Misconfiguring VO can have dramatic effects on service availability.
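A sketch of the NODE_VO_CONFIG mechanism, used here to give one node its own VO list, is shown below. The template name site/vo/node_vos is hypothetical, and the two parts belong to two separate files as indicated by the comments.

```pan
# --- In the machine profile, before including the machine type template ---
variable NODE_VO_CONFIG = 'site/vo/node_vos';

# --- In a separate file site/vo/node_vos.tpl (hypothetical name) ---
# template site/vo/node_vos;
#
# # VO list specific to this node; dteam and ops remain mandatory.
# variable VOS = list('dteam', 'ops', 'atlas');
```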

Mapping of VOMS groups/roles into grid-mapfile

grid-mapfile is used as a source of mapping information between users DN and Unix accounts when this cannot be obtained from VOMS.

Default behaviour for describing user mapping in grid-mapfile used to be mapping users with a specific role to the account corresponding to this role. Unfortunately, the result is unpredictable if a user has several roles in the VO. The default in QWG templates, starting with release gLite-3.0.2-12, is to always map users to normal users in grid-mapfile. To obtain a mapping based on a specific role, users have to get a proxy with the required VOMS extensions using voms-proxy-init --voms.

Two variables can be used to modify this default behaviour for generating grid-mapfile:

  • VO_GRIDMAPFILE_MAP_VOMS_ROLES: when set to true, a grid-mapfile entry is added for each valid VO FQAN in addition to the VO members.
  • VO_VOMS_FQAN_FILTER: this nlist defines, on a per-VO basis, which VOMS FQANs to add to the grid-mapfile. The key is a VO name or DEFAULT for the default entry. The default entry, if present, is applied to all VOs without an explicit entry. If there is no entry for a VO and there is no default entry defined, all VO users and valid FQANs are added to the grid-mapfile. This variable is ignored if VO_GRIDMAPFILE_MAP_VOMS_ROLES is not true. The entry value must be either a FQAN declared in VO parameters (without the initial /voname), a VOMS mapping description as declared in the VO parameters or undef to allow all users and valid FQANs. '/' is interpreted as all normal users (without a specific group or role).

These 2 variables are mainly used on VO boxes where they should be defined with appropriate values by the standard configuration.
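For illustration, the two variables could be set as follows. This is a sketch that assumes the FQAN /atlas/Role=production is declared in the ATLAS VO parameters, written here without the initial /voname as required above.

```pan
# Add grid-mapfile entries for VOMS FQANs in addition to plain VO members.
variable VO_GRIDMAPFILE_MAP_VOMS_ROLES ?= true;

variable VO_VOMS_FQAN_FILTER ?= nlist(
  'atlas',   '/Role=production',   # only this FQAN for atlas
  'DEFAULT', '/',                  # other VOs: normal users only
);
```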

Allocation of Service Accounts

Some services allow a specific account to be defined for running the service. In this case, there is one template for each of these accounts in common/users. The name of the template generally matches the user account created or, when the template is empty, the name of the service.

A site can redefine account names or characteristics (uid, home directory...). To do this, you should not edit directly the standard templates as the changes will be lost in the next version of the template (or you will have to redo them by hand). You should create a users directory somewhere in your site or cluster hierarchy (e.g. under the site directory, not directly at the same level else it will not work without adjusting cluster.build.properties) and put your customized version of the template here.

Note : don't change the name of the template, even if you change the name of the account used (else you'll need to modify standard templates requiring this user).

Accepted CAs

There is one template defining all the accepted CAs. This template is produced by people in charge of producing new releases of the list of CAs officially accepted by EGEE. If you need to adjust it, create a site or cluster-specific copy of common/security/cas.tpl in a directory common/security.

If you need to update this template, refer to the standard procedure to do it.

Globus

Globus is used by most of the gLite services. The following variables configure Globus parameters, in particular Globus ephemeral port ranges.

  • GLOBUS_TCP_PORT_RANGE_MIN: lower port in TCP ephemeral port range. Default: 20000.
  • GLOBUS_TCP_PORT_RANGE_MAX: upper port in TCP ephemeral port range. Must be greater than or equal to the lower port. Default: 25000.
  • GLOBUS_UDP_PORT_RANGE_MIN: lower port in UDP ephemeral port range. Default: none.
  • GLOBUS_UDP_PORT_RANGE_MAX: upper port in UDP ephemeral port range. Must be greater than or equal to the lower port. Default: none.
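A site needing a different port range, for example to match its firewall rules, might define something like the following in site/glite/config.tpl. The port values are arbitrary, and the use of numeric values (rather than strings) is an assumption.

```pan
# Arbitrary example range; MAX must be greater than or equal to MIN.
variable GLOBUS_TCP_PORT_RANGE_MIN ?= 30000;
variable GLOBUS_TCP_PORT_RANGE_MAX ?= 32000;

# UDP range is undefined by default; define both bounds together.
variable GLOBUS_UDP_PORT_RANGE_MIN ?= 30000;
variable GLOBUS_UDP_PORT_RANGE_MAX ?= 32000;
```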

LCAS / LCMAPS

LCAS and LCMAPS are 2 underlying services, generally used together, by most grid services to manage authorization and user mapping. LCAS is responsible for managing authorization based on configured policies (banned users, timeslots permitted...) and LCMAPS is responsible for mapping a grid DN to a Unix user account.

LCMAPS configuration is based on the configured VOs and on VOMS group/role mapping rules.

LCAS can be configured with the following variables to restrict access to a grid resource like a CE:

  • LCAS_BANNED_USERS: list of user DNs forbidden access to the resource. By default, this list is empty (it has a template DN which will never match a real user).
  • LCAS_TIMESLOT_ENTRIES: a list of timeslot specifications defining when the resource is open to grid access. See LCAS documentation for more information on the format. By default, there is no restriction.
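As a sketch, banning two users from a resource could be expressed as below. The DNs are invented placeholders; LCAS_TIMESLOT_ENTRIES is deliberately not shown because its format is defined by the LCAS documentation.

```pan
# Placeholder DNs for illustration only.
variable LCAS_BANNED_USERS ?= list(
  '/O=Example/OU=People/CN=Banned User One',
  '/O=Example/OU=People/CN=Banned User Two',
);
```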

Shared gridmapdir

QWG templates support configuration of a shared gridmapdir between different machines. This is typically used when several CEs share the same WNs to ensure a consistent mapping of DNs to userids through all CEs. The QWG implementation is not restricted to CEs, even though it doesn't really make sense for other services.

If several machines have to share the same gridmapdir, the variable GRIDMAPDIR_SHARED_PATH must be defined in their profile. This variable is undefined by default. When defined it must refer to an existing path on the machine that will use it or the gridmapdir will not be configured as shared.

Even though it is not mandatory, gridmapdir is generally shared using NFS. To enable NFS-sharing of the gridmapdir, you must define variable GRIDMAPDIR_SHARED_SERVER to the host name serving the gridmapdir. It doesn't need to be one of the machines using it (for example it can be a dedicated NFS server). If the server is managed with Quattor, Quattor will ensure that NFS is properly configured to export the reference gridmapdir (as specified by SITE_DEF_GRIDMAPDIR on this machine) as GRIDMAPDIR_SHARED_PATH. On the "clients" (the other machines using the shared gridmapdir), NFS will be configured to mount the shared gridmapdir and SITE_DEF_GRIDMAPDIR will be redefined as a link to this mount point.

Two other variables can be used to customize gridmapdir sharing according to your needs:

  • GRIDMAPDIR_SHARED_PROTOCOL: if anything different from nfs, QWG templates will not configure NFS for sharing the gridmapdir. The sharing must be done by other means in such a way that GRIDMAPDIR_SHARED_PATH is available when gridmapdir is configured on the client machine. Default: nfs.
  • GRIDMAPDIR_SHARED_CLIENTS: a list of machines sharing the gridmapdir. Default: CE_HOSTS variable (all the CEs sharing the same WNs).
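Putting the gridmapdir variables together, a configuration shared by two CEs through a dedicated NFS server might be sketched as follows. Host names and the path are hypothetical.

```pan
# Path under which the shared gridmapdir is made available.
variable GRIDMAPDIR_SHARED_PATH ?= '/gridmapdir_shared';

# Host exporting the reference gridmapdir over NFS (default protocol).
variable GRIDMAPDIR_SHARED_SERVER ?= 'nfs.example.org';

# Optional: restrict clients explicitly instead of relying on CE_HOSTS.
variable GRIDMAPDIR_SHARED_CLIENTS ?= list('ce1.example.org', 'ce2.example.org');
```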

Shared File Systems

It is recommended to use a shared file system mounted (at least) on CE and WNs for VO software areas. It is also sometimes convenient to use a shared file system for VO pool accounts (this is more or less a requirement to run MPI jobs). Currently, QWG templates support the use of NFS or non-NFS shared file systems but only the NFS service is configured by the templates. For other distributed file systems (AFS, LUSTRE, GPFS...), you must add the necessary configuration to the site-specific configuration.

Configuration is done by the following variables :

  • WN_SHARED_AREAS : a nlist with one entry per file system which is shared between worker nodes and CE (key is the escaped file system mount point). See below the format of the entries for NFS-served file systems. For other distributed file systems providing a global namespace (like AFS, LUSTRE, GPFS), the entry value must be undef. It is important to add an entry in this list for each shared file system, even though not NFS served, as some parts of the configuration (eg. Torque configuration) use this information to distinguish between local and shared file systems.
  • NFS_AUTOFS : when true, use autofs to mount NFS file systems on NFS clients. This is the recommended setting, as this is the only one to avoid complex inter-dependency in startup order. But for backward compatibility, default value is false.

Note : variable WN_NFS_AREAS has been deprecated and replaced by WN_SHARED_AREAS. If the latter is not defined, WN_NFS_AREAS is used if defined.

Note : non shared filesystem for home directories is supported only with Globus job manager lcgpbs.

NFS server is configured on any machine (whatever its machine type) managed by Quattor and listed as the NFS server for one of the entries in WN_SHARED_AREAS. All actions required are done automatically. If the NFS server listed is not managed by Quattor, it is necessary to force CREATE_HOME to true on one machine.

NFS client can be potentially configured on any machine type but by default this is done only on CE and WNs. To configure the client on other machine types, define variable NFS_SERVER_ENABLED to one of the following values:

  • undef: configure NFS client if needed according to the configuration (WN_SHARED_AREAS contents).
  • true: force configuration of NFS client even though there is no NFS file system to mount on the machine.

Specifying server of a NFS file system

In variable WN_SHARED_AREAS, the value of each entry specifies the NFS server for the file system and optionally the file system mount point on the server if it is different from the one used on the clients. The general format for the value is a URL like:

nfs|nfs3|nfs4://server[/mount/point]

When the protocol specified is nfs (without an explicit version), the server will be configured with both versions and the client will be configured with v3, unless an explicit version is requested (see next section).

The legacy format:

server[:/mount/point]

is still supported and is equivalent to:

nfs3://server[/mount/point]
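
As an illustration, the formats above could be combined in a site definition like the following sketch. The server name nfs.example.org and all the paths are hypothetical and only serve to show the three kinds of entries.

```pan
variable WN_SHARED_AREAS ?= nlist(
  # Protocol nfs: server offers v3 and v4, client version is selected separately.
  # No path given: same mount point on server and clients.
  escape('/home'), 'nfs://nfs.example.org',
  # NFS v4 forced on server and clients; exported from /export/sw on the server
  escape('/opt/exp_soft'), 'nfs4://nfs.example.org/export/sw',
  # Distributed file system with a global namespace (not NFS): value is undef
  escape('/lustre/data'), undef
);
```

The ?= operator only assigns the value if the variable is not already defined, so a cluster or node template can still override it.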

Selecting NFS version to use on the client

For NFS, both v3 and v4 are supported. When the version is specified in the protocol token of the server URL (see previous section), this version is used both on the server and on the clients. Otherwise both versions are configured on the server and the version configured on the client depends on the following variables:

  • NFS_CLIENT_VERSION: a nlist with one entry per node whose key is the escaped client host name and the value is a string ('3' or '4'). If the client configured has an entry in this variable, the specified NFS version is used.
  • NFS_CLIENT_DEFAULT_VERSION: a nlist where entry keys are either an escaped regexp matched against the node being configured or 'DEFAULT'. If host name of the client being configured is matched by one of the regexp, the specified value is used. Else if DEFAULT entry is present it is used.
  • If no match was found with the previous variables, v3 is used.

Suppose you want to configure v4 on all your grid nodes, and only on these nodes, and that their host names always start with the prefix grid and belong to domain example.org. You can use the following definition:

variable NFS_CLIENT_DEFAULT_VERSION = nlist(
  'DEFAULT',     '3',
  escape('^grid.*\.example\.org$'),   '4',
);
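
A per-host exception could be sketched with NFS_CLIENT_VERSION, whose keys are escaped host names. The host name below is hypothetical. According to the lookup order described above, this entry would be used for that host before any regexp in NFS_CLIENT_DEFAULT_VERSION is considered.

```pan
variable NFS_CLIENT_VERSION ?= nlist(
  # Keep this one node on NFS v3 even if its name matches a v4 pattern
  escape('grid05.example.org'), '3'
);
```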

Specifying NFS options

There are two variables to define mount options to be used with NFS file systems :

  • NFS_DEFAULT_MOUNT_OPTIONS : defines mount options to be used by default, if none are explicitly defined for a filesystem.
  • NFS_MOUNT_OPTS : defines mount options to be used for a specific file system. This variable is a nlist with one entry per file system : key must be the escaped path of the mount point.
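
The text above does not state the value format of these variables. Assuming both take a standard mount option string, a definition might look like the following sketch. The mount point and the option values are illustrative only.

```pan
# Assumption: the value is a comma-separated mount option string
variable NFS_DEFAULT_MOUNT_OPTIONS ?= 'rw,noatime,hard,intr';

variable NFS_MOUNT_OPTS ?= nlist(
  # Software area mounted read-only on clients (hypothetical mount point)
  escape('/opt/exp_soft'), 'ro,hard,intr'
);
```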

Defining NFS exports

NFS exports can be defined using a set of variables. By default only CE and worker nodes are given access to NFS server. These variables can be redefined either in NFS server profile, in the cluster the NFS server belongs to or in the gLite site parameters used by NFS server.

Note : the following variables don't configure filesystem mounting. For this see Configuring shared filesystems.

Variables available to customize the NFS export ACL are :

  • NFS_CE_HOSTS : list of CE hosts requiring access to NFS server (default is CE_HOST)
  • NFS_WN_HOSTS : list of WN hosts requiring access to NFS server (default is WN_HOSTS)
  • NFS_LOCAL_CLIENTS : list of other local hosts requiring access to NFS server

These variables can be a string, a list or a nlist. A string value is interpreted as a list with one element. When specified as a list or string, the value must be a regexp matching the names of the nodes that must be given access to NFS server. In this case, the access rights (export options) are the string specified in variable NFS_DEFAULT_RIGHTS. When specified as a nlist, the key must be an escaped regexp matching node names in exports format (only * and ? wildcards permitted) and the value is the export options between ().

Note : when possible, it is recommended to replace the default value for NFS_WN_HOSTS (list of all WNs) by one or several regexps matching WN names to reduce the number of hosts on the export line.

NFS_DEFAULT_RIGHTS is a nlist which must contain a DEFAULT entry used for any file system without an explicit entry and optionally one entry per file system (key is the escaped file system path) when defaults are not appropriate. If not defined, default is rw with root squashing enabled for all file systems (DEFAULT entry), except /home where root squashing is disabled.
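
The following sketch shows how the list and nlist forms might be written. Host patterns are hypothetical, and the exact syntax of the rights in NFS_DEFAULT_RIGHTS is an assumption (written here with the same parenthesized syntax as the nlist form). The rights shown reproduce the documented defaults.

```pan
# List form: each element matches client names in exports format;
# export options then come from NFS_DEFAULT_RIGHTS
variable NFS_WN_HOSTS ?= list('wn*.example.org');

# nlist form: escaped pattern as key, export options between () as value
variable NFS_LOCAL_CLIENTS ?= nlist(
  escape('ui?.example.org'), '(ro,root_squash)'
);

# Assumption: rights use the same parenthesized exports syntax
variable NFS_DEFAULT_RIGHTS ?= nlist(
  'DEFAULT', '(rw,root_squash)',
  escape('/home'), '(rw,no_root_squash)'
);
```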

Another variable, NFS_CLIENT_HOSTS, allows defining the clients allowed to access the file system on a per file system basis. There is a default entry (DEFAULT) used for any file system without an explicit entry. The default value for the default entry is all the hosts specified by NFS_CE_HOSTS, NFS_WN_HOSTS and NFS_LOCAL_CLIENTS. Keys specifying file systems must be the escaped file system mount point. The list of allowed clients may be specified using regexps in exports format.

Note: currently NFS_CLIENT_HOSTS is used to build the list of hosts in exports file but has no impact on the mounting of file systems on clients.

Relocation of Home Directories of VO Accounts

When using a NFS-served file system for home directories, the traditional approach of mounting it under '/home' has several drawbacks. In particular, all service accounts also have NFS-based home directories and this may impact all services when NFS becomes unavailable or unresponsive. On the other hand, it is desirable to keep a unified configuration shared by machines sharing the NFS file systems and the other machines (e.g. WMS, VOBOX... all machines with VO accounts).

With QWG templates this is easily done by defining variable VO_HOMES_NFS_ROOT to the parent directory to use for home directories on a machine with NFS client configured, when the parent described in variable VO_HOMES is /home. The directory pointed to by VO_HOMES_NFS_ROOT must correspond to an entry (or a child of an entry) in WN_SHARED_AREAS. Look at site parameter example for more details.
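
A minimal sketch of such a relocation, assuming a hypothetical NFS server nfs.example.org and a hypothetical shared mount point /nfs/home, might be:

```pan
# Parent of VO account home directories as seen by machines without NFS client
variable VO_HOMES ?= nlist('DEFAULT', '/home');

# Parent used instead of /home on machines with NFS client configured
variable VO_HOMES_NFS_ROOT ?= '/nfs/home';

# The relocated parent must be covered by an entry in WN_SHARED_AREAS
variable WN_SHARED_AREAS ?= nlist(
  escape('/nfs/home'), 'nfs://nfs.example.org'
);
```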

When modifying an existing configuration, careful planning is needed. This cannot be done on the fly. To avoid a long reconfiguration of ncm-accounts, this generally involves:

  • On the NFS server, remounting the file system containing home directories on the new mount point
  • Delete accounts using /home (except ident) from /etc/passwd. This can be done with a script deployed and executed with ncm-filecopy
  • Remove symlink /home if any (autofs configuration) and create a directory /home
  • Update your site parameters and deploy the changes, defining ncm-useraccess as a post-dependency for ncm-accounts if it is used in the configuration. This will ensure that during deployment all accounts are recreated and the ssh, Kerberos... configuration for the user is done.

NFS Server

Base template : machine-types/nfs.

When using this template, it is possible to configure a machine as a dedicated NFS server whose configuration is shared with grid machines for file system configuration and accounts. But in QWG templates, any gLite machine type will be configured as a NFS server as soon as the machine is listed as the NFS server for one of the file systems in WN_SHARED_AREAS.

Monitoring

The variable MONITORING_CONFIG_SITE, which defaults to 'site/monitoring/config', can be used to specify the monitoring tools template that will be included.

RPMs Repositories

repository/config/glite.tpl describes the RPM repositories used to locate RPMs required by gLite templates. Default RPM repository configuration in QWG Templates requires 5 RPM repositories plus an optional one for each gLite version. Names given here are the default ones.

  • glite_repos_prefix : gLite RPMs shipped with gLite.
  • glite_repos_prefix_externals : RPMs required by gLite and shipped with it but developed and maintained outside gLite.
  • glite_repos_prefix_updates : official updates to gLite base RPMs, as provided by gLite releases.
  • glite_repos_prefix_unofficial (optional) : unofficial updates to gLite base RPMs used at the site. Normally empty.
  • mpi : RPMs related to MPI.
  • ca : CA RPMs as distributed by Grid PMA.

glite_repos_prefix can be customized without editing the standard template, defining REPOSITORY_GLITE_PREFIX variable. If not explicitly defined, it defaults to glite_3_0_0 for gLite 3.0 and glite_3_1 for gLite 3.1.

All required repositories must have an associated template whose name is the same as the repository, in site- or cluster-specific templates. The optional repository is ignored if its associated template is not present. Each template describes the content of the repository. When using SCDB, these templates are maintained with command ant update.rep.templates.

Note : it is not required to use this structure and you can edit this template to match your local conventions, if different. When upgrading QWG templates, be sure to revert changes to this template.

A template version of these RPM repositories is distributed as part of examples (templates/trunk/sites/example/repository). They can be used to compile examples but for deployment of a real configuration, you need to build your own version of these templates. You can create an initial version of these repositories by downloading RPMs from the URL mentioned at the top of the template examples with wget or src/utils/misc/rpmUpdates.pl. Then update the URL at the top of the template examples to match your local repositories.

Service-specific Configuration

CE Configuration

Base template : machine-types/ce.

QWG templates can handle configuration of the LCG (gLite 3.1 only) or the CREAM CE and its associated batch system (LRMS). Most of the configuration description is common to both types of CE. In gLite 3.1, CE type defaults to LCG for backward compatibility whereas in gLite 3.2 it defaults to CREAM, the only CE available. CE type selection is done with variable CE_TYPE which must be lcg or cream. This variable is ignored in gLite 3.2.

LRMS selection is done with variable CE_BATCH_NAME. There is no default. The supported LRMS and associated values are:

  • Torque/MAUI: torque2
  • Condor: condor

Note: the value of CE_BATCH_NAME must match a directory in common directory of gLite templates.

Note: previous versions of QWG templates used to require definition of CE_BATCH_SYS. This is deprecated : this variable is now computed from CE_BATCH_NAME.

Site-specific gLite parameters must declare the host name of the CEs that share the same worker nodes. All the CEs declared in one set of gLite parameters (one gLite parameter template) will share the same WNs. To configure several CEs with distinct worker nodes, you must create separate clusters. Host name of the CEs can be declared with one of the following two variables:

  • CE_HOSTS: a list of host names corresponding to the different CEs sharing the same WNs.
  • CE_HOST: for backward compatibility, when there is only one CE, this variable can be defined to its name, instead of using CE_HOSTS.
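
Put together, the CE type, LRMS and CE host selection described above could be sketched as follows. The host names are hypothetical.

```pan
# CE flavour: 'lcg' or 'cream' (ignored in gLite 3.2)
variable CE_TYPE ?= 'cream';

# LRMS: must match a directory in the common directory of gLite templates
variable CE_BATCH_NAME ?= 'torque2';

# All CEs listed here share the same worker nodes
variable CE_HOSTS ?= list('ce1.example.org', 'ce2.example.org');
```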

In addition, 2 other variables independent of the LRMS are available:

  • CE_PRIV_HOST: alternate name of CE host. Used in configurations where WNs are in a private network and CE has 2 network names/addresses. This variable is not (yet) supported with multiple CEs.
  • CE_WN_ARCH: OS architecture on CE worker nodes. Due to limitation in the way this information is published now, this is a CE-wide value. If you have both 64-bit and 32-bit WNs, you must publish 32-bit (i386). Default value is based on CE architecture.

Sharing WNs between several CEs

QWG templates allow configuring several CEs sharing the same WNs. They must share the same gLite parameters and the variable CE_HOSTS must contain all the CE host names. They can be LCG CE and/or CREAM CE. If you want to mix LCG and CREAM CE, it is recommended to maintain a separate list for each CE type and build CE_HOSTS by merging them as in the following example:

variable CE_HOSTS_CREAM ?= list('cream1.example.org','cream2.example.org');
variable CE_HOSTS_LCG ?= list('lcg1.example.org','lcg2.example.org');
variable CE_HOSTS ?= merge(CE_HOSTS_LCG,CE_HOSTS_CREAM);

In addition, when using several CEs with the same WNs, it is necessary to configure a shared gridmapdir. This is required to ensure consistency of DN/userid mapping across CEs.

CREAM CE Specific Configuration

CREAM CE has some unique features and requirements, not available in LCG CE, that can be easily customized with QWG templates. To identify CREAM CEs among all defined CEs, they must belong to the list CE_HOSTS_CREAM, as suggested above.

CREAM CE internally uses a MySQL database. The database connection can be configured with the following variables:

  • CREAM_MYSQL_ADMINUSER (optional): MySQL user with administrative privileges. Default: root.
  • CREAM_MYSQL_ADMINPWD (required): password of MySQL administrative account. No default.
  • CREAM_DB_USER (optional): MySQL user used by CREAM CE components. Default: creamdba.
  • CREAM_DB_PASSWORD (required): password of MySQL user used by CREAM CE components. No default.
  • CREAM_MYSQL_SERVER (optional): host name running the MySQL server used by the CE. Default: CE host name.
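
Since the two password variables have no default, a minimal sketch of the database settings could be the following. The passwords and the MySQL host name are placeholders to be replaced by site values.

```pan
# Required: no default is provided for these two passwords (placeholder values)
variable CREAM_MYSQL_ADMINPWD ?= 'change-me-admin';
variable CREAM_DB_PASSWORD ?= 'change-me-cream';

# Optional: only needed if MySQL does not run on the CE itself (hypothetical host)
variable CREAM_MYSQL_SERVER ?= 'mysql.example.org';
```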

In particular, CREAM CE has a WMS-like management of user input and output sandbox : they are all stored in a dedicated area, outside user home directory. In a configuration where home directories are shared through NFS (or another distributed file system), this requires additional configuration to share this sandbox area too. It is also possible to share the sandbox area between the CE and the WNs, even though the home directories are not. Variables related to sandbox management are:

  • CREAM_SANDBOX_MPOINTS: a nlist defining the CE whose sandbox area must be shared. Only the CE with an entry in this nlist will have their sandbox area shared with WN. The key is the CE host name and the value is the mount point to use on the WN. There is no need for the mount point on the WN to be the same as on the CE. There is no default for this variable.
  • CREAM_SANDBOX_DIRS: a nlist defining where the sandbox area is located on each CE. There may be one entry per CE and one default entry (key=DEFAULT). If no entry applies to a CE, the standard default, /var/cream_sandbox, is used.
  • CREAM_SANDBOX_SHARED_FS: a nlist defining the protocol to use for sharing sandbox area. There may be one entry per CE and one default entry (key=DEFAULT). If undefined, nfs is assumed. If defined but no entry applies to the current CE (and there is no default entry), assume something other than NFS.
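
As a sketch, the sandbox location and sharing protocol might be declared as below. The host name and the alternate path are hypothetical, and using the CE host name as key is an assumption made by analogy with CREAM_SANDBOX_MPOINTS.

```pan
variable CREAM_SANDBOX_DIRS ?= nlist(
  # Applies to every CE without an explicit entry
  'DEFAULT', '/var/cream_sandbox',
  # Assumption: per-CE entries are keyed by CE host name
  'cream2.example.org', '/data/cream_sandbox'
);

variable CREAM_SANDBOX_SHARED_FS ?= nlist(
  'DEFAULT', 'nfs'
);
```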

When NFS is used to share sandbox area, the usual NFS variables (see Shared File Systems) apply to define NFS version to use, mount options...

Note: sandbox area sharing is configured independently of other file systems specified in WN_SHARED_AREAS. Sandbox areas are normally not specified in WN_SHARED_AREAS but if they are, this takes precedence over the specific configuration done with CREAM_SANDBOX_MPOINTS.

When using sandbox sharing with several CEs (specified in the same CE_HOSTS variable), it is important to define a distinct mount point for each CE. Below is an example showing how to define CREAM_SANDBOX_MPOINTS based on CE_HOSTS_CREAM:

variable CREAM_SANDBOX_MPOINTS ?= {
  foreach (i;ce;CE_HOSTS_CREAM) {
    SELF[ce] = '/cream_sandbox/'+ce;
  };
  SELF;
};

A few other variables specific to CREAM CE are available, in particular to define log locations:

  • CREAM_LOG_DIR: location of CREAM CE log. Default: /var/log/glite.
  • BLPARSER_LOG_DIR: location of BLParser log file. Default: /var/log/glite.
  • GLEXEC_LOG_DESTINATION: must be syslog or file. Default is syslog for CREAM 1.5 and file for later versions.
  • GLEXEC_LOG_DIR: location of glexec log files. This must be different from the 2 other log locations because the permissions are not compatible (none belong to root). It is ignored if GLEXEC_LOG_DESTINATION is set to syslog. Default: /var/log/glexec.
  • CEMON_ENABLED: if true, CEMonitor is configured and started. Default: false in CREAM 1.6 and later. Note: CEMonitor is not used by any standard gLite components/services.

CREAM CE relies on BLParser to interact with the batch system and get status back about submitted jobs. The BLParser must run on a machine with access to the batch system logs. The default is to run it on the LRMS master, which can be defined explicitly with variable LRMS_SERVER_HOST and defaults to the first CE in CE_HOSTS. For specific needs, it is possible to define explicitly the BLParser host with variable BLPARSER_HOST.

CREAM CE implements a job purger to clean database entries and sandboxes related to completed jobs (aborted, canceled or done). Default configuration should be appropriate but for specific needs, the following variables can be used to customize the job purger policy:

  • CREAM_JOB_PURGE_RATE: interval between 2 runs of the purger in minutes. Default: 720.
  • CREAM_JOB_PURGE_POLICY_ABORTED: for jobs in ABORTED state, job age in days before purging it. Default: 10.
  • CREAM_JOB_PURGE_POLICY_CANCELED: for jobs in CANCELED state, job age in days before purging it. Default: 10.
  • CREAM_JOB_PURGE_POLICY_DONEOK: for jobs in DONE-OK state, job age in days before purging it. Default: 15.
  • CREAM_JOB_PURGE_POLICY_DONEFAILED: for jobs in DONE-FAILED state, job age in days before purging it. Default: 10.
  • CREAM_JOB_PURGE_POLICY_REGISTERED: for jobs in REGISTERED state, job age in days before purging it. Default: 2.
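
For a site that wants a more aggressive purge than the defaults, the policy could be sketched as follows. The values are arbitrary examples, and writing them as numbers (rather than strings) is an assumption since the value type is not stated above.

```pan
# Run the purger every 6 hours instead of every 12 (value in minutes)
variable CREAM_JOB_PURGE_RATE ?= 360;

# Keep successfully completed jobs for 7 days instead of 15
variable CREAM_JOB_PURGE_POLICY_DONEOK ?= 7;

# Keep aborted jobs for 5 days instead of 10
variable CREAM_JOB_PURGE_POLICY_ABORTED ?= 5;
```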

For more information on CREAM CE configuration and troubleshooting, refer to the CREAM CE official web site.

Most of these variables are usually defined in gLite parameters. Look at the example changes to illustrate modifications typically required to an existing gLite parameter template to support CREAM CEs.

If experiencing difficulties during the initial installation, be sure to read release notes.

Home Directories of VO Accounts

QWG templates support both shared and non shared home directories for VO accounts. See section on NFS Server for more information on how to configure shared home directories. Shared home directories are the recommended configuration and are required to support MPI.

Independently of the shared/non shared configuration, the following variable is used to configure home directories for VO accounts:

  • VO_HOMES: a nlist defining parent of home directories for all the VO accounts. For each entry, the key is the VO name as defined in variable VOS (it may be a VO alias name) and the value is the parent directory for the corresponding accounts (pool accounts and other accounts associated with roles). A special entry, DEFAULT may be used to define home directory parent for all the VOs without an explicit entry.

When supporting multiple VOs, the number of accounts can be very large (several thousands). This may lead to performance problems if they all share a common parent. In the value defining the parent directory, it is possible to use the following keywords to create a per-VO parent under a common root (in a common file system):

  • @VONAME@ : will be expanded to the VO full name
  • @VOALIAS@ : will be expanded to the VO alias name locally defined. When possible, it is better to use the VO full name which is unique and will not change.

For example, the following variable creates one directory per VO under /home and accounts for each VO will be created in the VO-specific directory:

variable VO_HOMES ?= nlist(
  'DEFAULT',       '/home/@VONAME@',
);

When modifying an existing configuration, careful planning is needed. This cannot be done on the fly. To avoid a long reconfiguration of ncm-accounts, this generally involves:

  • On the NFS server, move existing home directories to the appropriate location
  • Delete accounts using /home (except ident) from /etc/passwd. This can be done with a script deployed and executed with ncm-filecopy
  • Update your site parameters and deploy the changes, defining ncm-useraccess as a post-dependency for ncm-accounts if it is used in the configuration. This will ensure that during deployment all accounts are recreated and the ssh, Kerberos... configuration for the user is done.

Defining Queues

Definition of queues is done independently of the LRMS used. The following variables are used to define queues:

  • CE_QUEUES_SITE : a nlist defining for each queue the list of VOs allowed to access the queue and optionally the specific attributes of the queue. Access list for queue is defined under vos key, attributes under attlist key. The value for each key is a nlist where the key is the queue name. For access list, the value is a list of VO allowed or denied access to the queue (to deny access, prefix VO name with a -). For queue attributes, the value is a nlist where the key is a Torque attribute and the value the attribute value. By default one queue is created for each VO. Look at example for more information on how to customize default configuration. To undefine a standard queue, define its attlist to undef.
  • CE_LOCAL_QUEUES : a list of Torque queues to define that will not be available for grid usage (accessible only with standard Torque commands). This list has a format very similar to CE_QUEUES, except that the key containing the queue name is called names instead of vos and that its value is ignored.

Note: in previous version of the templates, customization of queue list was done by defining CE_QUEUES variable in site parameters. In this case the creation of the queue for each VO had to be done in site templates. This has been changed and sites must now use CE_QUEUES_SITE to define site-specific queues or redefine attributes of standard queues.
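
The structure of CE_QUEUES_SITE described above could be sketched as follows. Queue names, VO names and the attribute value are hypothetical. max_running is a standard Torque queue attribute, but giving its value as a number is an assumption.

```pan
variable CE_QUEUES_SITE ?= nlist(
  'vos', nlist(
    # Queue open to two VOs
    'short', list('dteam', 'ops'),
    # A leading - denies the VO access to the queue
    'long', list('-dteam')
  ),
  'attlist', nlist(
    # Torque attributes for queue 'short' (value type is an assumption)
    'short', nlist('max_running', 20)
  )
);
```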

PBS/Torque

PBS/Torque related templates support the following variables :

  • TORQUE_SERVER_HOST: name of Torque server. Defaults to CE_HOST.
  • TORQUE_SERVER_PRIV_HOST: alternate name of Torque server on the private network if any. Defaults to CE_PRIV_HOST.
  • TORQUE_SUBMIT_FILTER : this variable allows redefining the script used as a Torque submit filter. A default filter is provided in standard templates.
  • TORQUE_TMPDIR : normally defined to refer to the working area created by Torque for each job, on a local filesystem. Define as null if you don't want job current directory to be redefined to this directory.
  • TORQUE_SERVER_ATTRS : nlist allowing to customize all server-related Torque parameters. For the complete list of supported parameters and default values, look at common/torque2/server/config.tpl. To undefine an attribute defined by default, define it to undef.
  • WN_ATTRS : this variable is a nlist with one entry per worker node (key is the node fullname). Each value is a nlist consisting of a set of PBS/Torque attributes to set on the node. Values are any key=value supported by qmgr set server command. One useful value is state=offline to cause a specific node to drain or state=online to reenable the node (suppressing state=offline is not enough to reenable the node). One specific entry in WN_ATTRS is DEFAULT : this entry is applied to any node that doesn't have a specific entry. If you want to avoid re-enabling a node explicitly, you can have the DEFAULT entry be defined with the state=free arguments. For instance, you could define :
    variable WN_ATTRS ?= nlist(
        "DEFAULT", nlist("state","free"),
        "mynode.mydomain.com", nlist("state","offline")
    );
    
  • WN_CPUS_DEF : default number of CPU per worker node.
  • WN_CPUS : a nlist with one entry per worker node (key is the node fullname) having a number of CPUs different from the default.
  • WN_CPU_SLOTS : number of job slots (Torque processors) to create per physical CPU. Default is 2 to allow both a normal job slot and a standing reservation reserved for short deadline jobs.
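
The three CPU-related variables could be combined as in the sketch below. Node names and CPU counts are hypothetical, and the keys follow the unescaped full node name convention used in the WN_ATTRS example.

```pan
# Most worker nodes have 4 CPUs
variable WN_CPUS_DEF ?= 4;

# Only nodes that differ from the default need an entry
variable WN_CPUS ?= nlist(
  'wn01.example.org', 8,
  'wn02.example.org', 8
);

# 2 Torque processors per physical CPU (the documented default)
variable WN_CPU_SLOTS ?= 2;
```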

For more details about all of these variables, their format and their default values, look at template defining default values for gLite related variables.

MAUI

MAUI is configured using the following variables :

  • MAUI_SERVER_CONFIG : nlist defining site-specific value for MAUI server base parameters. Keys are MAUI configuration parameters and value are parameter values. Defaults should be appropriate. Look at common/maui/server/config.tpl for a list of supported parameters.
  • MAUI_SERVER_POLICY : nlist defining site-specific value for MAUI server scheduling policy parameters. Keys are MAUI configuration parameters and value are parameter values. Defaults should be appropriate. Look at common/maui/server/config.tpl for a list of supported parameters.
  • MAUI_SERVER_RMCFG : nlist defining site-specific value for MAUI server resource manager configuration parameters. Keys are MAUI configuration parameters and value are parameter values. Defaults should be appropriate. Look at common/maui/server/config.tpl for a list of supported parameters.
  • MAUI_GROUP_PARAMS : nlist defining group-specific parameters. Valid values are anything accepted by MAUI configuration directive GROUPCFG. Key is either a group (VO) name or DEFAULT. Default entry is applied to all groups (VOs) defined but without an explicit entry.
  • MAUI_USER_PARAMS : nlist defining user-specific parameters. Valid values are anything accepted by MAUI configuration directive USERCFG. Key must be a user name.
  • MAUI_CLASS_PARAMS : nlist defining class-specific parameters. Valid values are anything accepted by MAUI configuration directive CLASSCFG. Key is either a class name or DEFAULT. Default entry is applied to all classes defined but without an explicit entry.
  • MAUI_ACCOUNT_PARAMS : nlist defining account-specific parameters. Valid values are anything accepted by MAUI configuration directive ACCOUNTCFG. Key must be an account name.
  • MAUI_NODE_PARAMS : nlist defining node-specific parameters. Keys must match worker node names or be DEFAULT. Values must be a nlist where keys are any valid keywords accepted by MAUI configuration directive NODECFG and values the value for the corresponding keyword.
  • MAUI_STANDING_RESERVATION_ENABLED : boolean value defining if creation of 1 standing reservation per node is enabled or not. Default : true. Note that use of this feature requires a proper setting of variable WN_CPU_SLOTS (normally 2).
  • MAUI_STANDING_RESERVATION_CLASSES : nlist defining classes which may access standing reservations. Key must be either a WN name or DEFAULT. Default entry is applied to all WNs without an explicit entry. Value must be a comma-separated list of classes.
  • MAUI_WN_PART_DEF : default node partition to use with worker nodes.
  • MAUI_WN_PART : a nlist with one entry per worker node (key is node fullname). The value is the name of the MAUI partition where to place the specific worker node.
  • MAUI_GROUP_PART : nlist defining partitions whose access is allowed on a per group (VO) basis. Key is either a group (VO) name or DEFAULT. Default entry is applied to all groups (VOs) defined but without an explicit entry. If not defined, defaults to MAUI_WN_PART_DEF.
  • MAUI_SERVER_CONFIG_SITE: string containing literal MAUI configuration that must be included into final MAUI configuration, in addition to configuration provided by other variables.
  • MAUI_MONITORING_TEMPLATE: template name to use to configure MAUI monitoring script (normally a cron job). The default value should be appropriate. To disable it set the value to null but be aware of possible effects on CE publication (if using cache mode).
  • MAUI_MONITORING_FREQUENCY : frequency of checks that MAUI daemon is running and responding. Default is 15 minutes. In case of MAUI instability, can be lowered to limit impact on CE behaviour. Format is cron frequency format.
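
A few of these variables are illustrated in the sketch below. Group, class, node and partition names are hypothetical. Passing the GROUPCFG settings as a single string is an assumption; FSTARGET and PRIORITY are standard MAUI credential parameters.

```pan
variable MAUI_GROUP_PARAMS ?= nlist(
  # Applied to every group (VO) without an explicit entry
  'DEFAULT', 'FSTARGET=10',
  'atlas', 'FSTARGET=40 PRIORITY=100'
);

variable MAUI_STANDING_RESERVATION_CLASSES ?= nlist(
  # Comma-separated list of classes allowed to use the standing reservations
  'DEFAULT', 'short,ops'
);

variable MAUI_WN_PART ?= nlist(
  # Place one worker node in a dedicated partition
  'wn01.example.org', 'fast'
);
```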

Note: if MAUI_CONFIG variable is defined, the content of this variable must contain the full content of maui.cfg file and variables MAUI_SERVER_CONFIG, MAUI_SERVER_POLICY and MAUI_SERVER_RMCFG are ignored.

In addition to the variable to configure MAUI itself, there is one variable related to resource publishing into the BDII. See specific section.

RSH and SSH Configuration

By default Quattor doesn't configure any RSH or SSH trust relationship between CE and WNs if home directories are on a shared filesystem declared in variable WN_SHARED_AREAS. Else it configures SSH with host-based authentication. By default RSH is always configured with an empty hosts.equiv file.

If this doesn't fit your needs, you can explicitly control RSH and SSH configuration with the following variables :

  • CE_USE_SSH : if undef (default), configuration is based on use of a shared filesystem for home directories. Else it explicitly sets whether to configure SSH host-based authentication (true) or not (false).
  • SSH_HOSTBASED_AUTH_LOCAL : when this variable is true and CE_USE_SSH is false, configure SSH host-based authentication on each WN restricted to the current WN (ability to use SSH without entering a password only for ssh to the current WN). This is sometimes required by some specific software.
  • RSH_HOSTS_EQUIV : If true, /etc/hosts.equiv is created with an entry for the CE and each WN. If false an empty /etc/hosts.equiv is created. If undef, nothing is done. Default is undef.
  • SSH_DAEMON_SITE_CONFIG: if defined, must be a nlist containing a valid list of sshd options. If null, default configuration for sshd is not defined and the site must build the configuration with a site-specific method.
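
For instance, a site that wants SSH host-based authentication between CE and WNs regardless of how home directories are served, and an empty hosts.equiv, could use a sketch like this:

```pan
# Force SSH host-based authentication, whatever the home directory setup
variable CE_USE_SSH ?= true;

# Create an empty /etc/hosts.equiv
variable RSH_HOSTS_EQUIV ?= false;
```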

CE Publishing into BDII

When using Torque/MAUI, the default plugin provided with gLite to retrieve the number of job slots configured and the number of free slots is using Torque. This doesn't correctly reflect a configuration where advanced MAUI features like standing reservations are used. An alternative plugin, based on MAUI, is available and distributed with QWG templates (even though it is totally independent). To use this MAUI-based plugin instead of the Torque-based one, define the following variable in your gLite parameters (this variable is ignored if the LRMS used is not Torque):

variable GIP_CE_USE_MAUI ?= true;

This variable is true by default in QWG templates. Set it to false if you want to use the standard plugin.

Another specific feature provided by QWG templates with respect to CE publishing into the BDII is the ability to run plugins in charge of updating CE dynamic information as a cron job on the LRMS host and to cache their outputs for later use by GIP itself. This is generally necessary in a multiple CE configuration and this is mandatory with MAUI-based plugins when using Torque/MAUI as MAUI commands can be executed only on the MAUI server. This cache mode also lowers the polling rate on the batch system and protects against temporary failures of the LRMS to respond to the inquiry command (this is quite usual with MAUI when it is overloaded). To activate this feature, you need to define the following variable in your gLite parameters:

variable GIP_CE_USE_CACHE ?= true;

The default for this variable depends on the number of CEs configured. When there is only one CE, it is false for backward compatibility, else it is true. But it is recommended to set it to true unconditionally.

Note: cache mode, even though it is essentially independent of the LRMS, is currently implemented only for MAUI. Defining this variable for unsupported LRMS has no effect.

CE Status

CE related templates use variable CE_STATUS to control CE state. Supported values are :

  • Production : this is the normal state. CE receives and processes jobs.
  • Draining : CE doesn't accept new jobs but continues to execute jobs queued (as long as there are WNs available to execute them).
  • Closed : CE doesn't accept new jobs and jobs already queued are not executed. Only running jobs can complete.
  • Queuing : CE accepts new jobs but will not execute them.

CE_STATUS indicates the desired status of the CE. All the necessary actions are taken to set the CE in the requested status. Default status (if variable is not specified) is Production. This variable can be used in conjunction to WN_ATTRS to drain queues and/or nodes.
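
As a sketch of using both variables together, the following would drain the CE while taking one (hypothetical) worker node offline:

```pan
# Stop accepting new jobs but let queued jobs run
variable CE_STATUS ?= 'Draining';

variable WN_ATTRS ?= nlist(
  # Nodes without an explicit entry stay available
  'DEFAULT', nlist('state', 'free'),
  # This node is drained as well (hypothetical node name)
  'wn07.example.org', nlist('state', 'offline')
);
```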

Restarting LRMS Client

It is possible to force a restart of LRMS (batch system) client on all WNs by defining variable LRMS_CLIENT_RESTART. This variable, if present, must be a nlist with one entry per WN to restart (key is the WN name) or 'DEFAULT' for all WNs without a specific entry. When the value is changed (or first defined), this triggers a LRMS client restart. The value itself is not relevant but it is advised to use a timestamp for better tracking of forced restarts.

For example to force a restart on all WNs, you can add the following definition :

variable LRMS_CLIENT_RESTART = nlist(
  'DEFAULT', '2007-03-24:18:33',
);

A good place to define this variable is template pro_site_cluster_info in cluster site directory.

Note : this feature is currently implemented only for Torque v2 client.

Run-Time Environment

gLite 3.0 templates introduce a new way to define GlueHostApplicationSoftwareRunTimeEnvironment. Previously it was necessary to define a list of all tags in the site configuration template. As most of these tags are standard tags attached to a release of the middleware, there is now a default list of tags defined in the default configuration site template, defaults/site.tpl. To supplement this list with tags specific to the site (e.g. LCG_SC3), define a variable CE_RUNTIMEENV_SITE instead of defining CE_RUNTIMEENV :

variable CE_RUNTIMEENV_SITE = list("LCG_SC3");

Note: if CE_RUNTIMEENV is defined in the site configuration template, this value will be used and must supply all the standard tags in addition to site-specific ones.

Working Area on Torque WNs

By default, QWG templates configure Torque client on WNs to define environment variable TMPDIR and location of stdin, stdout and stderr to a directory local to the worker node (/var/spool/pbs/tmpdir) and define environment variable EDG_WL_SCRATCH to TMPDIR (except for jobs requiring several WNs, e.g. MPI). This configuration is particularly adapted to shared home directories but works well with non shared home directories too.

The main requirement is to appropriately size /var on the WNs as jobs sometimes require a large scratch area. On the other hand, /home is not required to be very large, as it should not store very large files for a long period. It is strongly recommended to use shared home directories, served through NFS or another distributed file system, as this optimizes /home usage and allows local disk space on WNs to be dedicated to /var.

If your configuration cannot be set as recommended or if your current configuration has a large space in /home and a limited space in /var, you can define the following property in your WN profiles before including machine-types/wn :

variable TORQUE_TMPDIR = "/home/pbs/tmpdir";

Restricting Access to CEs

It is possible to ban some users or restrict time slots when the CEs are open for grid usage using LCAS middleware component. QWG allows to easily configure them.

Home Directory Purging

A cron job is responsible for purging directories created for each job under the user home directory. By default, this job runs twice a week (on Sunday and Wednesday) and removes any files and directories older than 15 days in the home directory. This can be tuned with the following variables:

  • CE_CLEANUP_ACCOUNTS_IDLE: minimum age of a file (in days) for the file to be purged (Default: 15).
  • CE_CLEANUP_ACCOUNTS_DAYS: a comma-separated list of days in cron format (day number or first three letters of the day name) when to run the cron job (Default: Sunday, Wednesday).
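
As an illustration, the following sketch would purge files older than 30 days every Monday and Thursday. The values are arbitrary, and whether CE_CLEANUP_ACCOUNTS_IDLE expects a number or a string is an assumption to check against the gLite defaults.

# Minimum age (in days) before a file is purged (assumed to be a number)
variable CE_CLEANUP_ACCOUNTS_IDLE = 30;
# Days in cron format: first three letters of the day name
variable CE_CLEANUP_ACCOUNTS_DAYS = 'mon,thu';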

WN Configuration

Base template :

  • machine-types/wn.

WN configuration is derived from CE and batch system configuration. To configure your WN for specific local requirements, use variable WN_CONFIG_SITE which must be a template with all the specific actions required on your local nodes.
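
A minimal sketch of this, assuming the variable holds the template name as the other xxx_CONFIG_SITE variables on this page do. The template name site/wn-local-config is hypothetical.

# Name of a site template with the actions specific to local WNs (hypothetical name)
variable WN_CONFIG_SITE = 'site/wn-local-config';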

WN Profile Cloning

QWG templates support profile cloning, a feature known as dummy WN that dramatically speeds up compilation of WNs. This is based on the fact that all WNs generally share the same configuration information, except for the hardware description and some parameters like network configuration. With profile cloning, instead of compiling all WNs separately (and generally rebuilding all of them when one dependency is modified), only one profile, called the exact node and used as a reference profile, is really compiled. On the other WNs, even though the source profile looks the same, the compilation is not done; instead the reference profile is included in the WN profile and a very small part of the configuration is replayed to do the actual node customization.

Note: disk partitioning and file system configuration are not replayed on each node. This means that the reference profile and the other nodes configured to use cloning must have a similar disk configuration.

The main variable to enable profile cloning (currently supported only for gLite WN) is USE_DUMMY. It is false by default and must be set to true to enable profile cloning. This variable must be defined to true on all WNs, including the reference one (exact node).

To use profile cloning, in addition to enabling it, you need to define a set of variables, generally in a common profile called by all WNs. This is generally done by creating a site-specific machine type for the WN (typically in sites/xxx/machine-types/xxx/wn, be sure not to overload standard machine-types/wn) that will do all the necessary initializations and include the standard machine-types/wn.

The variables you need to define for profile cloning to work are:

# Prefix of template names for all profiles. When using the old naming convention `profile_xxx`, define to `profile_`
variable PROFILE_PREFIX ?= '';
# Name of the reference profile (string after PROFILE_PREFIX in its profile name)
variable EXACT_NODE ?= 'grid100';
# Regexp (perl compatible) matching the node name part of profile names of eligible nodes
variable NODE_REGEXP ?= 'grid.*';
# Variable pointing to some site-specific templates. Customize to match your configuration.
variable SITE_DATABASES ?= 'site/databases';
variable GLOBAL_VARIABLES ?= 'site/global_variables';
variable SITE_FUNCTION ?= 'site/functions';
variable SITE_CONFIG ?= 'site/config';
variable FILESYSTEM_CONFIG_SITE ?= "filesystem/config";
variable GLITE_SITE_PARAMS ?= "site/glite/config";

In addition to these variable definitions, another variable WN_DUMMY_DISABLED is available. This is a nlist where the key is an escaped node name and the value must be true to disable the use of profile cloning on a specific node. This allows you to add the USE_DUMMY variable to the site-specific machine type definition for a WN, with a default value of true, and then control the specific nodes where you want to disable profile cloning by editing just one template (rather than editing each profile template individually). WN_DUMMY_DISABLED is typically defined in a site-specific template like site/wn-cloning-config, that is included in the site-specific definition of a WN.
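
The combination of USE_DUMMY and WN_DUMMY_DISABLED could be sketched as follows. Node names are hypothetical, and whether the key must be the short or the fully qualified node name is an assumption to verify against your profile naming.

# In the site-specific WN machine type: cloning enabled by default
variable USE_DUMMY ?= true;

# In site/wn-cloning-config: nodes compiled normally (cloning disabled)
variable WN_DUMMY_DISABLED = nlist(
  escape('grid105.example.org'), true,
  escape('grid117.example.org'), true
);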

SE Configuration

Base template :

  • DPM : machine-types/se_dpm.
  • dCache : machine-types/se_dCache.

Note : This section covers the generic SE configuration, not a specific implementation.

List of site SEs

The list of SEs available at your site must be defined in variable SE_HOSTS. This variable is a nlist with one entry for each local SE. The key is the SE host name and the value is a nlist defining SE parameters.

Supported parameters for each SE are :

  • type : define SE implementation. Must be SE_Classic, SE_dCache or SE_DPM. This parameter is required and has no default. Note that SE Classic is deprecated.
  • accessPoint : define the root path of any VO-specific area on the SE. This parameter is required with Classic SE and dCache. It is optional with DPM where it defaults to /dpm/dom.ain.name/homes.
  • arch : used to define GlueSEArchitecture for the SE. This parameter is optional and defaults to multidisk that should be appropriate for standard configurations.
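
As a sketch of the structure described above, a site with one DPM and one dCache SE might declare the following. Host names and the access point path are hypothetical.

variable SE_HOSTS = nlist(
  # DPM: accessPoint is optional and left to its default
  'se1.example.org', nlist('type', 'SE_DPM'),
  # dCache: accessPoint is required
  'se2.example.org', nlist('type', 'SE_dCache',
                           'accessPoint', '/pnfs/example.org/data')
);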

For more details, look at example and comments in gLite defaults.

Note : Format of SE_HOSTS has been changed in gLite-3.0.2-11 release of QWG templates. Look at release notes to know how to migrate from previous format.

CE Close SEs

Variable CE_CLOSE_SE_LIST defines the SEs that must be registered in BDII as a close SE for the current CE. It can be either a value used for every VO or a nlist with a default value (key is DEFAULT) and one entry per VO with a different close SE (key is the VO name). Each value may be a string if there is only one close SE or a list of SEs.

CE_CLOSE_SE_LIST defaults to deprecated SE_HOST_DEFAULT if defined, else to all the SEs defined in SE_HOSTS variable.
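
For example, a nlist form with a default close SE and a VO-specific list might look like this sketch. The SE host names are hypothetical and atlas is used only as an example VO name.

variable CE_CLOSE_SE_LIST = nlist(
  # Close SE for every VO without a specific entry (single SE given as a string)
  'DEFAULT', 'se1.example.org',
  # VO with several close SEs (given as a list)
  'atlas', list('se1.example.org', 'se2.example.org')
);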

It is valid to have no close SE defined. To remove default definition, you need to do :

variable CE_CLOSE_SE_LIST = nlist('DEFAULT', undef);

It is valid for the close SE to be outside your site but this is probably not recommended for standard configurations.

Default SE

Variable CE_DEFAULT_SE is used to define the default SE for the site. It can be either a SE name or a nlist with a default entry (key is DEFAULT) and one entry per VO with a different default SE (key is the VO name).

By default, if not explicitly defined, it defaults to the first SE in the appropriate CE_CLOSE_SE_LIST entry. The default SE can be outside your site (probably not recommended for standard configurations).
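
A per-VO default SE could be sketched as follows, using the same hypothetical host names and example VO as placeholders.

variable CE_DEFAULT_SE = nlist(
  'DEFAULT', 'se1.example.org',
  'atlas', 'se2.example.org'
);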

DPM Configuration

DPM-related standard templates require a site template to describe site/SE configuration for DPM. The variable DPM_CONFIG_SITE must contain the name of this template. This template defines the whole DPM configuration, including all disk servers used and is used to configure all the machines part of the DPM configuration.

On DPM head node (in the node profile), variable SEDPM_SRM_SERVER must be defined to true. This variable is false by default (DPM disk servers).
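
Put together, these two variables could be set as in the sketch below. The template name site/dpm/config is hypothetical.

# Site parameters: template describing the whole DPM configuration (hypothetical name)
variable DPM_CONFIG_SITE = 'site/dpm/config';

# Head node profile only (disk servers keep the default, false)
variable SEDPM_SRM_SERVER = true;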

If you want to use the Oracle version of the DPM server, define the following variable in your machine profile :

variable DPM_SERVER_MYSQL = false;

DPM site parameters

There is no default template provided for DPM configuration. To build your own template, you can look at template pro_se_dpm_config.tpl in examples provided with QWG templates.

Starting with QWG Templates release gLite-3.0.2-9, there is no default password value provided for account used by DPM daemons and for the DB accounts used to access the DPM database. You must provide one in your site configuration. If you forget to do it, you'll get a not very explicit panc error :

[pan-compile] *** wrong argument: operator + operand 1: not a property: element

If you want to use a specific VO list on your DPM server and you have several nodes in your DPM configuration (DPM head node + disk servers), you need to write a template defining VOS variable (with a non default value) and define variable NODE_VO_CONFIG to this template in the profile of DPM nodes (both head node and disk servers).
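
A sketch of such a setup is shown below. The template name site/dpm/vos and the VO list are hypothetical, and the two parts belong to different files as indicated in the comments.

# File site/dpm/vos.tpl (hypothetical): VO list specific to the DPM nodes
unique template site/dpm/vos;
variable VOS = list('atlas', 'dteam', 'ops');

# In the profile of every DPM node (head node and disk servers)
variable NODE_VO_CONFIG = 'site/dpm/vos';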

Using non-standard port numbers

It is possible to use non-standard port numbers for DPM daemons dpm, dpns and all SRM daemons. To do this, you need to define the XXX_PORT variable corresponding to the service in your gLite site parameters. Look at gLite default parameters to find the exact name of the variable.

Note: it is not recommended to change the port number used by DPM services in normal circumstances.

Using a non-standard account name for dpmmgr

If you want to use an account name different from dpmmgr to run DPM daemons, you need to define variable DPM_DAEMON_USER in your site configuration template and provide a template to create this account, based on users/dpmmgr.tpl.

LFC Configuration

Base template : machine-types/lfc.

LFC related standard templates require a site template to describe the service site configuration. The variable LFC_CONFIG_SITE must contain the name of this template.

If you want to use the Oracle version of the LFC server, define the following variable in your machine profile :

variable LFC_SERVER_MYSQL = false;

LFC templates allow a LFC server to act as a central LFC server (registered in BDII) for some VOs and as a local LFC server for the others. There are 2 variables controlling what is registered in the BDII :

  • LFC_CENTRAL_VOS : list of VOs for which the LFC server must be registered in BDII as a central server. Default is an empty list.
  • LFC_LOCAL_VOS : list of all VOs for which the server must be registered in BDII as a local server. Defaults to all supported VOs (VOS variable). If a VO is in both lists, it is removed from LFC_LOCAL_VOS. If you don't want this server to be registered as a local server for any VO, even if configured on this node (present in VOS list), you must define this variable as an empty list :
    variable LFC_LOCAL_VOS = list();
    

VOs listed in both lists must be present in VOS variable. These 2 variables have no impact on GSI (security) configuration and don't control access to the server. If you want to have VOS variable (controlling access to the server) matching the list of VOs supported by the LFC server (either as central or local catalogues), you can add the following definition to your LFC server profile :

variable VOS = merge(LFC_CENTRAL_VOS, LFC_LOCAL_VOS);

LFC site parameters

Normally the only thing really required in this site-specific template is the password for LFC user (by default lfc) and the DB accounts. Look at the standard LFC configuration template, templates/trunk/glite-3.0.0/glite/lfc/config, for the syntax.

Starting with QWG Templates release gLite-3.0.2-9, there is no default password value provided for the account used by LFC daemons and for the DB accounts used to access the LFC database. You MUST provide one in your site configuration. If you forget to do it, you'll get a not very explicit panc error :

[pan-compile] *** wrong argument: operator + operand 1: not a property: element

LFC Alias

It is possible to configure a LFC server to register itself into the BDII using a DNS alias rather than the host name. To achieve this, you need to define in your site parameters a variable LFC_HOSTS (replacement for former LFC_HOST) which must be a nlist where keys are LFC server names and values are nlist accepting the following parameters :

  • alias : DNS alias to use to register this LFC server into the BDII
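
A minimal sketch of this structure, with hypothetical host and alias names:

variable LFC_HOSTS = nlist(
  # Key: real host name; 'alias': name published in the BDII
  'lfc01.example.org', nlist('alias', 'lfc.example.org')
);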

Using non-standard port numbers

It is possible to use non-standard port numbers for LFC daemons. To do this, you only need to define the XXX_PORT variable corresponding to the service. Look at gLite default parameters to find the exact name of the variable.

Note: it is not recommended to change the port number used by LFC services in normal circumstances.

Using a non-standard account name for lfcmgr

If you want to use an account name different from lfcmgr to run LFC daemons, you need to define variable DPM_USER in your site configuration template and provide a template to create this account, based on users/lfcmgr.tpl.

WMS and LB

Base templates :

  • machine-types/wms : a WMS only node.
  • machine-types/lb : a LB only node.
  • machine-types/wmslb : a combined WMS/LB node (not recommended).

WMS and LB are 2 inter-related services : a complete WMS is made of at least one WMS and 1 LB. For scalability reasons, it is recommended to run WMS and LB on several machines : 1 LB should scale to 1M+ jobs per day where 1 WMS scales only to 20 Kjobs per day. Several WMS can share the same LB. Don't expect a combined WMS/LB to scale to more than 10 Kjobs/day. And be aware that a WMS needs a lot of memory: 4 GB is the required minimum.

WMS and LB site-specific configuration is normally kept in one template, even if they run on several machines, to maintain consistency. Variable WMS_CONFIG_SITE must be defined to the name of this template, even for a LB. If you want to use a separate template to configure LB (not recommended), you can also use LB-specific variable, LB_CONFIG_SITE.

List of VOs supported by WMS, if not your default list as defined in your site-specific parameters, must be defined in another template that will be included very early in the configuration. Variable NODE_VO_CONFIG must be defined to the name of this template. This template generally contains only variable VOS definition.

Main variables that need to be customized according to your WMS and LB configuration are :

  • LB_MYSQL_ADMINPWD : password of MySQL administrator account. There is no default, be sure to define to a non empty string.
  • LB_TRUSTED_WMS : a list of DN matching host DN of all WMS allowed to use this LB. May remain empty (default) on a combined WMS/LB.
  • WMS_LB_SERVER_HOST : define LB used by this WMS. Keep default value on a combined WMS/LB.
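
For a setup with separate WMS and LB machines, the site template could contain something like the sketch below. Host names, DNs and the password are placeholders, and the exact format expected by WMS_LB_SERVER_HOST (host name only or with a port) is an assumption to check in the standard templates.

# MySQL administrator password on the LB (no default, must be non-empty)
variable LB_MYSQL_ADMINPWD = 'change-me';

# Host DNs of the WMS machines allowed to use this LB
variable LB_TRUSTED_WMS = list(
  '/O=GRID-EX/C=XX/O=Example/CN=wms01.example.org',
  '/O=GRID-EX/C=XX/O=Example/CN=wms02.example.org'
);

# LB used by the WMS machines (assumed here to be a plain host name)
variable WMS_LB_SERVER_HOST = 'lb01.example.org';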

In addition to these variables, there are several variables to tune the performance of WMS, in particular its load monitor subsystem. Look at glite/wms/config.tpl and templates provided with ncm-wmslb component for a list of all available variables. The defaults should be appropriate; avoid modifying these variables without a clear reason to do so. In particular avoid setting thresholds too high, as this may leave the WMS machine heavily overloaded and degrade service response time. Most of the variables are related to the WM component of WMS. The main ones are:

  • WMS_WM_EXPIRY_PERIOD : maximum time in seconds to retry match making in case of failure to find a resource compatible with requirements. Default: 2 hours.
  • WMS_WM_MATCH_RETRY_PERIOD : interval in seconds between 2 match making attempts. Must be less than WMS_WM_EXPIRY_PERIOD. Default : 30 minutes.
  • WMS_WM_BDII_FILTER_MAX_VOS : maximum number of VOs configured on the WMS to define a LDAP filter when querying the BDII. Default: 10.
  • WMS_WMPROXY_SDJ_REQUIREMENT : match making requirement to add when ShortDeadlineJob=true in JDL. The same requirement is added negated for non SDJ jobs. Default should be appropriate (every queue whose name ends with sdj).

Load Monitor

WMS has an integrated feature to monitor load on the machine it runs on and refuse to accept new jobs if the load is higher than defined thresholds. Available variables to define threshold are :

  • WMS_LOAD_MONITOR_CPU_LOAD1 : maximum CPU load averaged on 1 minute (as defined by top or xload). Default : 10.
  • WMS_LOAD_MONITOR_CPU_LOAD5 : maximum CPU load averaged on 5 minutes (as defined by top or xload). Default : 10.
  • WMS_LOAD_MONITOR_CPU_LOAD15 : maximum CPU load averaged on 15 minutes (as defined by top or xload). Default : 10.
  • WMS_LOAD_MONITOR_DISK_USAGE : maximum usage (in percent) of any file system present on the machine. Default : 95 (%).
  • WMS_LOAD_MONITOR_FD_MIN : minimum number of free file descriptors. Default : 500.
  • WMS_LOAD_MONITOR_MEMORY_USAGE : maximum usage (in percent) of virtual memory. Default : 95 (%).
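
If there is a clear reason to make a WMS refuse jobs earlier than with the defaults, a sketch could be the following. The values are arbitrary examples, and the assumption that these variables take plain numbers should be checked in glite/wms/config.tpl.

# Refuse new jobs above a 1-minute load of 8 (default: 10)
variable WMS_LOAD_MONITOR_CPU_LOAD1 = 8;
# Refuse new jobs when any file system is more than 90% full (default: 95)
variable WMS_LOAD_MONITOR_DISK_USAGE = 90;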

Draining a WMS

It is sometimes desirable to drain a WMS. When draining, a WMS doesn't accept any request to submit new jobs but continues to process already submitted jobs and accepts requests about job status or to cancel a job.

With QWG, a WMS can be drained by defining in its profile the variable WMS_DRAINED to true. Undefining the variable re-enables the WMS. Note that if you drain it manually and reconfigure the WMS with Quattor, it is re-enabled.
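
In the WMS profile, this amounts to a single line:

# Remove this definition to re-enable the WMS
variable WMS_DRAINED = true;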

WMS Client Configuration

A few variables allow to configure default settings of WMS clients:

  • WMS_OUTPUT_STORAGE_DEFAULT: default directory where to put job outputs (one directory per job will be created in this directory). Default: ${HOME}/JobOutput.

BDII

Base template : machine-types/bdii.

QWG Templates support configuration of all types of BDII :

  • Top-level BDII (default type) : use a central location to get their data (all BDIIs use the same source). This central location contains information about all sites registered in the GOC DB. Use of FCR (Freedom of Choice) enabled by default.
  • Site BDII : BDII in charge of collecting information about site resources. Support the concept of sub-site BDII (hierarchy of BDII to collect site information).
  • Resource BDII : used in replacement of Globus MDS to publish resource information into BDII.

When configuring BDII on a machine, the following variables can be used (in the machine profile or in a site-specific template) to tune the configuration :

  • BDII_TYPE : can be resource, site, top. top is the default, except if deprecated variable SITE_BDII is true.
  • BDII_SUBSITE : name of the BDII sub-site. Ignored on any BDII type except site. Must be empty for the main site BDII (default) or defined to the sub-site name if this is a subsite BDII.
  • BDII_SUBSITE_ONLY (gLite 3.1 only) : if false, allow to run both subsite and site BDII on the same machine. Default : true.
  • BDII_USE_FCR : set to false to disable use of FCR (Freedom of Choice) on top-level BDII or to true to force its use on other BDII types. This value is ignored if BDII type is not top. Default is true.
  • BDII_FCR_URL : use a non-standard source for FCR.

Starting with QWG templates gLite-3.0.2-13, all machine types publishing information into BDII (almost all except WN, UI and disk servers) are using a BDII configured as a resource BDII for this purpose. In addition all these machine types can also be configured as a site/subsite BDII by defining appropriate variable into node profile (BDII_TYPE='site' and if applicable BDII_SUBSITE).
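
For instance, a node profile turning a machine into a subsite BDII might contain the following sketch. The subsite name is hypothetical; on the main site BDII, BDII_SUBSITE is simply left undefined.

variable BDII_TYPE = 'site';
# Only for a subsite BDII (hypothetical sub-site name)
variable BDII_SUBSITE = 'SUBSITE1';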

Note : combined BDII used to be the default on LCG CE for backward compatibility but this is no longer the case. It is advised to run site BDII preferably on a dedicated machine. If this is not possible, choose any machine type but the CE as this machine can be very loaded and site BDII may become unresponsive with a lot of side effects.

Configuring BDII URLs on a site BDII

A site BDII aggregates information published by several other BDIIs, typically resource BDIIs or subsite BDIIs. List of resources to aggregate are specified by the variable BDII_URLS. This variable is typically defined in site parameters, site/glite/config.tpl, and is ignored on all nodes except a site (or combined) BDII.

Variable BDII_URLS is a nlist of URLs corresponding to the resource BDII endpoints (urls) aggregated on the site BDII. The key is an arbitrary name (like CE, DPM1...) but must be unique and the value is the endpoint. See site configuration example.

Important: site, subsite and top-level BDIIs run a resource BDII that publishes information about themselves. They must be added to the BDII_URLS variable.
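
A sketch of BDII_URLS for a small site is given below. Host names are hypothetical; port 2170 and mds-vo-name=resource are the values usually found on resource BDIIs but are assumptions here and should be checked against the site configuration example.

variable BDII_URLS = nlist(
  'CE',       'ldap://ce01.example.org:2170/mds-vo-name=resource,o=grid',
  'DPM1',     'ldap://se1.example.org:2170/mds-vo-name=resource,o=grid',
  # Resource BDII of the site BDII machine itself
  'SITEBDII', 'ldap://bdii.example.org:2170/mds-vo-name=resource,o=grid'
);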

Restriction : each BDII in BDII hierarchy must use a different mds-vo-name. Thus it is not possible to use the mds-vo-name of a site BDII in BDII_URLS or this will be considered as a loop and the entry will be ignored.

Configuring a subsite BDII

It is possible to run a hierarchy of site BDII. This is particularly useful for a site made of several autonomous entities as it allows each subsite to export a unique access point to published subsite resources. Each subsite manages the actual configuration of its subsite BDII and all the subsites are then aggregated by the site BDII. GRIF site is an example of such a configuration.

A subsite BDII is a site BDII where variable BDII_SUBSITE has been defined to a non empty value. This value is appended to site name to form the mds-vo-name for the subsite.

When using an internal hierarchy of site and subsite BDIIs, BDII_URLS must be used for subsite BDIIs. To define the BDII endpoints that must be collected by the site BDII, you must use BDII_URLS_SITE. This allows both to coexist in the same site parameter template (typically site/glite/config.tpl); both have the same syntax. BDII_URLS_SITE typically contains the endpoint of each resource BDII inside the site.

Co-locating a subsite BDII and a site BDII on the same machine may lead to a problem with the GlueSite object: several objects could be published with a different DN, depending on the subsite BDII actually publishing it. This is particularly a problem if you run several subsite BDIIs also acting as a site BDII in different subsites, as you will publish several different GlueSite objects for your site to the top BDII. To solve this, it is possible to publish the GlueSite object in a non-standard branch of the information tree, using the variable SITE_GLUE_OBJECT_MDS_VO_NAME. The value of this variable will be used instead of resource and thus the GlueSite object will be invisible on the resource BDII of the site BDII. To get the GlueSite object published by the site BDII, it is necessary to add an entry in BDII_URLS_SITE for the active site BDII (using the DNS alias generally associated with the service) using the same mds-vo-name as specified in variable SITE_GLUE_OBJECT_MDS_VO_NAME.

Defining Top-level BDII

It is necessary to define the top-level BDII used by the site. This is done by variable TOP_BDII_HOST. This variable replaces deprecated BDII_HOST. It has no default.

Note : it is good practice to use a DNS alias as the top-level BDII name. This allows the actual top-level BDII to be changed without editing the configuration. This has the advantage that the change is taken into account by running jobs (if there is no DNS caching on the WNs).
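
In the site parameters, this could look like the following sketch, where the alias name is hypothetical:

# DNS alias pointing to the top-level BDII currently in use
variable TOP_BDII_HOST = 'topbdii.example.org';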

Configuring BDII alias

When several BDIIs are used to provide the same BDII service (either top or site) in order to provide service load balancing and/or failover, they are generally all associated with a DNS alias (CNAME). In this configuration, the endpoint published for the BDII should be the alias instead of the BDII host name (default). This is done by one of the following variables, depending on whether the BDII is a site BDII or a top BDII:

  • BDII_ALIAS_TOP: DNS name to use in the top BDII endpoint.
  • BDII_ALIAS_SITE: DNS name to use in the site BDII endpoint.
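
For a site BDII served by several machines behind one alias, the definition might be as simple as the sketch below (the alias name is hypothetical):

variable BDII_ALIAS_SITE = 'site-bdii.example.org';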

MPI Support

To activate MPI support on the CE and WNs, you need to define variable ENABLE_MPI to true in your site parameters (normally site/glite/config.tpl). It is disabled by default.

A default set of RPMs for various flavours of MPI (MPICH, MPICH2, OPENMPI, LAM) will be installed. If you would like to install a custom version of a particular MPI implementation, you can do so by defining the following variables:

  • MPI_<flavour>_VERSION : Version of the package (e.g. MPI_MPICH_VERSION = "1.0.4")
  • MPI_<flavour>_RELEASE : Release number of the package (e.g. MPI_MPICH_RELEASE = "1.sl3.cl.1")
  • MPI_<flavour>_EXTRAVERSION : Patch number of the package (if needed e.g. MPI_MPICH_EXTRAVERSION="p1")

These variables ensure that the version published is consistent with the installed RPMs.
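
Using the example values quoted above, enabling MPI with a custom MPICH package could be sketched as follows. These values are only meaningful if an RPM with exactly this version and release is available in your repositories.

variable ENABLE_MPI = true;

# Custom MPICH package: version, release and optional patch level
variable MPI_MPICH_VERSION = "1.0.4";
variable MPI_MPICH_RELEASE = "1.sl3.cl.1";
variable MPI_MPICH_EXTRAVERSION = "p1";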

FTS Client

On machine types supporting it (e.g. UI, VOBOX, WN), you can configure a FTS client. Normally, to configure FTS client you only need to define variable FTS_SERVER_HOST to the name of your preferred FTS server (normally your closest T1).

To accommodate specific needs, there are 2 other variables whose default value should be appropriate :

  • FTS_SERVER_PORT : port number used by FTS server. Default : 8443.
  • FTS_SERVER_TRANSFER_SERVICE_PATH : root path of transfer service on FTS server. This is used to build leftmost part of URLs related to FTS services. Default : /glite-data-transfer-fts.

Note : for backward compatibility, it is still possible to directly define variable FTS_SERVER_URL, even though it is recommended to change your site parameters and use the new variables instead.
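
A typical FTS client configuration therefore reduces to one line, as in this sketch with a hypothetical server name:

# Preferred FTS server; port and service path keep their defaults
variable FTS_SERVER_HOST = 'fts.example.org';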

MonBox and APEL

Base template : machine-types/mon.

MonBox is the service in charge of storing local accounting. It is used in conjunction with APEL, the framework in charge of collecting accounting data on the CEs and publishing them into the central accounting. APEL is made of 2 parts:

  • the parser: in charge of parsing batch system and globus accounting/log files, producing the normalized grid accounting data and storing them into the MonBox database. Normally running on the CE, there is one parser per type of batch system.
  • the publisher: in charge of publishing the local accounting data stored on the MonBox into the grid central accounting. Normally runs on the MonBox.

MonBox requires the following configuration variables:

  • MON_MYSQL_PASSWORD: password of the MySQL administrator (root) on MonBox.
  • MON_HOST: host name of MonBox.

APEL configuration requires the following variables:

  • APEL_ENABLED: whether to enable APEL. Default: true.
  • APEL_DB_NAME: APEL database name on MonBox. Default: accounting.
  • APEL_DB_USER: MySQL user to access APEL database on MonBox. Default: accounting.
  • APEL_DB_PWD: MySQL password to access APEL database on MonBox.
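
A minimal sketch of the corresponding site parameters is shown below. The host name and passwords are placeholders, and the variables with defaults are left untouched.

# MonBox location and MySQL administrator password
variable MON_HOST = 'mon01.example.org';
variable MON_MYSQL_PASSWORD = 'change-me-root';

# Password of the APEL database account (no default documented above)
variable APEL_DB_PWD = 'change-me-apel';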

By default, APEL publisher is run on MonBox. If you'd like to run it on another machine, add the following line in the machine profile:

include { 'common/accounting/apel/publisher' };

Note: even when APEL publisher is not run on a MonBox, it does require access to a MonBox.

After the initial installation of the machine, you need to install a certificate on the machine at the usual location (/etc/grid-security), except if you use an installation (AII) hook to do it during the installation. After doing so, you need to manually rerun the Quattor configuration module ncm-rgmaserver or reboot the machine. To run the configuration module, use the following command:

ncm-ncd --configure rgmaserver

MyProxy Server

Base template : machine-types/px.

MyProxy server configuration consists of defining policies for access to proxies stored on the server and their renewal. There are 2 sets of policies : explicitly authorized policies and default policies. For each set a separate policy can be defined for:

  • renewers : list of clients able to renew a proxy. The variables to use are MYPROXY_AUTHORIZED_RENEWERS and MYPROXY_DEFAULT_RENEWERS.
  • retrievers : list of clients able to retrieve a proxy if they have valid credentials and provide the same username/password as the one used at proxy creation. The variables to use are MYPROXY_AUTHORIZED_RETRIEVERS and MYPROXY_DEFAULT_RETRIEVERS.
  • key retrievers : list of clients able to retrieve a proxy, including the private key, if they have valid credentials and provide the same username/password as the one used at proxy creation. The variables to use are MYPROXY_AUTHORIZED_KEY_RETRIEVERS and MYPROXY_DEFAULT_KEY_RETRIEVERS.
  • trusted retrievers : list of clients able to retrieve a proxy without providing valid credentials (but providing the same username/password as the one used at proxy creation if one was used). The variables to use are MYPROXY_AUTHORIZED_TRUSTED_RETRIEVERS and MYPROXY_DEFAULT_TRUSTED_RETRIEVERS. Clients listed in these variables are automatically added to the corresponding retrievers list (MYPROXY_AUTHORIZED_RETRIEVERS or MYPROXY_DEFAULT_RETRIEVERS).

The list values must be client DNs or regexp matching a client DN. Regexp must be used with caution as they may result in giving a broader access than wanted. For more information about the different policies and the regexp syntax, see the manpage for MyProxy server configuration:

man myproxy-server.config

In addition to the previous variables, it is possible to use variable GRID_TRUSTED_BROKERS to define the WMS which are allowed to use the MyProxy server. The list provided with this variable is merged with MYPROXY_AUTHORIZED_RENEWERS.
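
As a sketch of the policy variables described above, the following authorizes two WMS hosts to renew proxies and one service host to retrieve them. All DNs are placeholders; plain DNs are used rather than regexps to keep the access as narrow as possible.

variable MYPROXY_AUTHORIZED_RENEWERS = list(
  '/O=GRID-EX/C=XX/O=Example/CN=wms01.example.org',
  '/O=GRID-EX/C=XX/O=Example/CN=wms02.example.org'
);

variable MYPROXY_AUTHORIZED_RETRIEVERS = list(
  '/O=GRID-EX/C=XX/O=Example/CN=vobox01.example.org'
);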

VOMS Server

Base template : machine-types/voms.

VOMS server default configuration can be customized with the following variables:

  • VOMS_VOS: this variable describes each VO managed by the VOMS server. This is a nlist where the key is the VO name and the value a nlist specifying the VO parameters. A typical entry is:
      'vo.lal.in2p3.fr',  nlist('port', '20000',
                                'host', 'grid12.lal.in2p3.fr',
                                'dbName', 'voms_lal',
                                'dbUser', 'root',
                                'dbPassword', 'clrtxtpwd',
                                'adminEmail', 'vomsadmins@example.com',
                                'adminCert', '/etc/grid-security/vomsadmin.pem',
                               ),
    
  • VOMS_DB_TYPE: can be mysql or oracle.
  • VOMS_MYSQL_ADMINPWD: password of the MySQL administrator account (MySQL account). Required if DB type is mysql (no default).
  • VOMS_MYSQL_ADMINUSER: username of the MySQL administrator account (MySQL account). Ignored if DB type is not mysql. Default: root.
  • VOMS_ADMIN_SMTP_HOST: SMTP host used by VOMS admin when sending emails. Default: localhost.
  • VOMS_CRON_EMAIL: user to notify in case of problems during cron jobs. Default: root@localhost.
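
The scalar variables above could be set as in the following sketch for a MySQL-based server. The password, SMTP host and email address are placeholders.

variable VOMS_DB_TYPE = 'mysql';
# Required with MySQL (no default)
variable VOMS_MYSQL_ADMINPWD = 'change-me';

# Mail settings used by VOMS admin and by cron jobs
variable VOMS_ADMIN_SMTP_HOST = 'smtp.example.org';
variable VOMS_CRON_EMAIL = 'grid-admins@example.org';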

In addition to configuring the previous variables, it is generally necessary to install the certificate of the initial administrator of the VO. This certificate is passed in parameter adminCert in VO parameters (VOMS_VOS). This is typically done with Quattor configuration module filecopy in the site-specific configuration of the VOMS server. A typical sequence to do this is:

include { 'components/filecopy/config' };
variable CONTENTS = <<EOF;
-----BEGIN CERTIFICATE-----
... Copy certificate from the PEM file ...
-----END CERTIFICATE-----
EOF

# Now actually add the file to the configuration.
'/software/components/filecopy/services' = 
  npush(escape('/etc/grid-security/vomsadmin.pem'), 
        nlist('config',CONTENTS,
              'perms','0755'));

For more information on VOMS server configuration parameters, you may want to look at the VOMS server administration guide.

VOBOX

Base template : machine-types/vobox.

The VOBOX is a machine dedicated to one VO running VO-specific services. In addition to the VO-specific services, this machine runs a service called proxy renewal in charge of renewing the grid proxy used by VO-specific services.

It is critical for security to restrict the number of people allowed to access the VOBOX. By default, only people with the VO SW manager role can log into the VOBOX. To change this configuration, refer to the section on VOMS groups/roles mapping, but be sure you really need to allow other roles, as this can give unwanted users access to privileged services.

The configuration templates for the VOBOX enforce that only one VO is configured for access to VOBOX-specific services. This VO must be declared using the VOS variable, as for other machine types. If you want to give other VOs access to the VOBOX for its management and operation, you need to explicitly allow them using the variable VOBOX_OPERATION_VOS. This variable is a list of VOs considered as operation VOs. By default, this list contains only VO ops. If the VOs listed in this variable are not listed in VOS, they are automatically added.

Only the enabled VO has gsissh access to the VOBOX by default. If you want the operation VOs to also be enabled for gsissh access to the VOBOX, you need to define variable VOBOX_OPERATION_VOS_GSISSH to true in the VOBOX profile. Only the FQANs enabled by VO_VOMS_FQAN_FILTER will be enabled for each VO (default: SW manager).

Note: if you add dteam VO to operation VOs and enable gsissh access for operation VOs, be sure to restrict the people who will be allowed interactive access to the VOBOX, as dteam is a very large VO with people from every grid site.
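
The VO-related settings above might look as follows in a VOBOX profile. This is only a sketch: the VO name alice is a hypothetical example, and it is an assumption that VOS is given as a list of VO names.

# The single VO served by this VOBOX (hypothetical example).
variable VOS = list('alice');

# VOs allowed to manage and operate the VOBOX (default: ops only).
# VOs listed here but missing from VOS are added automatically.
variable VOBOX_OPERATION_VOS = list('ops', 'dteam');

# Also give the operation VOs gsissh access (not enabled by default).
variable VOBOX_OPERATION_VOS_GSISSH = true;

With dteam included as shown, the restriction mentioned in the note above applies.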

There are some other variables available to tune the VOBOX configuration, but the defaults should generally be appropriate. The main ones are:

  • VOBOX_TCP_MAX_BUFFER_SIZE: the maximum TCP buffer size to use. This is critical to reach good performance on high-speed networks. Default: 8388608.
  • VOBOX_TCP_MAX_BACKLOG: another critical TCP congestion control parameter to reach high throughput and good performance. Default: 250000.

In addition, it is generally necessary to define the default MyProxy server (MYPROXY_DEFAULT_SERVER).
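
For example, a VOBOX profile could override these settings as sketched below. The buffer size and the server name are hypothetical values, and it is an assumption that the TCP variables are given as numbers rather than strings; check the defaults in the standard templates.

# Double the default maximum TCP buffer size (default: 8388608).
variable VOBOX_TCP_MAX_BUFFER_SIZE = 16777216;

# Keep the documented default backlog, shown here for completeness.
variable VOBOX_TCP_MAX_BACKLOG = 250000;

# Default MyProxy server (hypothetical host name).
variable MYPROXY_DEFAULT_SERVER = 'myproxy.example.org';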

Note: it is recommended not to define gsissh-related variables, as documented in the UI section, as this may interfere with the standard VOBOX configuration. The only exception is GSISSH_PORT.

UI

Base template : machine-types/ui.

A UI may be run on a non-grid machine where the proposed base template is not suitable. In this case, if the machine is managed by Quattor, it is possible to reuse part of the base template on the target machine: mainly VO configuration, glite/service/ui and gLite updates.

On a standard UI, user accounts must be created using a method appropriate to the local site. This can be NIS, LDAP or the template provided with QWG to manage user creation.

It is also possible to configure a UI to be accessed through gsissh. In this configuration, users use their grid certificate to authenticate on the UI and are mapped to a pool account of the VO. To configure a UI with gsissh, it is only necessary to define variable GSISSH_SERVER_ENABLED to true in the machine profile.

When configuring a gsissh-enabled UI, there are a few specific variables available to customize gsissh server:

  • UI_GSISSH_CONFIG_SITE: name of a template to execute before configuring gsissh server. For everything related to VO configuration, be sure to use VO configuration variables as this is done before executing this template.
  • GSISSH_SERVER_VOS: subset of configured VOs on the node that must be enabled for gsissh access. Default: all configured VOs (VOS).
  • GSISSH_PORT: port used by gsissh server. Default: 1975.
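
A minimal sketch of a gsissh-enabled UI profile using these variables. The VO name is a hypothetical placeholder, and it is an assumption that GSISSH_SERVER_VOS is a list of VO names and that GSISSH_PORT is given as a number.

# Enable the gsissh server on this UI.
variable GSISSH_SERVER_ENABLED = true;

# Restrict gsissh access to a subset of the VOs configured on the node
# (default: all VOs listed in VOS).
variable GSISSH_SERVER_VOS = list('vo.example.org');

# Port used by the gsissh server (1975 is the documented default).
variable GSISSH_PORT = 1975;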

Note: be aware that gsisshd is an authenticated grid service and thus requires the UI to have a server certificate, as any other grid service machine.

Customizing Default Environment

Main variables to customize environment seen by users on a UI are:

  • MYPROXY_DEFAULT_SERVER: name of default MyProxy server to use with myproxy-xxx commands.
  • Variables related to FTS client.