Version 20 (modified by /C=FR/O=CNRS/OU=UMR8607/CN=Michel Jouvin/emailAddress=jouvin@…, 19 years ago)

gLite Template Customization

Site customization of QWG templates is done through a small set of templates that define the variables used as input by QWG templates. This doesn't cover basic OS configuration, which is described in the page about the template framework.

All site parameters related to QWG middleware are supposed to be declared in template pro_lcg2_config_site.tpl. A sensible default value is provided for all required variables in template source:template/trunk/grid/glite-3.0.0/defaults/site.tpl, provided as part of QWG templates. This template must be included in the site pro_lcg2_config_site.tpl, which must provide an explicit value for at least every variable set to undef in template source:template/trunk/grid/glite-3.0.0/defaults/site.tpl.

To ease transition from LCG2 to gLite3, the template defining default parameters can still be accessed as source:template/trunk/grid/lcg-2.7.0/sources/pro_lcg2_config_system_defaults.tpl.

Machine types

QWG templates provide a template per machine type (CE, SE, RB, ...). They are located in the machine-types directory and are intended to be generic templates. No modification should be needed.

To configure a specific machine with gLite middleware, include the appropriate machine type template in the machine profile, after setting the variable xxx_CONFIG_SITE to the name of a template containing the configuration specific to this machine (look in the machine type template for the exact name of the variable).

Here is an example configuring a Torque based CE :

object template profile_grid10;

# Define specific configuration for a GRIF CE to be added to
# standard configuration
variable CE_TORQUE_CONFIG_SITE = "pro_ce_torque_grif";

# Configure as a CE (Torque) + Site's BDII
include pro_ce_torque;

#
# software repositories (should be last)
#
include repository_common;

In this example, CE_TORQUE_CONFIG_SITE specifies the name of a template defining the Torque configuration.

All the machine types share a common basic configuration, described in template machine-types/base.tpl. This template makes it possible to add site specific configuration to the common basic configuration (e.g. configuration of a monitoring agent). This is done by setting variable GLITE_BASE_CONFIG_SITE to the name of a template containing the site specific configuration, which is added at the end of the common configuration. This variable can be defined, for example, in template pro_site_cluster_info.tpl.
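
As an illustration, a site could hook its own additions into the base configuration as sketched below. The template name pro_site_base_config is hypothetical, only the variable name comes from the description above.

# In pro_site_cluster_info.tpl
variable GLITE_BASE_CONFIG_SITE = "pro_site_base_config";

The template pro_site_base_config.tpl (hypothetical) would then hold the site specific additions :

template pro_site_base_config;

# Site specific configuration added at the end of the common
# base configuration (e.g. a monitoring agent) goes here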

VO Configuration

The list of VOs to configure on a specific node is defined in variable VOS. Generally a site-wide default value is defined in pro_lcg2_config_site.tpl (with operator ?=). This value can be overridden on a specific machine by defining the VOS variable in the machine profile, before including the machine type template.

An example of VOS definition is :

variable VOS ?= list('alice',
                     'atlas',
                     'biomed',
                     'calice',
                     'cms',
                     'cppm',
                     'dteam',
                     'dzero',
                     'egeode',
                     'lhcb',
                     'ops',
                     'planck'
                     );

Note : dteam and ops are mandatory VOs.
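
A machine specific override could look like the following sketch. The profile name profile_grid11 and the VO subset are made up for the illustration. The other names are reused from the Torque CE example above. Because the site-wide default is defined with ?=, the value set here is kept as long as it is defined before the machine type template is included.

object template profile_grid11;

# VO list specific to this machine (must include dteam and ops).
# Defined before the machine type template so that the site-wide
# default (declared with ?=) does not apply.
variable VOS = list('atlas', 'dteam', 'ops');

variable CE_TORQUE_CONFIG_SITE = "pro_ce_torque_grif";
include pro_ce_torque;

include repository_common;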

For each VO listed in VOS, there must be a template defining the VO parameters in vo/params. The template name must be the same as the VO name used in VOS. If the VO to be added has no template defining its parameters, refer to the section about adding a new VO below.

Defining Site Specific Defaults for VOs

It is possible to define site specific defaults for VOs that override the standard defaults. This must be done by defining variable VOS_SITE_PARAMS as a nlist with an entry DEFAULT. The value must be the name of a structure template defining any of these properties :

  • create_home : create home directories for VO accounts. Default defined by variable CREATE_HOME.
  • create_keys : create SSH keys for VO accounts. Default defined by variable CREATE_KEYS.
  • pool_digits : default number of digits to use when creating pool accounts.
  • pool_offset : offset from the VO base uid for the first pool account.
  • pool_size : number of pool accounts to create by default for a VO.
  • sw_mgr_role : description of the VO software manager role. Avoid changing the default.
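
A minimal sketch of such site defaults follows. The template name vo/site/defaults and the values are assumptions chosen for the illustration. The property names are those listed above.

# File vo/site/defaults.tpl (hypothetical name)
structure template vo/site/defaults;

'create_keys' = true;
'pool_size' = 50;

and in pro_lcg2_config_site.tpl :

variable VOS_SITE_PARAMS = nlist('DEFAULT', 'vo/site/defaults');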

Defining Site Specific Parameters for a VO

In addition to defining site specific defaults for VO parameters, it is possible to define site specific parameters for a specific VO. These parameters override standard parameters for the VO.

This is done in much the same way as defining site specific defaults. Variable VOS_SITE_PARAMS must be defined as a nlist with an entry corresponding to the VO name (as used in the VOS variable). The value must be the name of a structure template. This template can define or redefine any property supported to define VO parameters.

For example, to define a site specific RB for VO Alice, create a template vo/site/alice.tpl in your site directory like :

structure template vo/site/alice;

'lbhosts' = 'myrb.example.org:9000';
'nshosts' = 'myrb.example.org:7772';

and add the following entry in VOS_SITE_PARAMS in your pro_lcg2_config_site.tpl :

variable VOS_SITE_PARAMS = nlist('alice', 'vo/site/alice');

Adding a New VO

Adding a new VO involves the creation of a template defining the VO parameters. This template name must be the name you use to refer to the VO in the rest of the configuration, but is not required to be the real VO name (it can be an alias used in the configuration). This template must be located in directory vo/params, in one of your cluster or site specific hierarchies of templates or in gLite templates.

Note : if you create a template for a new VO, be sure to commit it to the QWG repository if you have write access to it, or to send it to QWG developers. There is normally no reason for a VO definition not to be generally available.

To create a template describing a new VO, the easiest way is to copy the template of an already configured VO. The main variables supported in this template are :

  • name : VO official name. No default.
  • account_prefix : prefix to use when creating accounts for the VO. Generally the 3 first letters of the VO name. No default.
  • voms_servers : a nlist describing the VOMS server used by the VO, if any. If the VO has several (redundant) VOMS servers, this property can be a list of nlist. For each VOMS server, supported properties are :
    • name : name of the VOMS server. This is a name used internally by the templates. By default, the template defining the VOMS server certificate has the same name. No default.
    • host : VOMS server host name. No default.
    • port : VOMS server port associated with this VO. No default.
    • cert : template name, in vo/certs , defining VOMS server certificate. If not specified, defaults to the VOMS server name.
  • voms_roles : list of VOMS roles supported by the VO. This property is optional. For each role, the entry is a nlist with the following possible properties :
    • description : description of the VO role. This property is informational, except for VO software manager where it must be "SW manager"
    • name : VO role name, as defined on the VOMS server
    • suffix : suffix to append to account_prefix to build account name associated with this role.
  • proxy : name of the proxy server used by the VO. No default, optional.
  • nshosts : name:port of the RB used by the VO (Network Server). No default.
  • lbhosts : name:port of the RB used by the VO (Logging and Bookkeeping). No default.
  • catalog : define catalog type used by the VO. Optional. Must be defined only for VOs still using RLS (value must be rls or RLS).
  • base_uid : first uid to use for the VO.
  • create_home : create home directories for VO accounts. Default defined by variable CREATE_HOME.
  • create_keys : create SSH keys for VO accounts. Default defined by variable CREATE_KEYS.
  • gid : GID associated with VO accounts. Default : first pool account UID.
  • pool_size : number of pool accounts to create for the VO. Default : 200.
  • pool_digits : number of digits to use for pool accounts. Must be large enough to handle pool_size. Default is 3.

In addition to this template, you need another template defining the public key of the VOMS server used by the VO. By default, this template has the name of the VOMS server. It can be explicitly defined with the cert property of a VOMS server entry. If the new VO uses a VOMS server that is already used by another VO, there is no need to add the certificate.
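
To make the property list more concrete, here is a sketch of a template for a fictitious VO. Every value (VO name, host names, port, uid, role name) is invented for the illustration. The use of a structure template and of a number for the port are assumptions, so compare with the template of an existing VO before relying on them.

# File vo/params/myvo.tpl (fictitious VO)
structure template vo/params/myvo;

'name' = 'myvo.example.org';
'account_prefix' = 'myv';

# Single VOMS server. The certificate template defaults to
# vo/certs/<name>, as 'cert' is not specified.
'voms_servers' = nlist('name', 'voms.example.org',
                       'host', 'voms.example.org',
                       'port', 15000);

# The software manager role must have the description "SW manager"
'voms_roles' = list(nlist('description', 'SW manager',
                          'name', 'lcgadmin',
                          'suffix', 's'));

'proxy' = 'myproxy.example.org';
'nshosts' = 'myrb.example.org:7772';
'lbhosts' = 'myrb.example.org:9000';

'base_uid' = 30000;
'pool_size' = 50;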

VO Specific Areas

There are a couple of variables available to customize VO specific areas (software area, VO accounts home directories...) :

  • VO_SW_AREAS : a nlist with one entry per VO (key is the VO name as used in the VOS variable). The value is a directory to use for the VO software area. Be sure to list this directory or its parent in WN_NFS_AREAS if you want to use a shared file system for this area (this is highly recommended).
  • VO_HOMES : a nlist with one entry per VO (key is the VO name as used in the VOS variable). The value is a directory prefix to use when creating home directories for accounts. A suffix is added to this name, corresponding to the VO role suffix for role accounts or to the account number for pool accounts. By default, VO accounts are created in /home.
  • VO_SWMGR_HOMES : a nlist with one entry per VO (key is the VO name as used in the VOS variable). The value is a directory to use as the home directory for the VO software manager. If there is no entry for a VO, VO_HOMES is used. The main purpose of this variable is to define the home directory of the software manager as the VO software area. This can be achieved easily by assigning VO_SW_AREAS to this variable.
  • CREATE_HOME : this variable controls the creation of VO accounts home directories. It accepts 3 values : true, false and undef. undef is a conditional true : a home directory is not created if it resides on an NFS shared file system (it is listed in WN_NFS_AREAS) and the NFS server is not the current machine.
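
The variables above could be combined as in the following sketch. All directory names are invented for the illustration.

# One software area per VO
variable VO_SW_AREAS = nlist('atlas', '/swareas/atlas',
                             'cms', '/swareas/cms');

# Home directory prefix per VO : the role suffix or the pool
# account number is appended to this prefix
variable VO_HOMES = nlist('atlas', '/home/grid/atlas',
                          'cms', '/home/grid/cms');

# Use the VO software area as the software manager home directory
variable VO_SWMGR_HOMES = VO_SW_AREAS;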

Tuning VO configuration on a specific node

Each machine type template defines the VO configuration (pool accounts, gridmap file/dir...) appropriate to the machine type. If you want to change this configuration on a specific node, you can use the following variables :

  • NODE_VO_POOLACCOUNTS (boolean) : pool account must be created for each VO initialized. Default : true.
  • NODE_VO_GRIDMAPDIR_CONFIG (boolean) : gridmapdir entries must be initialized for pool accounts. Default : NODE_VO_POOLACCOUNTS variable.
  • NODE_VO_WLCONFIG (boolean) : initialize workload management environment for each VO. Normally enabled only on resource brokers. Default : false.
  • NODE_VO_CREATEHOME (boolean) : create home directories for pool accounts. Default : true.

In addition you can execute actions specific to the local machine by defining the following variable (mainly used to define a VO list specific to a node by assigning a non default value to VOS variable) :

  • NODE_VO_CONFIG (string) : site specific template that must be included before actually doing VO initialization. It allows node specific modifications to the default VO configuration. Default : none.

Note : before modifying default VO configuration for a specific machine, be sure what you want to do is valid. Misconfiguring VO can have dramatic effects on service availability.
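
As a sketch of how NODE_VO_CONFIG may be used to give a node its own VO list, assuming a hypothetical template name pro_node_vo_config and an invented VO subset :

# In the machine profile, before including the machine type template
variable NODE_VO_CONFIG = "pro_node_vo_config";

and in pro_node_vo_config.tpl :

template pro_node_vo_config;

# VO list specific to the nodes including this template
variable VOS = list('atlas', 'dteam', 'ops');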

Allocation of Service Accounts

Some services allow a specific account to be defined to run the service. In this case, there is one template for each of these accounts in common/users. The name of the template generally matches the user account created or, when the template is empty, the name of the service.

A site can redefine account names or characteristics (uid, home directory...). To do this, do not edit the standard templates directly, as the changes will be lost with the next version of the template (or you will have to redo them by hand). Instead, create a users directory somewhere in your site or cluster hierarchy and put your customized version of the template there. This directory should be placed, for example, under the site directory and not directly at the same level, else it will not work without adjusting cluster.build.properties.

Note : don't change the name of the template, even if you change the name of the account used (else you'll need to modify standard templates needing this user).

Accepted CAs

There is one template defining all the accepted CAs. We generally produce a new one each time there is a new release of the list of CAs officially accepted by EGEE. If you need to adjust it, create a site or cluster specific copy of common/security/cas.tpl in a directory common/security.

If you need to update this template, refer to the standard procedure to do it.

Shared File Systems

It is recommended to use a shared file system mounted (at least) on CE and WNs for VO software areas. It is also sometimes convenient to use a shared file system for VO pool accounts (this is more or less a requirement to run MPI jobs). Currently, QWG templates support the use of NFS shared file systems. Configuration is done by the following variables :

  • WN_NFS_AREAS : a nlist with one entry per file system that must be NFS mounted on worker nodes (key is the escaped file system mount point). The value for each entry is the name of the NFS server and optionally the path on the NFS server, if different from the path on the worker node.
  • NFS_AUTOFS : when true, use autofs to mount NFS file systems on NFS clients. This is the recommended setting, as this is the only one to avoid complex inter-dependency in startup order. But for backward compatibility, the default value is false.

File systems listed in WN_NFS_AREAS are mounted on CE and WNs. The NFS server for the file systems can be a CE, a SE or any other machine that includes machine-types/nfs.
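
A possible definition is sketched below. Host and directory names are invented, and the server:/path form used for the second entry is an assumption about how the optional server side path is written. escape() is the standard Pan function producing an escaped key.

variable WN_NFS_AREAS = nlist(
    # Same path on the server and on the worker nodes
    escape('/home'), 'nfs.example.org',
    # Different path on the server (syntax assumed)
    escape('/swareas'), 'nfs.example.org:/export/swareas'
);

# Recommended setting : mount NFS file systems with autofs
variable NFS_AUTOFS = true;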

LCG CE Configuration

QWG templates handle configuration of the LCG CE and the selected batch system (LRMS). To select the LRMS you want to use, you have to define variable CE_BATCH_NAME. There is no default. If you want to use Torque/MAUI, recommended version is torque2.

The value of CE_BATCH_NAME must match a directory in the common directory of gLite3 templates.

Note : as of gLite 3.0.2, LRMS supported are Torque v1 (torque1) and Torque v2 (torque2), with MAUI scheduler.

Previous versions of QWG templates used to require definition of CE_BATCH_SYS. This is deprecated : this variable is now computed from CE_BATCH_NAME.
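
For example, selecting Torque v2 with MAUI only requires the following line in the site configuration (CE_BATCH_SYS is left undefined since it is computed) :

variable CE_BATCH_NAME = 'torque2';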

PBS/Torque

PBS/Torque related templates support the following variables :

  • CE_QUEUES : a nlist with one entry per queue (key is the queue name). For each queue, the value itself is a nlist. One mandatory key is attr, which defines the queue parameters (qmgr set queue options). Another optional key is vos, used to explicitly define the VOs which have access to the queue (by default, only the VO with the same name as the queue has access). Look at the pro_lcg2_config_site.tpl example to see how to define one queue for each supported VO.
  • WN_ATTRS : a nlist with one entry per worker node (key is the escaped node fullname). Each value is a set of PBS/Torque attributes to set on the node. Valid values are any key=value supported by the qmgr set server command. One useful value is status=offline to cause a specific node to drain, or status=online to reenable the node. Just suppressing status=offline is not enough to reenable the node. One specific entry in WN_ATTRS is DEFAULT : this entry is applied to any node that doesn't have a specific entry.
  • WN_CPUS_DEF : default number of CPU per worker node.
  • WN_CPUS : a nlist with one entry per worker node (key is the node fullname) having a number of CPUs different from the default.
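
The following sketch shows what these definitions may look like. It only follows the description above : representing attr as a nlist of qmgr attributes and vos as a list of VO names is an assumption, and queue names, node names and values are invented. The pro_lcg2_config_site.tpl example remains the reference for the exact syntax.

variable CE_QUEUES = nlist(
    # Queue accessible only to the VO with the same name
    'atlas', nlist('attr', nlist('max_running', 50)),
    # Queue explicitly opened to several VOs
    'short', nlist('attr', nlist('resources_max.walltime', '01:00:00'),
                   'vos', list('atlas', 'dteam', 'ops'))
);

# 2 CPUs per worker node, except for one larger node
variable WN_CPUS_DEF = 2;
variable WN_CPUS = nlist('grid21.example.org', 4);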

MAUI

MAUI related templates support the following variables :

  • MAUI_CFG : this variable must contain the full content of the maui.cfg file. Look at the pro_lcg2_config_site_maui.tpl example to see how to define this variable from other configuration elements.
  • MAUI_WN_PART_DEF : default node partition to use with worker nodes.
  • MAUI_WN_PART : a nlist with one entry per worker node (key is the node fullname). The value is the name of the MAUI partition in which to place the specific worker node.
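
For instance, to put all worker nodes in one partition except one node, something like the following could be used. The partition and node names are invented for the illustration.

# Partition used by worker nodes without a specific entry
variable MAUI_WN_PART_DEF = 'lcgpro';

# Node placed in another partition
variable MAUI_WN_PART = nlist('grid21.example.org', 'mpi');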

CE Status

CE related templates use variable CE_STATUS to control CE state. Supported values are :

  • Production : this is the normal state. CE receives and processes jobs.
  • Draining : CE doesn't accept new jobs but continues to execute queued jobs (as long as there are WNs available to execute them).
  • Closed : CE doesn't accept new jobs and jobs already queued are not executed. Only running jobs can complete.
  • Queuing : CE accepts new jobs but will not execute them.

CE_STATUS indicates the desired status of the CE. All the necessary actions are taken to set the CE in the requested status. The default status (if the variable is not specified) is Production. This variable can be used in conjunction with WN_ATTRS to drain queues and/or nodes.
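
For example, to stop accepting new jobs before an intervention while letting queued jobs run :

variable CE_STATUS = 'Draining';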

Run-Time Environment

gLite 3.0 templates introduce a new way to define GlueHostApplicationSoftwareRunTimeEnvironment. Previously it was necessary to define a list of all tags in the site configuration template. As most of these tags are standard tags attached to a release of the middleware, there is now a default list of tags defined in the default configuration site template, defaults/site.tpl. To supplement this list with tags specific to the site (e.g. LCG_SC3), define a variable CE_RUNTIMEENV_SITE instead of defining CE_RUNTIMEENV :

variable CE_RUNTIMEENV_SITE = list("LCG_SC3");

This change is backward compatible : if CE_RUNTIMEENV is defined in the site configuration template, this value will be used.

DPM Configuration

DPM related standard templates require a site template to describe the service site configuration. The variable DPM_CONFIG_SITE must contain the name of this template. This template defines the whole DPM configuration, including all disk servers used, and is used to configure all the machines that are part of the DPM configuration.

There is no default template provided for DPM configuration. To build your own template, you can look at template pro_se_dpm_config.tpl in examples provided with QWG templates. If you want to use a specific VO list on your DPM server and you have several nodes in your DPM configuration (DPM head node + disk servers), you need to write a template defining VOS variable (with a non default value) and define variable NODE_VO_CONFIG to this template.

On DPM head node, variable SEDPM_SRM_SERVER must be defined to true. This variable is false by default (DPM disk servers).
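
A DPM head node profile would thus contain definitions along these lines, before the DPM machine type template is included. The template name pro_se_dpm_config_site is hypothetical.

# Site template describing the whole DPM configuration
variable DPM_CONFIG_SITE = "pro_se_dpm_config_site";

# This machine is the DPM head node (disk servers keep the default)
variable SEDPM_SRM_SERVER = true;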

If you want to use the Oracle version of the DPM server, define the following variable in your machine profile :

variable DPM_SERVER_MYSQL = false;

As of DPM 1.5.10, the script used to publish dynamic information for DPM into the BDII (space used/free per VO) has not been updated to interact properly with VOMS mapping. As a result, VO specific pools are not counted in the published values. QWG templates provide a fixed version of the script, which can be installed by adding the following line to the DPM head node profile :

include glite/se_dpm/server/info_dynamic_voms;

To work properly, this script requires /opt/lcg/etc/DPMCONFIG (or whatever file you defined for DPNS database connection information) to be world readable. This can be achieved by adding the following line to your DPM configuration in your site specific template :

"/software/components/dpmlfc/options/dpm/db/configmode" = "644";

LFC Configuration

LFC related standard templates require a site template to describe the service site configuration. The variable LFC_CONFIG_SITE must contain the name of this template.

Normally the only thing really required in this site specific template is the password for the LFC user (by default lfc) and for the MySQL administrator (by default root). There is no default value provided for these passwords. Look at the standard LFC configuration template templates/trunk/glite-3.0.0/glite/lfc/config for the syntax.

If you want to use the Oracle version of the LFC server, define the following variable in your machine profile :

variable LFC_SERVER_MYSQL = false;

LFC templates allow a LFC server to act as a central LFC server (registered in BDII) for some VOs and as a local LFC server for the others. There are 2 variables controlling what is registered in the BDII :

  • LFC_CENTRAL_VOS : list of VOs for which the LFC server must be registered in BDII as a central server. Default is an empty list.
  • LFC_LOCAL_VOS : list of all VOs for which the server must be registered in BDII as a local server. Defaults to all supported VOs (VOS variable). If a VO is in both lists, it is removed from LFC_LOCAL_VOS. If you don't want this server to be registered as a local server for any VO, even if configured on this node (present in VOS list), you must define this variable as an empty list :
    variable LFC_LOCAL_VOS = list();
    

VOs listed in either of these lists must be present in the VOS variable. These 2 variables have no impact on GSI (security) configuration and don't control access to the server.
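
As an illustration, a server acting as the central catalog for one VO and as a local catalog for two others could be described as follows. The template name pro_lfc_config_site and the VO choice are invented, and all the VOs are assumed to be present in VOS.

# Site template holding the LFC passwords
variable LFC_CONFIG_SITE = "pro_lfc_config_site";

# Registered in BDII as a central LFC server for biomed
variable LFC_CENTRAL_VOS = list('biomed');

# Registered in BDII as a local LFC server for these VOs only
variable LFC_LOCAL_VOS = list('atlas', 'cms');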