
Layout and Customization of LCG2 Templates

This page describes the layout of the LCG2 templates provided by QWG and how site customization should be integrated. See the template framework page for more details on the framework structure and site customization.

Note: QWG templates require pan compiler version 5 or later.

LCG2 Template Layout

QWG templates related to the LCG2 middleware are organized in several directories for easier navigation of the template hierarchy. In the future these directories should be mapped to namespaces. Currently they are all equivalent from the pan perspective: a given template can be placed in any of the directories, but should not exist in more than one, as there is no guarantee about the search order.

Directories used by QWG LCG2 templates are:

  • machine-types: templates defining the whole configuration of a given LCG2 machine type (e.g. WN, CE, SE, BDII...). There is one template per machine type. They all rely on template pro_lcg2_machine_config_base to define the basic configuration common to any kind of LCG2 machine.
  • rpmlist: templates defining the RPMs that must be installed for a given LCG2 service. These templates are generated from the LCG2 middleware description and should not be edited manually (manual edits will be lost the next time the templates are generated).
  • repository: there is normally one template listing all the RPM repositories associated with the current version of the middleware. Each repository is defined in a site-specific template.
  • sources: templates that define an LCG2 service configuration. These templates are maintained manually, generally need to be updated with each version of the middleware, and are centrally maintained by the QWG maintainers. They are generic and take variables as input to define a site-specific configuration; they should not (normally) be edited locally.
  • vo-legacy: templates to define VOs using the scheme of previous QWG template versions (up to 2.7.0). They are provided for backward compatibility but should no longer be used; migrate to the new, much more flexible, scheme instead.
  • vo: templates to configure VOs. These templates and their associated functions provide a flexible way of configuring VOs, taking as input the list of VOs to configure from the variable VOS.

Site Customization of LCG2 Templates

Site customization of the LCG templates is done through a small set of templates that define the variables used as input by QWG templates. This doesn't cover the basic OS configuration, which is described in the page about the template framework.

Site parameters

All site parameters related to the LCG middleware are supposed to be declared in template pro_lcg2_config_site.tpl. A sensible default value is provided for all required variables in template source:template/trunk/grid/lcg-2.7.0/sources/pro_lcg2_config_system_defaults.tpl, provided as part of QWG templates. This defaults template must be included by the site pro_lcg2_config_site.tpl, which must provide an explicit value for at least all the variables left undef in the defaults template.
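
A minimal sketch of a site pro_lcg2_config_site.tpl follows. Host names and values are purely illustrative, and it assumes the defaults template assigns default values, so site values are set before including it:

template pro_lcg2_config_site;

# Site-specific values (illustrative). Variables left undef in the
# defaults template must be given an explicit value here.
variable CE_HOST = "ce.example.org";
variable SE_HOST_DEFAULT = "se.example.org";

# Pull in the sensible defaults provided by QWG templates
include pro_lcg2_config_system_defaults;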

Machine types

QWG templates provide one template per machine type (CE, SE, RB, ...). They are located in the machine-types directory and are intended to be generic; no modification should be needed.

To configure a specific machine with the LCG2 middleware, you just need to include the appropriate machine type template in the machine profile, after naming a template containing the configuration specific to this particular machine in the variable xxx_CONFIG_SITE (look in the machine type template for the exact name of the variable).

Here is an example of configuring a Torque-based CE:

object template profile_grid10;

# Define specific configuration for a GRIF CE to be added to
# standard configuration
variable CE_TORQUE_CONFIG_SITE = "pro_ce_torque_grif";

# Configure as a CE (Torque) + Site's BDII
include pro_ce_torque;

#
# software repositories (should be last)
#
include repository_common;

In this example, CE_TORQUE_CONFIG_SITE specifies the name of a template defining the site-specific Torque configuration.

For DPM SE servers there is an additional variable, SEDPM_SRM_SERVER, that must be set to true on the DPM master node. Also, if you are not using a MySQL database, you need to set the variable SEDPM_DB_TYPE to oracle.
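
For example, the profile of a DPM head node using Oracle might contain the following. This is a sketch only: the machine type template name pro_se_dpm is an assumption, check the machine-types directory for the exact name.

# DPM master node running the SRM server, with an Oracle backend
variable SEDPM_SRM_SERVER = true;
variable SEDPM_DB_TYPE = "oracle";
include pro_se_dpm;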

All the machine types share a common basic configuration, described in template pro_lcg2_machine_config_site.tpl. This template allows site specifics (e.g. the configuration of a monitoring agent) to be added to this common basic configuration. This is done by setting the variable LCG2_BASE_CONFIG_SITE to the name of a template containing the site-specific configuration to be added, at the end of the common configuration. This variable can be defined, for example, in template pro_site_cluster_info.tpl.
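
For example (the template name pro_site_base_config is hypothetical):

# In pro_site_cluster_info.tpl for instance: site additions applied
# at the end of the common base configuration
variable LCG2_BASE_CONFIG_SITE = "pro_site_base_config";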

VO Configuration

VO configuration consists of defining the variable VOS in pro_lcg2_config_site.tpl. This variable can also be redefined in the context of a specific node, provided pro_lcg2_config_site.tpl defines VOS as a default value (with the ?= operator).
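
For instance (VO names are illustrative):

# In pro_lcg2_config_site.tpl: a site-wide default, overridable per node
variable VOS ?= list("atlas", "cms", "dteam");

A specific node profile can then redefine VOS before including its machine type template:

# In a node profile: restrict this node to a single VO
variable VOS = list("dteam");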

VO configuration is done by template vo/pro_vo_config.tpl. The behaviour of this template can be customized with variables; the main ones are listed below (see the template for the full list), and a short sketch follows the list:

  • NODE_VO_LIST (list): the list of VOs to initialize on the current node. Default: the VOS variable defined in pro_lcg2_config_site.
  • NODE_VO_POOLACCOUNTS (boolean): whether a pool account must be created for each initialized VO. Default: true.
  • NODE_VO_GRIDMAPDIR_CONFIG (boolean): whether gridmapdir entries must be initialized for pool accounts. Default: the value of NODE_VO_POOLACCOUNTS.
  • NODE_VO_SITE_CONFIG (string): a site-specific template to include before actually doing the VO initialization; allows site modifications to the default VO configuration. Default: none.
  • NODE_VO_WLCONFIG (boolean): whether to initialize the workload management environment for each VO. Normally enabled only on resource brokers. Default: false.
  • NODE_VO_CREATEHOME (boolean): whether to create home directories for pool accounts. Default: true.
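
A hedged sketch of overriding some of these variables in a node profile, should the machine type defaults not be appropriate (the template name pro_vo_site_tuning is hypothetical):

# Before including the machine type template
variable NODE_VO_LIST = list("atlas", "dteam");
variable NODE_VO_CREATEHOME = false;
variable NODE_VO_SITE_CONFIG = "pro_vo_site_tuning";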

Templates defining machine types set these variables to the values appropriate for the machine type, so there should normally be no need to define them yourself.

Adding a new VO to the standard ones requires creating 2 templates. Use an existing VO, in the vo directory, as a model.

Description of Main LCG2 Parameters

This section describes the main parameters (variables) used by QWG templates to configure the middleware. Most of these parameters are defined through variables in the template holding the site-specific parameters for the LCG2 middleware, pro_lcg2_config_site.tpl. The parameter description is organized by category, following the organization of pro_lcg2_config_system_defaults.tpl.

NFS Configuration

NFS can be used to configure shared file systems between several LCG2 machines. This is mainly used to share home directories and/or software areas between WNs and the CE, but it can be used for any purpose: the NFS configuration description makes no assumption about the role of each machine.

Two templates are involved in the NFS configuration, but neither of them should require any modification:

  • pro_lcg2_config_nfs_server.tpl: configures the NFS server side, including exporting the required file systems. By default, this is done on the CE and the Classic SE.
  • pro_lcg2_config_nfs_client.tpl: configures the NFS client side. By default this is done only on WNs.

Both templates do nothing if the current machine is neither an NFS client (WN) nor an NFS server (CE or SE).

The main variables used by these templates to configure NFS according to the local site configuration are:

  • WN_NFS_AREAS: this variable lists all file systems that need to be NFS mounted on the NFS clients (WNs by default). This is an nlist where, for each entry, the key is the mount point on the client and the value is the server to use. The value can be just a host name, or hostname:/server_mnt_point if the mount point is different on the server. The key (the mount point) must be escaped. A typical example is:
    variable WN_NFS_AREAS = nlist(
      escape("/home"), CE_HOST,
      escape("/swmgrs"), CE_HOST+":/vo_sw_areas",
      escape(CE_CLOSE_SE_ACCESS_POINT), SE_HOST_DEFAULT
    );
    
  • SITE_NFS_ACL: this is a list of host name patterns used in the export entry for each file system listed in WN_NFS_AREAS. The default is to export all file systems to the CE, SE and WNs, which should generally be appropriate. A sketch follows this list.
  • NFS_THREADS: this is an nlist with one entry for each NFS server for which you want a non-default number of NFS threads (the default is 8). An entry for an unused server is simply ignored. The key must be the host name and the value the number of threads. A typical example is:
    variable NFS_THREADS = nlist(
      CE_HOST, 16,
      SE_HOST_DEFAULT, 16
    );
    
  • WN_NFS_WL_SCRATCH: when defined to true, this variable prevents the EDG_WL_SCRATCH environment variable from being set to a local directory when /home is NFS mounted. It is strongly advised to keep this variable false: having EDG_WL_SCRATCH on an NFS area with a large number of workers (50+ CPUs) can result in a significant performance penalty on both the WNs and the NFS server.
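
A sketch of a SITE_NFS_ACL definition (the patterns are illustrative; use whatever host name patterns your exports should match):

variable SITE_NFS_ACL = list(
  "ce*.example.org",
  "se*.example.org",
  "wn*.example.org"
);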

For compatibility reasons, if the variable CE_NFS_ENABLED is defined, the default value for WN_NFS_AREAS is:

variable WN_NFS_AREAS = nlist(
  escape("/home"), CE_HOST
);

This means that /home is NFS mounted on WNs and that the server is the CE.

PBS/Torque

PBS/Torque-related templates support the following variables:

  • CE_QUEUES: an nlist with one entry per queue (the key is the queue name). For each queue, the value is itself an nlist. One mandatory key is attr, which defines the queue parameters (qmgr set queue options). Another, optional, key is vos, used to explicitly define the VOs which have access to the queue (by default, only the VO with the same name as the queue has access). Look at the pro_lcg2_config_site.tpl example for how to define one queue for each supported VO; a sketch also follows this list.
  • CE_NFS_ENABLED: this variable must be set to true if WN home directories are on a shared NFS file system (even if the server is not the CE; the variable name is kept for backward compatibility). When set to true, the PBS/Torque client is configured to redirect TMPDIR and EDG_WL_SCRATCH to a local directory on the WN.
  • WN_NFS_AREAS: an nlist with one entry per file system that must be NFS mounted on worker nodes (the key is the escaped file system mount point). The value for each entry is the name of the NFS server, optionally followed by the path on the NFS server if different from the path on the worker node. See the NFS Configuration section above.
  • WN_ATTRS: this variable is an nlist with one entry per worker node (the key is the escaped node full name). Each value is a set of PBS/Torque attributes to set on the node. Valid values are any key=value pairs supported by the qmgr command. One useful value is status=offline, to drain a specific node, or status=online to re-enable it; just removing status=offline is not enough to re-enable the node. One specific entry in WN_ATTRS is DEFAULT: this entry is applied to any node that doesn't have a specific entry.
  • WN_CPUS_DEF: the default number of CPUs per worker node.
  • WN_CPUS: an nlist with one entry per worker node (the key is the node full name) that has a number of CPUs different from the default.
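
A hedged sketch combining these variables. Queue names, attributes and host names are illustrative; the exact representation of the WN_ATTRS attribute set (shown here as an nlist of attribute/value pairs) and the escaping of WN_CPUS keys are assumptions to be checked against the actual templates:

# One queue, restricted to the atlas VO, with an illustrative qmgr option
variable CE_QUEUES = nlist(
  "atlas", nlist(
    "attr", nlist("max_running", "100"),
    "vos", list("atlas")
  )
);

# Drain one specific node; DEFAULT applies to all other nodes
variable WN_ATTRS = nlist(
  "DEFAULT", nlist(),
  escape("wn01.example.org"), nlist("status", "offline")
);

# 2 CPUs per WN by default, one 4-CPU exception
variable WN_CPUS_DEF = 2;
variable WN_CPUS = nlist(
  escape("wn02.example.org"), 4
);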

MAUI

MAUI-related templates support the following variables:

  • MAUI_CFG: the content of this variable must be the full content of the maui.cfg file. Look at the pro_lcg2_config_site_maui.tpl example for how to build this variable from other configuration elements.
  • MAUI_WN_PART_DEF: the default node partition to use for worker nodes.
  • MAUI_WN_PART: an nlist with one entry per worker node (the key is the node full name). The value is the name of the MAUI partition in which to place that worker node. A sketch follows this list.
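
A minimal sketch of the partition variables (partition and host names are illustrative; the escaping of the key is an assumption, consistent with the other per-node nlists above):

variable MAUI_WN_PART_DEF = "long";
variable MAUI_WN_PART = nlist(
  escape("wn01.example.org"), "short"
);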

CE

CE-related templates support the following variables:

  • CE_STATUS: indicates the desired status of the CE. The value can be Production, Queuing, Draining or Closed. All the necessary actions are taken to put the CE into the requested status. The default status (if the variable is not specified) is Production. This variable can be used in conjunction with WN_ATTRS to drain queues and/or nodes; a sketch follows this list. The meaning of each state is:
    • Production: this is the normal state. The CE receives and processes jobs.
    • Draining: the CE doesn't accept new jobs but continues to execute queued jobs (as long as there are WNs available to execute them).
    • Closed: the CE doesn't accept new jobs and jobs already queued are not executed. Only running jobs can complete.
    • Queuing: the CE accepts new jobs but will not execute them.
  • CE_BATCH_SYS: indicates the type of LRMS (batch system) used on the CE. For all PBS and Torque/MAUI variants, the value must be pbs.
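
For example, a Torque-based CE being drained before an intervention could define (a sketch; the values come straight from the lists above):

# Stop accepting new jobs, let queued and running jobs finish
variable CE_STATUS = "Draining";
variable CE_BATCH_SYS = "pbs";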