= Layout and Customization of LCG2 Templates =

[[TracNav]]

[[TOC(inline)]]

This page describes the layout of the LCG2 templates provided by QWG and how site customization should be integrated. See the page on the template [wiki:Doc/TemplateCustom framework] for more details on the template framework structure and site customization.

'''Note : QWG templates require pan compiler version 5 or later.'''

== LCG2 Template Layout ==

QWG templates related to LCG2 middleware are organized in several directories to make the template hierarchy easier to navigate. In the future these directories should be mapped to namespaces. Currently they are all equivalent from the pan perspective : '''a given template can be placed in any of the directories but should not exist in more than one directory, as there is no guarantee about the search order'''.

Directories used by QWG LCG2 templates are :

 * {{{machine-types}}} : templates defining the whole configuration of a given LCG2 machine type (e.g. WN, CE, SE, BDII...). There is one template per machine type. They all rely on template {{{pro_lcg2_machine_config_base}}} to define the basic configuration of any kind of LCG2 machine.
 * {{{rpmlist}}} : templates defining the RPMs that must be loaded for a given LCG2 service. These templates are generated from the LCG2 middleware description and '''should not''' be edited manually (manual edits will be lost the next time the templates are generated).
 * {{{repository}}} : there is normally one template listing all the RPM repositories associated with the current version of the middleware. Each repository is defined in a template that is site specific.
 * {{{sources}}} : templates that define an LCG2 service configuration. These templates are maintained manually and generally need to be updated with each version of the middleware. They are centrally maintained by QWG maintainers. These templates are generic and take variables as input to define a specific site configuration.
   They '''should not''' (normally) be edited manually.
 * {{{vo-legacy}}} : templates to define VOs, using the scheme used by previous versions of QWG templates (until 2.7.0). They are provided for backward compatibility but should no longer be used once a site has migrated to the new, much more flexible, scheme.
 * {{{vo}}} : templates to configure VOs. These templates and associated functions provide a flexible way of configuring VOs. They take as input the list of VOs to be configured, from variable VOS.

== Site Customization of LCG2 Templates ==

Site customization of LCG templates is done through a small set of templates that define the variables used as input by QWG templates. This doesn't cover basic OS configuration, which is described in the page about the [wiki:Doc/TemplateCustom template framework].

=== Site parameters ===

All site parameters related to LCG middleware are supposed to be declared in template {{{pro_lcg2_config_site.tpl}}}. A sensible default value is provided for all required variables in template source:template/trunk/grid/lcg-2.7.0/sources/pro_lcg2_config_system_defaults.tpl provided as part of QWG templates. This template must be included in the site {{{pro_lcg2_config_site.tpl}}}, which must provide an explicit value for at least all the variables defined as {{{undef}}} in template source:template/trunk/grid/lcg-2.7.0/sources/pro_lcg2_config_system_defaults.tpl.

=== Machine types ===

QWG templates provide one template per machine type (CE, SE, RB, ...). They are located in the {{{machine-types}}} directory and are intended to be generic templates. No modification should be needed. To configure a specific machine with LCG2 middleware, you just need to include the appropriate machine type template in the machine profile, after specifying a template containing the specific configuration for this particular machine with the variable {{{xxx_CONFIG_SITE}}} (look in the template for the exact name of the variable).
Here is an example for configuring a Torque based CE :

{{{
object template profile_grid10;

# Define specific configuration for a GRIF CE to be added to
# standard configuration
variable CE_TORQUE_CONFIG_SITE = "pro_ce_torque_grif";

# Configure as a CE (Torque) + Site's BDII
include pro_ce_torque;

#
# software repositories (should be last)
#
include repository_common;
}}}

In this example, {{{CE_TORQUE_CONFIG_SITE}}} specifies the name of a template defining the Torque configuration.

For DPM SE servers, there is an additional variable, {{{SEDPM_SRM_SERVER}}}, that must be set to {{{true}}} on the DPM master node. Also, if you are not using a MySQL database, you need to set variable {{{SEDPM_DB_TYPE}}} to {{{oracle}}}.

All the machine types share a common basic configuration, described in template {{{pro_lcg2_machine_config_site.tpl}}}. This template allows site specific settings to be added to this common basic configuration (e.g. configuration of a monitoring agent). This is done by setting variable {{{LCG2_BASE_CONFIG_SITE}}} to the name of a template containing the site specific configuration, which is added at the end of the common configuration. This variable can be defined, for example, in template {{{pro_site_cluster_info.tpl}}}.

=== VO Configuration ===

VO configuration consists of defining variable {{{VOS}}} in {{{pro_lcg2_config_site.tpl}}}. This variable can also be redefined in the context of a specific node, if {{{pro_lcg2_config_site.tpl}}} defines {{{VOS}}} as a default value.

VO configuration is done by template {{{vo/pro_vo_config.tpl}}}. The behaviour of this template can be customized with variables. The main variables are (see the template for the full list) :

 * NODE_VO_LIST (list) : defines the list of VOs to initialize on the current node. Default : VOS variable defined in {{{pro_lcg2_config_site}}}.
 * NODE_VO_POOLACCOUNTS (boolean) : pool accounts must be created for each VO initialized. Default : true.
 * NODE_VO_GRIDMAPDIR_CONFIG (boolean) : gridmapdir entries must be initialized for pool accounts. Default : NODE_VO_POOLACCOUNTS variable.
 * NODE_VO_SITE_CONFIG (string) : site specific template that must be included before actually doing VO initialization. This allows VO specific modifications to the default VO configuration. Default : none.
 * NODE_VO_WLCONFIG (boolean) : initialize the workload management environment for each VO. Normally enabled only on resource brokers. Default : false.
 * NODE_VO_CREATEHOME (boolean) : create home directories for pool accounts. Default : true.

Templates defining machine types set these variables to the values appropriate for the given machine type, so there should normally be no need to define them.

Adding a new VO to the standard VOs requires creating 2 templates. Use an existing VO, in the {{{vo}}} directory, as a template.

== Description of Main LCG2 Parameters ==

This section describes the main parameters (variables) used by QWG templates to configure the middleware. Most of these parameters are defined through variables in the template defining site specific parameters for LCG2 middleware, {{{pro_lcg2_config_site.tpl}}}.

Parameter description is organized by category of parameters, following the organization of {{{pro_lcg2_config_site_defaults.tpl}}}.

=== NFS Configuration ===

NFS can be used to configure shared file systems between several LCG2 machines. This is mainly used to share file systems between WNs and the CE for home directories and/or software areas. But it can be used for any purpose, and the NFS configuration description makes no assumption about the role of each machine.

Two templates are involved in NFS configuration but neither of them should require any modification :

 * {{{pro_lcg2_config_nfs_server.tpl}}} : configures the NFS server side, including exporting the required file systems. By default, this is done on the CE and Classic SE.
 * {{{pro_lcg2_config_nfs_client.tpl}}} : configures the NFS client side.
   By default this is done only on WNs.

Both templates do nothing if the current machine is neither an NFS client (WN) nor an NFS server (CE or SE).

The main variables used by these templates to configure NFS according to the local site configuration are :

 * {{{WN_NFS_AREAS}}} : this variable lists all file systems that need to be NFS mounted on NFS clients (WNs by default). This is an nlist where, for each element, the key is the mount point on the client and the value is the server to use. The server can be just a host name, or {{{hostname:/server_mnt_point}}} if the mount point is different on the server. The key must be escaped. A typical example is :
   {{{
   variable WN_NFS_AREAS = nlist(
     escape("/home"), CE_HOST,
     escape("/swmgrs"), CE_HOST+":/vo_sw_areas",
     escape(CE_CLOSE_SE_ACCESS_POINT), SE_HOST_DEFAULT,
   );
   }}}
 * {{{SITE_NFS_ACL}}} : this is a list of hostname patterns to be used in the export entry for each file system listed in {{{WN_NFS_AREAS}}}. The default is to export all file systems to the CE, SE and WNs, which should generally be appropriate.
 * {{{NFS_THREADS}}} : this is an nlist with one entry for each NFS server for which you want to define a non default number of NFS threads (the default is 8). An entry for an unused server is just ignored. The key must be the host name and the value the number of threads. A typical example is :
   {{{
   variable NFS_THREADS = nlist(
     CE_HOST, 16,
     SE_HOST_DEFAULT, 16,
   );
   }}}
 * `WN_NFS_WL_SCRATCH` : when set to true, this variable prevents the EDG_WL_SCRATCH environment variable from being set to a local directory when /home is NFS mounted. It is strongly advised to keep this variable set to false, as having EDG_WL_SCRATCH on an NFS area with a large number of workers (50+ CPUs) can result in a significant performance penalty on both the WNs and the NFS server.
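
The two variables without an example above could be set along the following lines. This is only a sketch : the host patterns are hypothetical, and the wildcard syntax is assumed to be whatever the NFS server accepts in its export entries.

{{{
# Hypothetical host patterns; replace with the patterns matching your site.
variable SITE_NFS_ACL = list(
  "*.example.org",
  "192.168.10.*"
);

# Keep EDG_WL_SCRATCH on a local directory (the advised setting).
variable WN_NFS_WL_SCRATCH = false;
}}}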

For compatibility reasons, if variable {{{CE_NFS_ENABLED}}} is defined, the default value for {{{WN_NFS_AREAS}}} is :

{{{
variable WN_NFS_AREAS = nlist(
  escape("/home"), CE_HOST,
);
}}}

This means that {{{/home}}} is NFS mounted on WNs and that the server is the CE.

=== PBS/Torque ===

PBS/Torque related templates support the following variables :

 * `CE_QUEUES` : an nlist with one entry per queue (the key is the queue name). For each queue, the value itself is an nlist. One mandatory key is `attr`, which defines the queue parameters (`qmgr set queue` options). Another, optional, key is `vos`, which is used to explicitly define the VOs that have access to the queue (by default, only the VO with the same name as the queue has access). Look at the [source:templates/trunk/grid/lcg-2.7.0/site/pro_lcg2_config_site.tpl pro_lcg2_config_site.tpl] example to see how to define one queue for each supported VO.
 * `CE_NFS_ENABLED` : this variable must be set to true if WN home directories are on a shared NFS file system (even if the server is not the CE; the variable name is kept for backward compatibility). When set to true, the PBS/Torque client is configured to redirect {{{TMPDIR}}} and {{{EDG_WL_SCRATCH}}} to a local directory on the WN.
 * `WN_NFS_AREAS` : an nlist with one entry per file system that must be NFS mounted on worker nodes (the key is the escaped file system mount point). The value for each entry is the name of the NFS server and optionally the path on the NFS server, if different from the path on the worker node.
 * `WN_ATTRS` : this variable is an nlist with one entry per worker node (the key is the escaped node fullname). Each value is a set of PBS/Torque attributes to set on the node. Valid values are any `key=value` supported by the `qmgr set server` command. One useful value is `status=offline` to cause a specific node to drain, or `status=online` to re-enable the node. Just removing `status=offline` is not enough to re-enable the node.
   One specific entry in `WN_ATTRS` is `DEFAULT` : this entry is applied to any node that doesn't have a specific entry.
 * `WN_CPUS_DEF` : default number of CPUs per worker node.
 * `WN_CPUS` : an nlist with one entry per worker node (the key is the node fullname) having a number of CPUs different from the default.

=== MAUI ===

MAUI related templates support the following variables :

 * `MAUI_CFG` : this variable must contain the full content of the `maui.cfg` file. Look at the [source:templates/trunk/grid/lcg-2.7.0/site/pro_lcg2_config_site_maui.tpl pro_lcg2_config_site_maui.tpl] example to see how to build this variable from other configuration elements.
 * `MAUI_WN_PART_DEF` : default node partition to use with worker nodes.
 * `MAUI_WN_PART` : an nlist with one entry per worker node (the key is the node fullname). The value is the name of the MAUI partition in which to place the specific worker node.

=== CE ===

CE related templates support the following variables :

 * `CE_STATUS` : indicates the desired status of the CE. Can be `Production`, `Queuing`, `Draining` or `Closed`. All the necessary actions are taken to set the CE to the requested status. The default status (if the variable is not specified) is `Production`. This variable can be used in conjunction with [wiki:Doc/LCG2/TemplateLayout#PBSTorque WN_ATTRS] to drain queues and/or nodes. The meaning of each state is :
   * `Production` : this is the normal state. The CE receives and processes jobs.
   * `Draining` : the CE doesn't accept new jobs but continues to execute queued jobs (as long as there are WNs available to execute them).
   * `Closed` : the CE doesn't accept new jobs and jobs already queued are not executed. Only running jobs can complete.
   * `Queuing` : the CE accepts new jobs but will not execute them.
 * `CE_BATCH_SYS` : indicates the type of LRMS (batch system) used on the CE. For all PBS and Torque/MAUI variants, the value must be `pbs`.
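
As an illustration of how some of the PBS/Torque, MAUI and CE variables described above fit together, here is a minimal sketch of what a site might put in {{{pro_lcg2_config_site.tpl}}}. The node name and partition name are hypothetical, and the nlist form used for each `WN_ATTRS` value is an assumption; check the QWG templates for the exact structure expected.

{{{
# Batch system type : "pbs" for all PBS and Torque/MAUI variants.
variable CE_BATCH_SYS = "pbs";

# Stop accepting new jobs while already queued jobs keep being executed.
variable CE_STATUS = "Draining";

# Default number of CPUs per worker node, with one node overriding it.
variable WN_CPUS_DEF = 2;
variable WN_CPUS = nlist(
  "wn042.example.org", 4
);

# Drain a single worker node. The key is the escaped node fullname;
# the nlist form of the value is an assumption.
variable WN_ATTRS = nlist(
  escape("wn042.example.org"), nlist("status", "offline")
);

# Default MAUI partition for worker nodes (hypothetical partition name).
variable MAUI_WN_PART_DEF = "lcgpro";
}}}

To put the node back into service, the entry would have to be changed to `status=online` rather than simply removed, as noted in the `WN_ATTRS` description.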