WN Profile Cloning

Profile cloning is a technique to speed up profile compilation, taking advantage of nodes with an almost identical profile. The principle is that:

  • A reference node is defined.
  • On eligible nodes, the reference node's compiled profile is included rather than recompiling the node profile. The only configuration parts not cloned from the reference node are the hardware, network and file system configuration.

This results in a very significant speed-up of the compilation: adding 100 clones typically only doubles the compilation time of the reference node.

Configuration of Profile Cloning

Profile cloning is configured through a set of variables that must be defined very early in the configuration. The default template name to configure profile cloning is site/wn-cloning-config.tpl. Another template name can be used by defining variable PROFILE_CLONING_CONFIG_SITE. This template is typically located in site-specific templates or in cluster-specific templates.

Enabling Profile Cloning

Profile cloning is currently restricted to gLite WNs (since gLite 3.1). It is not enabled by default. Its use is controlled by 2 variables:

  • PROFILE_CLONING_ENABLED: this variable must be true in the node profile for profile cloning to be used. It is typically configured as true by default for all nodes, in site/wn-cloning-config.tpl or PROFILE_CLONING_CONFIG_SITE.
  • PROFILE_CLONING_DISABLED: this is an nlist with one entry per node for which you want to disable the use of profile cloning. The key must be the profile name of the node, without the profile prefix if defined. If the value is true, profile cloning is disabled for that node, even if PROFILE_CLONING_ENABLED is true. Any other value is ignored.
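As an illustration of the two variables above, a minimal sketch of the cloning configuration template could look as follows. The node names grid105 and grid117 are hypothetical, and the sketch assumes the default template name site/wn-cloning-config.

```pan
unique template site/wn-cloning-config;

# Enable profile cloning by default for all nodes.
variable PROFILE_CLONING_ENABLED ?= true;

# Disable cloning for selected nodes (hypothetical node names).
# Keys are profile names without the profile prefix; only the value
# true disables cloning.
variable PROFILE_CLONING_DISABLED ?= nlist(
    'grid105', true,
    'grid117', true
);
```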

Defining the Reference Node

The reference node is defined with variable PROFILE_CLONING_REFERENCE_NODE. The value must match a profile template name (without the profile prefix if defined) in the same cluster as the nodes to be cloned. If you want to use the profile cloning in several clusters, you need to define one reference node per cluster.

Note: a current restriction is that only one reference node can be defined per cluster.
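For example, assuming a cluster whose reference worker node has the profile name grid100 (a hypothetical name), the definition in the cloning configuration template could be sketched as:

```pan
# Hypothetical reference node: the value is the profile template name,
# without the profile prefix, of a node in the same cluster.
variable PROFILE_CLONING_REFERENCE_NODE ?= 'grid100';
```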

Defining the Eligible Nodes

Eligible nodes are defined by a regexp applied to the profile name (without the profile prefix if defined and without the extension). This regexp is defined with variable PROFILE_CLONING_ELIGIBLE_NODES. The regexp must use a Perl-compatible syntax.
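A possible definition, assuming hypothetical worker nodes named grid100 to grid199, is sketched below. Whether the regexp is implicitly anchored is not stated here, so the sketch anchors it explicitly.

```pan
# Match grid100 .. grid199 (hypothetical naming scheme).
# The regexp is applied to the profile name without prefix and extension.
variable PROFILE_CLONING_ELIGIBLE_NODES ?= '^grid1[0-9][0-9]$';
```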

Defining Profile Prefix

By default, the profile name is considered as matching the node name. If this is not the case, in particular if you use a prefix like profile_ for your profile names, you need to define it with the variable PROFILE_PREFIX.
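For a site whose profiles are named profile_grid100, profile_grid101 and so on (hypothetical names), the prefix would be declared along these lines:

```pan
# Profiles are named profile_<node name>, so declare the prefix
# to let profile cloning derive the node name from the profile name.
variable PROFILE_PREFIX ?= 'profile_';
```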

Miscellaneous Configuration Options

Profile cloning is configured at a very early stage of the compilation process. As a result, it doesn't have access to the variables defined by the standard configuration process and needs to include some of the site-specific templates directly. If you don't use the standard names for these templates, you should define the following variables (they are also used by the standard configuration process and have the same defaults):

  • SITE_DB_TEMPLATE: name of the template defining variables DB_MACHINES and DB_IP. Default: site/databases.
  • SITE_GLOBAL_VARS_TEMPLATE: name of the template defining global variables for the site. Default: site/global_variables.
  • SITE_FUNCTIONS_TEMPLATE: name of the template defining site specific functions. Default: site/functions.
  • CLUSTER_INFO_TEMPLATE: name of the template defining the cluster-specific configuration. Default: site/cluster_info (or the legacy non-namespaced name, pro_site_cluster_info).
  • SITE_CONFIG_TEMPLATE: name of the template defining the base site configuration. Default: site/config.
  • FILESYSTEM_CONFIG_SITE: name of the template configuring the file systems. Default: filesystem/config.
  • FILESYSTEM_LAYOUT_CONFIG_SITE: name of the template defining the file system layout, used by the standard file system configuration template. Default: site/filesystems/glite.

Note: if you use a non-standard value for SITE_FUNCTIONS_TEMPLATE, CLUSTER_INFO_TEMPLATE or SITE_CONFIG_TEMPLATE, it is recommended to define these variables in SITE_GLOBAL_VARS_TEMPLATE to ensure consistency with the standard configuration. SITE_DB_TEMPLATE cannot be defined in that template because the template it names is included before SITE_GLOBAL_VARS_TEMPLATE.

A typical definition of these variables for sites still using non-namespaced templates for these site-specific templates is:

variable SITE_DB_TEMPLATE ?= 'pro_site_databases';
variable SITE_GLOBAL_VARS_TEMPLATE ?= 'pro_site_global_variables';
variable SITE_FUNCTIONS_TEMPLATE ?= 'pro_site_functions';
variable SITE_CONFIG_TEMPLATE ?= 'pro_site_config';

Cloning-Specific Node Configuration

Depending on your exact site configuration, you may need to execute some configuration actions only in the context of profile cloning. This section defines the different options available.

Cloning Postconfig

It is possible to execute a site-specific template at the very end of the cloning process. This template will be executed only when a node is cloned. By default there is none. To use one, define variable PROFILE_CLONING_CLONED_NODE_POSTCONFIG to this template name.

It is harmless to define this variable on a node which is not cloned.
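As a sketch, assuming a site-specific template named site/wn-cloning-postconfig (a hypothetical name), the variable could be set in the cloning configuration template like this:

```pan
# Hypothetical template executed at the very end of the cloning
# process, and only on cloned nodes.
variable PROFILE_CLONING_CLONED_NODE_POSTCONFIG ?= 'site/wn-cloning-postconfig';
```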

Conditional Configuration

If a site-specific template is executed in the context of profile cloning (as part of the configuration replayed on every node), it may test the variable PROFILE_CLONING_CLONED_NODE to do (or ignore) some actions in the context of profile cloning. This variable has the value true when the profile is cloned rather than rebuilt.

Note: in an early version of profile cloning, this variable was called DUMMY_NODE. If you used this variable, you need to update it to the new one. There is no backward compatibility.

For example, to conditionally exclude a template when the profile is cloned, you may use:

include { if ( !PROFILE_CLONING_CLONED_NODE ) 'site/nocloning-config' };

Examples

The QWG templates contain a configuration example. It is based on 2 templates:

Troubleshooting

The important thing to remember when using profile cloning is that some configuration parameters of the node are inherited from the reference node, even if a node-specific value exists. This is particularly true for the OS version/architecture selection, the kernel version... The packages part of the profile is also inherited from the reference node, including the OS errata.

To troubleshoot the decision to clone or not clone a profile, enable Pan compiler debugging features with the following command:

ant -Dpan.debug.include=glite/wn/cloning/selector

F.A.Q.

The reference node profile sometimes has an invalid content

You need to ensure you don't cross-reference your clusters in the cluster-specific file cluster.build.properties. This is strongly discouraged in any case. When you use profile cloning, it may be the source of undesired side effects: in particular, if one of the clusters doesn't have the cluster-specific configuration for profile cloning whereas the other one does (in particular the definition of a reference node), the reference node may be rebuilt in the context of the wrong cluster, impacting its OS version, its gLite parameters...

If you suspect something is wrong, use the troubleshooting procedure described above to activate debug messages during the compilation.

two (or more) children have the same name in merge()

If you have some site-specific actions done after the include { 'machine-types/wn' } and involving npush operations (or an explicit call to function merge), you may have the following error during cloning:

two (or more) children have the same name in merge()

This may also happen with actions done in CLUSTER_INFO_TEMPLATE or SITE_CONFIG_TEMPLATE, which are reexecuted after cloning the reference profile.

This error occurs because the entry you are trying to add to the nlist is already present in the reference profile and cannot be added twice. There are several possible workarounds for this problem:

  • Use WN_CONFIG_SITE (specific to WN) or GLITE_BASE_CONFIG_SITE (executed for all gLite machines) to execute a template containing all the site-specific actions. This template will not be reexecuted when the profile is cloned. This is the recommended and most efficient method.
  • Conditionally execute the site-specific actions by testing variable PROFILE_CLONING_CLONED_NODE.
  • Use a direct assignment to the nlist instead of npush, so that the entry inherited from the reference node is overwritten rather than added a second time. This is the least recommended method as it can be pretty inefficient. To do it, use Pan code like:
    variable MY_NLIST = {
      SELF['myentry'] = 'test';
      SELF;
    };
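The second workaround above (testing PROFILE_CLONING_CLONED_NODE) could be sketched as follows. MY_NLIST and the entry name are the same illustrative names as in the direct assignment example, and the is_defined() guard is an assumption made here in case the variable is not set on nodes that are not cloned.

```pan
variable MY_NLIST = {
    # True only when this profile is cloned from the reference node.
    cloned = is_defined(PROFILE_CLONING_CLONED_NODE) && PROFILE_CLONING_CLONED_NODE;
    if ( cloned ) {
        # The entry already comes from the reference profile: keep the nlist as is.
        SELF;
    } else {
        # Normal (non-cloned) build: add the entry.
        npush('myentry', 'test');
    };
};
```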
    

Known Issues

Only one reference node per cluster

In the current implementation, it is not possible to define several subsets of nodes in the same cluster cloned from different reference nodes. Thus the reference node and the cloned nodes must support a similar configuration, in particular for the OS version/architecture. Either use PROFILE_CLONING_ELIGIBLE_NODES to define a regexp that matches only the relevant nodes, or define an entry in PROFILE_CLONING_DISABLED for each potentially eligible node that is not compatible with the reference node or should not be cloned.

Package repository is not redone

During profile cloning, the package (RPM) repository configuration is not replayed. That means that if you add new packages, they will not be associated with a repository and their installation will fail at deployment time, without any error during the compilation.

This may be improved in a future release (#217). In the meantime, it is the site's responsibility to ensure that the package repository configuration (generally repository/config) is replayed on the cloned node. Be aware that doing this on all cloned nodes may significantly reduce the speed-up provided by profile cloning. Rather than adding packages and redoing the repository configuration, it is recommended to add the missing packages to the reference node so that it is a superset of what is needed by all the cloned nodes.
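If the repository configuration really has to be replayed on cloned nodes, one possible approach is to do it from the cloning postconfig template. The sketch below assumes that PROFILE_CLONING_CLONED_NODE_POSTCONFIG points to a hypothetical template site/wn-cloning-postconfig and that the repository configuration is in repository/config.

```pan
# Hypothetical postconfig template, executed only on cloned nodes.
unique template site/wn-cloning-postconfig;

# Replay the package repository configuration so that packages added
# on the cloned node are associated with a repository.
include { 'repository/config' };
```

This trades part of the compilation speed-up for correctness, so it should be reserved for cases where extending the reference node is not an option.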

Packages or templates provided by the OS not found

The cloned node inherits the RPM repository configuration from the reference node. For packages provided by the OS, this means that they will be looked for in the RPM repositories associated with the OS version/architecture of the reference node, whatever OS version/architecture is explicitly defined for the cloned node.

Even though the OS version/architecture is inherited from the reference node, the OS version/architecture selection is replayed on the cloned node so that templates provided as part of the OS support can be located. A restriction of the current implementation is that the information used to configure the template loadpath is based on the cloned node configuration rather than the reference node configuration. If the two don't match, this may lead to errors such as a template or a package that cannot be found. In particular, this may lead to loading a package with the wrong version for the OS version/architecture of the reference node. Note that because of the previous restriction, the incorrect package versions may not be reported.

To help with consistency between all nodes, it is recommended not to declare an explicit entry for the reference node and the other nodes cloned in the OS version database but rather rely on NODE_OS_VERSION_DEFAULT for all these nodes.
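A minimal sketch of this recommendation is shown below. The value is a hypothetical OS version identifier; use the one that matches your OS templates.

```pan
# Hypothetical OS version/architecture shared by the reference node
# and all its clones, instead of per-node entries in the OS version database.
variable NODE_OS_VERSION_DEFAULT ?= 'sl520-x86_64';
```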

This should be improved in a future release (#217).

Last modified on Dec 22, 2009, 3:46:55 PM