= WN Profile Cloning =
[[TracNav]]

[[TOC(inline)]]


''Profile cloning'' is a technique to speed up profile compilation, taking advantage of nodes with an almost identical profile. The principle is that:
 * A ''reference node'' is defined.
 * On eligible nodes, the ''reference node'' compiled profile is included rather than recompiling the node profile. The only configuration parts not cloned from the ''reference node'' are hardware, network and file system configuration.
 
This results in a very significant speed-up of the compilation: adding 100 clones typically doubles the compilation time of the reference node.

== Configuration of Profile Cloning ==

Profile cloning is configured through a set of variables that must be defined '''very early''' in the configuration. The default template name to configure profile cloning is `site/wn-cloning-config.tpl`. Another template name can be used by defining variable `PROFILE_CLONING_CONFIG_SITE`. This template is typically located in [source:templates/trunk/sites/example/site/wn-cloning-config.tpl site-specific] templates or in [source:templates/trunk/clusters/example-3.2/site cluster-specific] templates.

=== Enabling Profile Cloning ===

Profile cloning is currently restricted to gLite WNs (since gLite 3.1). It is not enabled by default. Its use is controlled by 2 variables:
 * `PROFILE_CLONING_ENABLED`: this variable must be `true` in the node profile for profile cloning to be used. It is typically configured as `true` by default for all nodes, in `site/wn-cloning-config.tpl` or `PROFILE_CLONING_CONFIG_SITE`.
 * `PROFILE_CLONING_DISABLED`: this is an nlist with one entry per node for which you want to disable profile cloning. The key must be the profile name of the node, without the [#ProfilePrefix profile prefix] if defined. If the value is `true`, profile cloning is disabled for this node, even though `PROFILE_CLONING_ENABLED` is set to `true`. Any other value is ignored.
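
As an illustration of the two variables above, a minimal sketch of what `site/wn-cloning-config.tpl` could contain. The node name `wn042` is hypothetical and must be replaced by a real profile name at your site:
{{{
# Enable profile cloning by default for all nodes.
variable PROFILE_CLONING_ENABLED ?= true;

# Disable cloning for selected nodes only.
# 'wn042' is a hypothetical profile name (without the profile prefix).
variable PROFILE_CLONING_DISABLED ?= nlist(
  'wn042', true
);
}}}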

=== Defining the Reference Node ===

The ''reference node'' is defined with variable `PROFILE_CLONING_REFERENCE_NODE`. The value must match a profile template name (without the [#ProfilePrefix profile prefix] if defined) '''in the same cluster''' as the nodes to be cloned. If you want to use profile cloning in several clusters, you need to define one reference node per cluster.

''Note: a current restriction is that only one reference node can be defined per cluster.''
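
A minimal sketch of such a definition, typically placed in a cluster-specific template. The node name `wn001` is hypothetical:
{{{
# 'wn001' is a hypothetical profile name (without the profile prefix)
# that must exist in the same cluster as the nodes to be cloned.
variable PROFILE_CLONING_REFERENCE_NODE ?= 'wn001';
}}}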

=== Defining the Eligible Nodes ===

Eligible nodes are defined by a regexp applied to the profile name (without the [#ProfilePrefix profile prefix] if defined and without the extension). This regexp is defined with variable `PROFILE_CLONING_ELIGIBLE_NODES`. The regexp must use a Perl-compatible syntax.
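
For example, assuming the worker node profiles are named `wn` followed by digits (a hypothetical naming scheme), the regexp could be defined as:
{{{
# Matches profile names such as wn001 or wn123, after removal of
# the profile prefix and of the extension.
variable PROFILE_CLONING_ELIGIBLE_NODES ?= '^wn\d+$';
}}}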

=== Defining Profile Prefix === #ProfilePrefix

By default, the profile name is considered as matching the node name. If this is not the case, in particular if you use a prefix like `profile_` for your profile names, you need to define it with the variable `PROFILE_PREFIX`.
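
A minimal sketch, for a site whose profiles are named like `profile_wn001`:
{{{
# Prefix removed from the profile name to obtain the node name.
variable PROFILE_PREFIX ?= 'profile_';
}}}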

=== Miscellaneous Configuration Options === #MiscConfOptions

Profile cloning is configured at a very early stage of the compilation process. As a result, it doesn't have access to the variables defined by the standard configuration process and needs direct access to some site-specific templates. If you don't use the standard names for these templates, you should define the following variables (they are also used by the standard configuration process and have the same defaults):
 * `SITE_DB_TEMPLATE`: name of the template defining variables `DB_MACHINES` and `DB_IP`. Default: `site/databases`.
 * `SITE_GLOBAL_VARS_TEMPLATE`: name of the template defining global variables for the site. Default: `site/global_variables`.
 * `SITE_FUNCTIONS_TEMPLATE`: name of the template defining site-specific functions. Default: `site/functions`.
 * `CLUSTER_INFO_TEMPLATE`: name of the template defining the cluster-specific configuration. Default: `site/cluster_info` (or the legacy non-namespaced name, `pro_site_cluster_info`).
 * `SITE_CONFIG_TEMPLATE`: name of the template defining the base site configuration. Default: `site/config`.
 * `FILESYSTEM_CONFIG_SITE`: name of the template configuring the file systems. Default: `filesystem/config`.
 * `FILESYSTEM_LAYOUT_CONFIG_SITE`: name of the template defining the file system layout, used by the standard file system configuration template. Default: `site/filesystems/glite`.
 
''Note: if you use a non-standard value for `SITE_FUNCTIONS_TEMPLATE`, `CLUSTER_INFO_TEMPLATE` or `SITE_CONFIG_TEMPLATE`, it is recommended to define these variables in `SITE_GLOBAL_VARS_TEMPLATE` to ensure consistency with the standard configuration. It is not possible to define `SITE_DB_TEMPLATE` in this template because the database template is included before it.''

A typical definition of these variables for sites still using non-namespaced templates for these site-specific templates is:
{{{
# Variable names are those documented in the list above.
variable SITE_DB_TEMPLATE ?= 'pro_site_databases';
variable SITE_GLOBAL_VARS_TEMPLATE ?= 'pro_site_global_variables';
variable SITE_FUNCTIONS_TEMPLATE ?= 'pro_site_functions';
variable SITE_CONFIG_TEMPLATE ?= 'pro_site_config';
}}}

=== Cloning-Specific Node Configuration ===

Depending on your exact site configuration, you may need to execute some configuration actions only in the context of profile cloning. The following sections describe the available options.

=== Cloning Postconfig ===

It is possible to execute a site-specific template at the very end of the cloning process. This template will be executed only when a node is cloned. By default there is none. To use one, define variable `PROFILE_CLONING_CLONED_NODE_POSTCONFIG` to this template name.

It is harmless to define this variable on a node which is not cloned.
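
A minimal sketch of this setting. The template name `site/wn-cloned-postconfig` is hypothetical and is not provided by the standard templates:
{{{
# Template executed at the very end of the cloning process,
# only on cloned nodes (hypothetical site-specific template name).
variable PROFILE_CLONING_CLONED_NODE_POSTCONFIG ?= 'site/wn-cloned-postconfig';
}}}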

=== Conditional Configuration === #ConditionalActions

If a site-specific template is executed in the context of profile cloning (as part of the configuration replayed on every node), it may test the variable `PROFILE_CLONING_CLONED_NODE` to do (or ignore) some actions in the context of profile cloning. This variable has the value `true` when the profile is cloned rather than rebuilt.

''Note: in an early version of profile cloning, this variable was called `DUMMY_NODE`. If you used it, you must switch to the new name: there is no backward compatibility.''

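For example, to conditionally exclude a template when the profile is cloned, you may use:
{{{
# Include the template only when the profile is NOT cloned
# (assumes PROFILE_CLONING_CLONED_NODE is always defined).
include { if ( !PROFILE_CLONING_CLONED_NODE ) 'site/nocloning-config' };
}}}

== Examples ==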

The QWG templates contain a configuration example. It is based on 2 templates:
 * [source:templates/trunk/sites/example/site/wn-cloning-config.tpl sites/example/site/wn-cloning-config.tpl]: a site-wide template defining the default parameters for profile cloning, except the ''reference node''.
 * A cluster-specific template included by [source:templates/trunk/sites/example/site/wn-cloning-config.tpl sites/example/site/wn-cloning-config.tpl]: this template defines the cluster-specific ''reference node''. Look at [source:templates/trunk/clusters/example-3.2/site/wn-cloning-cluster-config.tpl clusters/example-3.2/site/wn-cloning-cluster-config.tpl] for an example.
 
 
== Troubleshooting ==

The important thing to remember when using profile cloning is that some configuration parameters of a cloned node are inherited from the reference node, even if a node-specific value exists. This is particularly true for the OS version/architecture selection, the kernel version... The packages part of the profile is also inherited from the reference node, including the OS errata.

To troubleshoot the decision to clone or not clone a profile, enable Pan compiler [/wiki/Doc/SCDB/Usage#Usingpancdebugging debugging features] with the following command:
{{{
ant -Dpan.debug.include=glite/wn/cloning/selector
}}}

== F.A.Q. ==

=== The reference node profile sometimes has an invalid content ===

You need to ensure you don't cross-reference your clusters in the cluster-specific file `cluster.build.properties`. '''Cross-referencing clusters is strongly discouraged'''. When you use profile cloning, it may cause undesired side-effects: in particular, if one of the clusters doesn't have the cluster-specific configuration for profile cloning whereas the other one does (in particular the definition of a reference node), the reference node may be rebuilt in the context of the wrong cluster, which impacts its OS version, its gLite parameters...

If you suspect something is wrong, use the [#Troubleshooting troubleshooting] procedure described above to activate debug messages during the compilation.

=== two (or more) children have the same name in merge() ===

If you have some site-specific actions done after the `include { 'machine-types/wn' }` and involving `npush` operations (or an explicit call to function `merge`), you may have the following error during cloning:
{{{
two (or more) children have the same name in merge()
}}}

This may also happen with actions done in [#MiscConfOptions CLUSTER_INFO_TEMPLATE] or [#MiscConfOptions SITE_CONFIG_TEMPLATE], which are reexecuted after cloning the reference profile.

This error occurs because the entry you try to add to the nlist is already present in the reference profile and cannot be added twice. There are several possible workarounds for this problem:
 * Use `WN_CONFIG_SITE` (specific to WN) or `GLITE_BASE_CONFIG_SITE` (executed for all gLite machines) to execute a template containing all the site-specific actions. This template will not be reexecuted when the profile is cloned. This is the recommended and most efficient method.
 * Conditionally execute the site-specific actions by testing variable [#ConditionalActions PROFILE_CLONING_CLONED_NODE].
 * Use a direct assignment to the nlist instead of `npush`, so that the entry inherited from the reference node is overwritten rather than added a second time. This is the least recommended method as it can be pretty inefficient. To do it, use Pan code like:
{{{
variable MY_NLIST = {
  SELF['myentry'] = 'test';
  SELF;
};
}}}
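
For the first (recommended) workaround, a minimal sketch, assuming `WN_CONFIG_SITE` is defined before `machine-types/wn` is included. The template name `site/wn-config` is hypothetical:
{{{
# Site-specific actions for WNs are moved into this template
# (hypothetical name): it is not reexecuted on cloned nodes.
variable WN_CONFIG_SITE ?= 'site/wn-config';
}}}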

== Known Issues ==

=== Only one reference node per cluster ===

In the current implementation, it is not possible to define several subsets of nodes in the same cluster cloned from different reference nodes. Thus the reference node and the cloned nodes must support a similar configuration, in particular for the OS version/architecture. Either use `PROFILE_CLONING_ELIGIBLE_NODES` to define a regexp that matches only the relevant nodes, or define an entry in `PROFILE_CLONING_DISABLED` for every potentially eligible node that is not compatible with the reference node or should not be cloned.
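
Both approaches can be sketched as follows, assuming a hypothetical cluster where nodes `wn000` to `wn099` match the reference node and where `wn017` must nevertheless be excluded:
{{{
# Approach 1: restrict the regexp to the compatible nodes only.
# Matches wn000 to wn099 (hypothetical naming scheme).
variable PROFILE_CLONING_ELIGIBLE_NODES ?= '^wn0\d\d$';

# Approach 2: explicitly disable cloning for an incompatible node
# that would otherwise match the regexp ('wn017' is hypothetical).
variable PROFILE_CLONING_DISABLED ?= nlist(
  'wn017', true
);
}}}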

=== Package repository is not redone ===

During profile cloning, the package (RPM) repository configuration is not replayed. This means that if you add new packages on a cloned node, they will not be associated with a repository and their installation will fail at deployment time, without any error during the compilation.

This may be improved in a future release (#217). In the meantime, it is the site's responsibility to ensure the package repository configuration (generally `repository/config`) is replayed on the cloned node. But be aware that doing this on all cloned nodes '''may significantly reduce''' the speed-up provided by profile cloning. Rather than adding packages and redoing the repository configuration, it is recommended to add the missing packages to the reference node so that it is a superset of what is needed by all the cloned nodes.
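
If the repository configuration really has to be replayed, one possible approach (an assumption, not a documented procedure) is to do it from the template referenced by `PROFILE_CLONING_CLONED_NODE_POSTCONFIG`. A minimal sketch of such a template, with the hypothetical name `site/wn-cloned-postconfig`:
{{{
# Hypothetical site-specific template, executed only on cloned nodes
# when PROFILE_CLONING_CLONED_NODE_POSTCONFIG is set to its name.
unique template site/wn-cloned-postconfig;

# Replay the package repository configuration. This reduces the
# speed-up provided by profile cloning.
include { 'repository/config' };
}}}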

=== Packages or templates provided by the OS not found ===

The cloned node inherits the RPM repository configuration from the reference node. For packages provided by the OS, this means that they will be looked for in the RPM repositories associated with the OS version/architecture of the reference node, whatever OS version/architecture is explicitly defined for the cloned node.

Even though the OS version/architecture is inherited from the reference node, the OS version/architecture selection is replayed on the cloned node so that templates provided as part of the OS support can be located. A restriction of the current implementation is that the information used to configure the ''template loadpath'' is based on the cloned node configuration rather than the reference node configuration. If the two don't match, this may lead to errors like a template or a package that cannot be found. In particular, this may lead to loading a package with the wrong version for the OS version/architecture of the reference node. Note that because of the previous restriction, the incorrect package versions may not be reported.

To help with consistency between all nodes, it is recommended not to declare an explicit entry for the reference node and the other nodes cloned in the OS version database but rather rely on `NODE_OS_VERSION_DEFAULT` for all these nodes.
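
A minimal sketch of this recommendation. The value `sl550-x86_64` is only an illustration of an OS version/architecture identifier and must be adapted to what your site supports:
{{{
# Default OS version/architecture, used for every node without an
# explicit entry in the OS version database (illustrative value).
variable NODE_OS_VERSION_DEFAULT ?= 'sl550-x86_64';
}}}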

This should be improved in a future release (#217).