Version 17 (modified by /O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=Michel Jouvin, 13 years ago)


OS Errata Management and Deployment

Quattor can greatly help with OS errata deployment. QWG templates have a few specific features and tools to help manage them.

Downloading RPM Errata

RPM errata should normally be downloaded from the official public servers for the distribution you use. With Scientific Linux, it is possible to use rsync to synchronize a local directory with the official repository. The official rsync URL prefixes are:

  • For SL4:
  • For SL5:

GRIF tends to have a rather up-to-date mirror of these repositories that you can access with http at:

These are just suggestions. There are many ways to synchronize with a reference repository, including YUM and a script provided by SCDB, utils/misc/sync-os-errata, which can run in a cron job using rsync or wget.

Generating Templates for OS Errata

After downloading the RPM errata, it is necessary to generate a template that will be used to deploy them. This is done with SCDB script utils/misc/ This script accepts one argument, the name of the local directory that contains the errata and is configured as an SCDB RPM repository. It writes to stdout a template with a pkg_ronly entry for the latest version of every RPM found in that directory. The output must be redirected to a template file.

For example, assuming your RPM errata for SL 4.7 x86_64 are located in directory /www/htdocs/packages/os/sl470-x86_64/errata, the command would be:

utils/misc/ /www/htdocs/packages/os/sl470-x86_64/errata/ > cfg/os/sl470-x86_64/rpms/errata/20090826.tpl

Note: the script is very verbose. All the information messages are sent to stderr and can be redirected separately.

Because upgrading kernels requires special handling, kernel entries are not added to the resulting template. See the section on kernel errata below.

As the template uses the pkg_ronly() SPMA function, an erratum is included in the configuration only if another version of the same package and architecture is already part of the configuration.

Note: normally, the generated template can be used as is, without any manual editing. Because pkg_ronly only replaces an RPM that is already part of the configuration, this may not work in the very rare cases where an RPM is renamed. In this case, you need to manually update the template to replace pkg_ronly with pkg_repl (same arguments) and add a pkg_del line for the old package name (its only argument) to remove the old package. This is also necessary for kernel modules where the kernel version is part of the module's RPM name.
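
As an illustration of the manual edit for a renamed RPM, a minimal sketch follows. The package names, version and architecture are hypothetical; only pkg_repl and pkg_del, with the arguments described above, come from this page.

```pan
# Hypothetical case: RPM 'oldname' was renamed to 'newname' in the errata.
# pkg_ronly would do nothing because 'newname' is not yet in the configuration,
# so the generated pkg_ronly line is changed to pkg_repl (same arguments).
'/software/packages' = pkg_repl('newname', '1.2.3-4', 'x86_64');

# Remove the package under its old name (the name is the only argument).
'/software/packages' = pkg_del('oldname');
```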

Deploying OS Errata

Errata deployment is controlled through variable PKG_DEPLOY_OS_ERRATA. By default, to avoid any problem at a site, errata deployment is disabled, but sites are strongly encouraged to set this variable to true to enable it. The usual places to define this variable are site/cluster-info.tpl, to control it at the cluster level, or site/config.tpl in your site-specific templates, to control it at the site level. It is recommended to define a default value (using operator ?= ) to allow further redefinition in a node profile.
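
A minimal sketch of such a default, assuming it is placed in site/cluster-info.tpl (the placement is only one of the options mentioned above):

```pan
# Hypothetical excerpt from site/cluster-info.tpl.
# ?= assigns only when the variable is not already defined, so a value
# set earlier in a node profile is kept.
variable PKG_DEPLOY_OS_ERRATA ?= true;
```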

When enabled, OS errata deployment uses by default the template rpms/errata.tpl (or whatever the variable PKG_OS_ERRATA_TEMPLATE_DEFAULT refers to) in the OS templates to find out which actions to perform. This template is normally empty or nonexistent. It is the site's responsibility to produce the template with all the package replacements and other configuration actions required to deploy the errata, as explained in the previous section (QWG templates contain some errata-related templates for major vulnerabilities). The templates produced are generally put in rpms/errata in the OS templates. The name is at the site's discretion, but it is recommended to choose a name reflecting the errata date for easier tracking and support of multiple errata versions.

The actual errata version deployed on a given node is controlled by two variables:

  • OS_ERRATA_TEMPLATE: this nlist defines the default OS errata version to use for a given OS version. The key is the usual version+architecture combination (e.g. sl470-x86_64), the value is the name of the errata template to execute (the same name you would use in an include statement to execute that template). This variable is typically defined in a site-specific template that must be included by sites/example/site/os/version_db.tpl: look at sites/example/site/os/errata-defaults.tpl for an example.
  • NODE_OS_ERRATA_TEMPLATE: this nlist defines the OS errata version to use for a specific node. The key is the escaped node name, the value is the name of the errata template to execute (the same name you would use in an include statement to execute that template). If a given node has an entry matching its name in this variable, it overrides the default defined for the OS. This variable is typically defined in sites/example/site/os/version_db.tpl or another template it includes.
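
The two variables could be defined as in the following sketch. The errata template names and the node name are hypothetical, chosen to match the date-based naming recommended above.

```pan
# Hypothetical excerpt from a site-specific template included by version_db.tpl.

# Default errata template for each OS version+architecture combination.
variable OS_ERRATA_TEMPLATE ?= nlist(
    'sl470-x86_64', 'rpms/errata/20090826'
);

# Per-node override: the key is the escaped node name.
# Here one node stays on an older errata set.
variable NODE_OS_ERRATA_TEMPLATE ?= nlist(
    escape('node01.example.org'), 'rpms/errata/20090701'
);
```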

Note: defining variable PKG_OS_ERRATA_TEMPLATE directly to control the errata version to deploy is discouraged, even though it used to be the standard practice. Use the method described above instead.

In addition to the base template used to deploy and configure errata (normally the output of the generation script), two other templates can be used as part of the errata deployment process:

  • An errata init template whose name is derived from the base template name, with a suffix added. The default suffix is -init and can be modified with variable PKG_OS_ERRATA_INIT_TEMPLATE_SUFFIX. This template is optional and is executed very early in the configuration. It must be located in config/os/errata in the OS templates. It is typically used to define the kernel version associated with the errata. See next section.
  • An errata fix template whose name is derived from the base template name, with a suffix added. The default suffix is -fix and can be modified with variable PKG_OS_ERRATA_FIX_TEMPLATE_SUFFIX. This template is optional and is re-executed every time the OS errata are applied. Like the base template, it should not be tagged as unique. It must be located in the same directory as the base template. It is used to perform actions other than the package replacements done in the base template, in particular deleting conflicting packages.

Resolving conflicts due to errata

It sometimes happens that an errata RPM causes conflicts or has dependencies that are difficult to resolve in the site context. If this RPM is not used or not critical for the site or node, it is possible to remove it by adding to the errata template something like:

'/software/packages' = pkg_del('myconflictingrpm');

Note: never try to remove the RPM from the base templates used to configure the OS. First, it may break things when the errata are not deployed. Second, a given RPM is often added by several templates. But the main reason is that these templates are entirely generated from the distribution's official list. The line described above gives the same result, except that it takes effect only in the context of these errata.

A good place to add this type of modification to the base errata template is the errata fix template described above.
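
For example, a fix template could look like the following sketch. It assumes a base template named rpms/errata/20090826 (a hypothetical name) and the default -fix suffix.

```pan
# Hypothetical errata fix template for base template rpms/errata/20090826.
# It is not declared 'unique', so it is re-executed every time the
# errata are applied.
template rpms/errata/20090826-fix;

# Drop an errata RPM that causes conflicts in the site context.
'/software/packages' = pkg_del('myconflictingrpm');
```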

Kernel errata

Kernel errata need special handling because of some restrictions in the current version of SPMA and because an improper upgrade may leave a machine unable to restart.

The kernel version, for the kernel itself and for all the kernel modules you may use, is selected using the standard kernel selection method. In addition, it is possible to define a default kernel version to use with one specific errata version. This is done with variable OS_KERNEL_VERSION_ERRATA, which is an nlist with the same structure as OS_KERNEL_VERSION: the key is the OS version or version+architecture combination, the value is the base template to use to configure the errata. This variable is typically defined in the errata init template described above, with one entry for the OS version configured. When an entry is defined in this variable, it supersedes the default entry for the OS version (without errata). Site-specific values defined in OS_KERNEL_VERSION take precedence in any case.

With the current version of SPMA, it is not possible to tell SPMA never to uninstall a kernel, even if it is no longer part of the configuration. As a result, if you just replace the kernel, the running kernel is removed at the same time the new one is installed, and in case of a problem you may not be able to reboot. A workaround is to add the following lines at the end of the node profile (before the repository configuration), or in any template that is part of the errata configuration if you want to avoid editing a large number of profiles (a good place may be the errata fix template, see above). The lines to add are:

'/software/components/spma/userpkgs' = 'yes';
'/software/packages' = pkg_add(PKG_KERNEL_RPM_NAME,'old-kernel-version',PKG_ARCH_KERNEL,"multi");

with old-kernel-version replaced by the kernel RPM version currently installed.

Note: for the kernel, pkg_add must be used with option multi to enable the concurrent installation of several kernel versions.

Errata and profile cloning

Profile cloning, also known as dummy WN, is a technique that lets you configure a reference profile and then reuse it for multiple nodes, with a few customizations (it is currently used only for gLite WN, hence the name). The other nodes clone the reference profile rather than executing the configuration. As a result, any modification to the reference profile is also applied to all other nodes that clone it.

When deploying OS errata, it is not always convenient to deploy them on all nodes at once. In this case, during the errata deployment, you need to temporarily disable profile cloning on the nodes you don't want to upgrade at the same time as the reference profile. This can be achieved without modifying the profiles themselves, by defining variable PROFILE_CLONING_DISABLED which is an nlist where:

  • The key is the string following PROFILE_PREFIX in the profile template name.
  • The value must be true in order to deactivate profile cloning.

PROFILE_CLONING_DISABLED must be defined very early in the configuration. This is typically done at the same place where you configure profile cloning. See profile cloning documentation for more information.
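
A minimal sketch of such a definition follows. The node names are hypothetical, and it assumes profile templates are named PROFILE_PREFIX followed by a short node name.

```pan
# Hypothetical example: keep wn042 and wn043 on their own configuration
# while the reference profile receives the errata.
# Each key is the string following PROFILE_PREFIX in the profile template name.
variable PROFILE_CLONING_DISABLED ?= nlist(
    'wn042', true,
    'wn043', true
);
```

Once these nodes have been upgraded in turn, removing their entries lets them clone the reference profile again.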