
Release Notes for gLite 3.2 Templates

This page contains information about each release of QWG Templates for gLite 3.2, in particular new or changed features and known problems. For information about configuring the templates, refer to the page on gLite templates customization.

QWG Releases

Note: ongoing developments and the progress of the upcoming release can be followed in the roadmap, in the latest entries of the gLite-3.2 branch ChangeLog, or in the full log of the trunk branch.

Date        Release        Description
19/04/2009  -              Creation of branch gLite-3.2
10/07/2009  gLite-3.2.0-1  WN, UI, BDII
18/10/2009  gLite-3.2.0-2  FroNTieR, NFS, Torque server, gLite 3.2 updates 04 and 05
09/08/2010  gLite-3.2.0-3  gLite updates up to 15, CA 1.31-36, CREAM CE, VOBOX, DPM, LFC, OS errata
09/02/2011  gLite-3.2.0-4  gLite updates up to 21, CA 1.37, ARGUS, DPM/LFC 1.8, OS errata

gLite Updates

Each QWG templates release delivers the latest gLite updates available at the time of the release. QWG release numbers (-n) do not correspond to gLite update numbers, and a single QWG templates release sometimes delivers several gLite updates. Each QWG release has a default associated gLite update (generally the latest one).

QWG releases provide a standard mechanism for selecting the gLite update you want to deploy on a per-node, per-cluster or per-site basis.

Content of gLite updates and associated release notes can be viewed at http://glite.web.cern.ch/glite/packages/R3.2/updates.asp.

Upgrading from gLite 3.1

QWG releases for gLite 3.2 support updating existing gLite 3.1 nodes managed by the gLite-3.1.0 series of QWG releases. The only restriction is that gLite 3.2 is supported only on 64-bit machines, so gLite 3.1 nodes running a 32-bit OS must be reinstalled.

Note: although gLite 3.1 was mainly 32-bit, all the gLite 3.1 machine types and services in QWG templates were supported on 64-bit Linux.

gLite versions cannot be mixed inside the same SCDB cluster. Updating existing nodes involves:

  • Creating a new cluster using the following steps:
    • Copy the existing cluster with svn cp.
    • Edit cluster.build.properties of the new cluster to reflect the use of gLite 3.2 templates instead of gLite 3.1.
    • Review other cluster-specific templates, in particular site/cluster_info.tpl, for any change that could be necessary (none required).
  • Moving node profiles from one cluster to the other with the svn mv command.
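The svn part of these steps can be scripted. The following is a minimal sketch, not part of QWG templates: it assumes an SCDB checkout where clusters live in cfg/clusters/<name> and node profiles in cfg/clusters/<name>/profiles/<node>.tpl, and the helper names copy_cluster and move_nodes are hypothetical. Editing cluster.build.properties and reviewing the cluster templates stay manual. Since svn cp also copies the existing profiles, move_nodes assumes those copies have already been removed from the new cluster. Running with DRY_RUN=1 only prints the commands.

```shell
#!/bin/sh
# Sketch of the cluster migration steps (hypothetical SCDB layout, see text above).
# Set DRY_RUN=1 to print the svn commands instead of running them.
CLUSTERS_DIR=${CLUSTERS_DIR:-cfg/clusters}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# First step: copy the existing gLite 3.1 cluster into a new cluster
copy_cluster() {
  run svn cp "$CLUSTERS_DIR/$1" "$CLUSTERS_DIR/$2"
}

# Last step: move the given node profiles from the old cluster ($1) to the new one ($2).
# Assumes the new cluster no longer contains copies of these profiles.
move_nodes() {
  src=$1 dst=$2
  shift 2
  for node in "$@"; do
    run svn mv "$CLUSTERS_DIR/$src/profiles/$node.tpl" \
      "$CLUSTERS_DIR/$dst/profiles/$node.tpl" || return 1
  done
}
```

For example, DRY_RUN=1 move_nodes glite-3.1 glite-3.2 wn01 wn02 prints the two svn mv commands for the (hypothetical) clusters glite-3.1 and glite-3.2 and nodes wn01 and wn02.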

Note: nodes that are part of the same grid configuration (e.g. CE and WNs) can belong to 2 different clusters without any problem. It is recommended to put site-specific gLite parameters (site/glite/config.tpl) in your site directory.

Known Problems of Releases

This section describes severe issues discovered after a release of QWG templates. They are generally fixed in the preview of the next release, available in the gLite 3.2 branch, which is the most up-to-date version and the one recommended for use in production.

The version mentioned in each title is the release having the problem.

gLite-3.2.0-4: gLite 3.2 update 21 contains a major DPM/LFC upgrade (1.8)

gLite 3.2 update 21, delivered as part of version 3.2.0-4 of QWG templates, brings DPM and LFC 1.8, a major upgrade of both DPM and LFC that requires a database schema change. This schema change is not deployed by QWG templates: the schema upgrade script must be run manually with the following commands:

# For LFC, the script is in /opt/lcg/share/LFC/cns-db-300-to-310
cd /opt/lcg/share/DPM/cns-db-300-to-310
# First check the current schema version (in case of failure, see comment below)
./cns_db_300_to_310 --db-vendor MySQL --user root --pwd-file /tmp/pwd --db-host localhost --db cns_db --show-version-only
# If less than `3 1 0`, upgrade the schema
./cns_db_300_to_310 --db-vendor MySQL --user root --pwd-file /tmp/pwd --db-host localhost --db cns_db

Note: the MySQL password must be stored in the file passed as option --pwd-file before running the command. It is harmless to run the script twice.
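To avoid leaving the MySQL password in a world-readable file, the password file can be created with mktemp, which creates files readable only by their owner, and deleted once the upgrade is done. This is a sketch under the assumption that the script reads the password from the first line of the file; make_pwd_file and cns_schema are hypothetical helpers, and SCRIPT_DIR defaults to the DPM location (set it to /opt/lcg/share/LFC/cns-db-300-to-310 for LFC).

```shell
#!/bin/sh
# Hypothetical helpers around cns_db_300_to_310. Set DRY_RUN=1 to print commands.
SCRIPT_DIR=${SCRIPT_DIR:-/opt/lcg/share/DPM/cns-db-300-to-310}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Write the password given as $1 to a new temporary file (mode 0600) and print its path.
# Assumption: the upgrade script reads the password from the first line of this file.
make_pwd_file() {
  f=$(mktemp) || return 1
  printf '%s\n' "$1" > "$f" || { rm -f "$f"; return 1; }
  echo "$f"
}

# Run the upgrade script against cns_db; extra arguments (e.g. --show-version-only) are appended
cns_schema() {
  pwd_file=$1
  shift
  run "$SCRIPT_DIR/cns_db_300_to_310" --db-vendor MySQL --user root \
    --pwd-file "$pwd_file" --db-host localhost --db cns_db "$@"
}
```

A typical sequence would be PWD_FILE=$(make_pwd_file "$MYSQL_ROOT_PWD"), then cns_schema "$PWD_FILE" --show-version-only, then cns_schema "$PWD_FILE" if the version is lower than 3 1 0, and finally rm -f "$PWD_FILE" (MYSQL_ROOT_PWD being a hypothetical variable holding the password).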

Note that the upgraded database is still compatible with DPM/LFC 1.7. This makes it possible to upgrade the schema before upgrading DPM/LFC, and to roll back to 1.7 in case of severe issues with 1.8 (none known). The schema upgrade does not require stopping DPM/LFC. To apply the schema upgrade before updating DPM/LFC, you need to retrieve the upgrade script (directory) from the appropriate 1.8 RPM:

  • DPM: DPM-name-server-mysql
  • LFC: LFC-server-mysql

Also note that, for both DPM and LFC, the names of the columns indicating the schema version may not match what the script expects. In this case, running the script with --show-version-only fails to query the version. To fix it, connect to the MySQL database and enter the following commands:

use cns_db;
ALTER TABLE schema_version CHANGE major_number major INTEGER(11);
ALTER TABLE schema_version CHANGE minor_number minor INTEGER(11);
ALTER TABLE schema_version CHANGE patch_number patch INTEGER(11);

Then rerun the upgrade script.

gLite-3.2.0-3: CREAM 1.6 upgrade requires manual steps

Version 3.2.0-3 of QWG templates delivers the new CREAM CE version 1.6 (released as part of gLite update 12). This is a major update of the CREAM CE. When updating an existing CREAM CE (delivered by an early version of QWG templates 3.2.0-3), the following manual steps must be performed after deploying the new version to restore the service:

  1. Stop the CREAM CE:
    service tomcat5 stop
    
  2. Delete the contents of the /usr/share/tomcat5/webapps directory. This is a Tomcat requirement: if this is not done, Tomcat will continue to use the old CREAM CE version:
    rm -Rf /usr/share/tomcat5/webapps/*
    
  3. Delete the existing Tomcat application configurations:
    rm /usr/share/tomcat5/conf/Catalina/localhost/*
    
  4. Reconfigure ncm-filecopy and ncm-symlinks to ensure that all the configuration files are present as some of them may have been deleted by the RPM upgrade:
    ncm-ncd --configure filecopy symlinks
    
  5. Update the CREAM CE databases and restart Tomcat:
    /root/sbin/cream_db_update.sh
    

Note that after the upgrade of the databases, it is not possible to roll back to the previous version.
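The five steps above can be chained so that the sequence stops at the first failing command. This is a minimal sketch, not something shipped with QWG templates; upgrade_cream is a hypothetical helper name. Because the database update cannot be rolled back, it is worth running it first with DRY_RUN=1, which only prints the commands.

```shell
#!/bin/sh
# Sketch: CREAM CE 1.5 -> 1.6 manual upgrade steps, stopping at the first failure.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

upgrade_cream() {
  # 1. Stop the CREAM CE
  run service tomcat5 stop || return 1
  # 2. Remove deployed web applications so Tomcat does not keep the old CREAM CE
  run rm -Rf /usr/share/tomcat5/webapps/* || return 1
  # 3. Remove Tomcat application configurations (-f: no error if already empty)
  run rm -f /usr/share/tomcat5/conf/Catalina/localhost/* || return 1
  # 4. Restore configuration files possibly deleted by the RPM upgrade
  run ncm-ncd --configure filecopy symlinks || return 1
  # 5. Update the CREAM CE databases and restart Tomcat (no rollback after this)
  run /root/sbin/cream_db_update.sh
}
```

The rm -f in step 3 deliberately differs from the plain rm listed above, so that an already empty directory does not stop the sequence.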

gLite-3.2.0-3: New ncm-cups version requires configuration changes

The new version of ncm-cups delivered with version 3.2.0-3 of QWG templates changes the printer list format from a list to an nlist. To use this new version, you need to update your configuration. The change is fairly simple: the name property of each printer entry must be removed and its former value used as the nlist key.

For example, if you had a printer definition like:

"/software/components/cups/printers" = push(
  nlist("name", "myprinter",
        "protocol", "LPD",
        "server", "printsrv.example.org",
        "location", "Building unknown",
        "description", "My preferred printer type",
       ),
);

you will need to change it to:

"/software/components/cups/printers" = {
  SELF['myprinter'] = nlist("protocol", "LPD",
                            "server", "printsrv.example.org",
                            "location", "Building unknown",
                            "description", "My preferred printer type",
                           );
  SELF
};

gLite-3.2.0-3: GIP FCR plugin not working properly

On a top BDII configured by QWG templates, FCR does not work properly without manual intervention. This is due to a bug in the FCR GIP plugin (glite-info-plugin-fcr v2.0.0-2), tracked in Savannah. Until it is fixed, you need to manually create the directory /opt/glite/var/cache/gip/plugin/fcr and set its ownership to edguser:infosys.
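The workaround boils down to two commands. The sketch below wraps them in a hypothetical create_fcr_cache helper; FCR_CACHE defaults to the directory named above, and DRY_RUN=1 prints the commands instead of running them. It is meant to be run as root on the top BDII and is a one-off manual fix, not part of the Quattor configuration.

```shell
#!/bin/sh
# Sketch of the FCR workaround: create the GIP plugin cache directory owned by edguser:infosys.
# FCR_CACHE can be overridden; DRY_RUN=1 prints the commands instead of running them.
FCR_CACHE=${FCR_CACHE:-/opt/glite/var/cache/gip/plugin/fcr}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

create_fcr_cache() {
  run mkdir -p "$FCR_CACHE" || return 1
  run chown edguser:infosys "$FCR_CACHE"
}
```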

gLite-3.2.0-3: possible problems when first installing a CREAM CE

After installing a CREAM CE, you may encounter a few errors that require manual intervention to clean up.

  • If the machine certificate was not installed during machine installation, you need to copy the certificate into /etc/grid-security and run the filecopy component again with the following command:
    ncm-ncd --configure filecopy
    
  • After installing the certificate, you may also have to run fetch-crl manually if you want the CE to be immediately operational. The command to run can be found in the cron file /etc/cron.d/fetch-crl-cron.ncm-cron.cron. After updating the CRLs, you need to restart Tomcat with the following command:
    service tomcat5 restart
    
  • Often, for some unidentified reason, the MySQL database is not initialized properly the first time MySQL runs. This can be checked by looking at /var/log/ncm-cdispd. It can be fixed by running the mysql Quattor configuration module again with the following command:
    ncm-ncd --configure mysql
    

In addition, there is sometimes a chicken-and-egg problem with the ce-monitor Tomcat application. If the directory /usr/share/tomcat5/webapps/ce-monitor/WEB-INF contains only the classes sub-directory, you need to remove the ce-monitor directory and restart Tomcat:

rm -Rf /usr/share/tomcat5/webapps/ce-monitor
service tomcat5 restart
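The condition described above can be checked automatically before removing anything. This sketch, with hypothetical helper names ce_monitor_broken and fix_ce_monitor, treats the application as broken only when WEB-INF exists and its single entry is the classes directory, and otherwise leaves it untouched; DRY_RUN=1 prints the cleanup commands instead of running them.

```shell
#!/bin/sh
# Sketch: detect the ce-monitor chicken-and-egg state and clean it up.
# Set DRY_RUN=1 to print the cleanup commands instead of running them.
CE_MONITOR_DIR=${CE_MONITOR_DIR:-/usr/share/tomcat5/webapps/ce-monitor}

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Succeeds when $1/WEB-INF exists and its only entry is the classes sub-directory
ce_monitor_broken() {
  [ -d "$1/WEB-INF" ] || return 1
  [ "$(ls -A "$1/WEB-INF")" = "classes" ] && [ -d "$1/WEB-INF/classes" ]
}

# Remove the application and restart Tomcat only if it is in the broken state
fix_ce_monitor() {
  ce_monitor_broken "$CE_MONITOR_DIR" || return 0
  run rm -Rf "$CE_MONITOR_DIR" || return 1
  run service tomcat5 restart
}
```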

gLite-3.2.0-3: kernel errata deployment problem caused by an SPMA bug

This problem is not strictly related to gLite templates. It may affect any attempt to deploy kernel errata with SPMA versions 1.10.34 to 1.11.1.

SPMA version 1.10.34 introduced a new feature, controlled by the protectkernel option (true by default), allowing a smooth upgrade of kernels and related modules: a kernel and its related modules are not uninstalled as long as they are active, even if they are no longer part of the configuration. To implement this feature, SPMA tries to guess the kernel name and variant (smp, largesmp...). Unfortunately, the regexp used for this is buggy in versions prior to SPMA 1.11.2. Depending on the SPMA version, the bug affects either the largesmp variant or the other variants.

A fixed SPMA version is delivered with the gLite-3.2.0-3 release of QWG templates (since r4946 of QWG trunk). However, depending on the SPMA version in use when you upgrade to this version of the QWG templates, you may end up with a chicken-and-egg problem if you tried to deploy errata with a buggy version: the SPMA bug causes a dependency problem that makes SPMA fail, and this prevents the upgrade of SPMA itself.

The suggested workaround is:

  • Roll back errata on the affected machines to a version matching what is currently deployed on the machine, so that SPMA can execute successfully.
  • Deploy the fixed SPMA version as provided by the last version of the templates.
  • Redeploy the OS errata.

gLite-3.2.0-3: Account UID changes following deployment of new VO configuration functions

As explained in the main changes section, the new VO configuration functions ensure that the SW manager and production user are always assigned the same UID. This may lead to a change in the account UID numbering when the new version is first deployed. The main VOs affected are: biomed, compchem, dzero, esr, fusion, flast.org, planck.

ncm-accounts has difficulty carrying out these renumbering operations. It is thus necessary to remove the potentially conflicting accounts and re-run ncm-accounts. One possibility is to use a pair of scripts installed and run by ncm-filecopy: the first one removes the accounts, the second one recreates them and changes the ownership of the home directories. The two scripts must be deployed one after the other. An example can be found in QWG examples:

After running those scripts, it may also be necessary to fix the SW area ownership for VOs whose SW manager UID changed. This has to be done manually, but this is generally not a problem as the SW area resides in a shared area.
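For the manual SW area fix, selecting files by numeric UID rather than by account name is convenient, since the old UID may no longer map to any account after the renumbering. The sketch below is a hypothetical helper, not a QWG tool; the SW area path and both UIDs are site-specific values you must supply, and DRY_RUN=1 only lists the files that would change.

```shell
#!/bin/sh
# Sketch: give files in a VO SW area still owned by the old numeric UID to the new UID.
# Arguments: $1 SW area path, $2 old UID, $3 new UID (all site-specific).
# DRY_RUN=1 only lists the files that would be changed.
fix_sw_area_owner() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    find "$1" -uid "$2" -print
  else
    # -h changes symbolic links themselves rather than their targets
    find "$1" -uid "$2" -exec chown -h "$3" {} +
  fi
}
```

For example, fix_sw_area_owner /path/to/vo/swarea 20000 20100 would reassign the files, where the path and UIDs are placeholders for your own values.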

If you are running a CE without shared home directories, there are some specific constraints when updating the VO accounts:

  • If SSH keys are used for the new VO accounts, you have to ensure that they are properly generated.
  • When running the suggested script to remove some accounts before running ncm-accounts, you need to remove the definition of the users variable to avoid having 2 accounts with the same UID, as this causes an error with the scp command used to copy files between the CE and the WN. Because of this, you need to ensure that no jobs are running under the xxxhs accounts, or drain the corresponding queue.

gLite-3.2.0-3: New DPM/xroot migration requires stopping olbd manually

The new DPM/xroot provided in this QWG release uses the new Cluster Management Services daemon, cmsd, instead of the legacy one, olbd. Because of a chicken-and-egg problem, the olbd daemon cannot be stopped before its startup scripts are removed. Unfortunately, cmsd cannot start while olbd is running. It is thus necessary to stop olbd manually on all DPM nodes and then restart cmsd.

This can be done with the following commands:

ps -e | grep olbd
# Send TERM to each olbd process (there may be 2 on the head node)
pkill -TERM -x olbd
service dpm-cms restart
# On the head node only:
service dpm-manager-cmsd restart
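Before restarting cmsd, it can help to confirm that olbd has really exited, since cmsd cannot start while it is running. The helper below is a sketch with a hypothetical name, stop_and_wait; it relies on pkill and pgrep from procps and gives up after a configurable number of seconds (30 by default).

```shell
#!/bin/sh
# Sketch: send TERM to all processes named exactly $1, then wait up to $2 seconds
# (default 30) for them to exit. Returns 1 if some are still running after the timeout.
stop_and_wait() {
  pkill -TERM -x "$1"
  waited=0
  while pgrep -x "$1" >/dev/null; do
    [ "$waited" -ge "${2:-30}" ] && return 1
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

On a DPM node this could be used as stop_and_wait olbd && service dpm-cms restart, followed on the head node by service dpm-manager-cmsd restart.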

gLite-3.2.0-2: stricter checks in filesystem/config.tpl may lead to compilation errors

As described in the release notes, the standard configuration template in charge of producing the disk partition configuration from a layout description now implements much stricter validation checks. As a result, existing layouts may need to be fixed. Two common problems are:

  • The disk partition referred to in a device or devices attribute is not on the right disk, or is already used by another file system or block device.
  • The variable in your site layout equivalent to DISK_GLITE_PARTS in the layout examples does not contain an entry for the extended partition in fourth position, although a logical partition must be created to implement your layout.

See the documentation for more information about producing a layout template.

gLite 3.2 requires a 64-bit OS

Machines hosting gLite services must be installed with a 64-bit OS. When upgrading from a previous gLite version, the machine must be reinstalled if it was running a 32-bit OS (upgrading the OS from 32-bit to 64-bit is not supported).

Main Changes

Note: information in this section may refer to a release not yet announced. It relates to changes already present in the development trunk, and either about to be available or already available in the gLite-3.2 branch.

gLite-3.2.0-5: AII Kickstart plugin default configuration improved

Release 3.2.0-5 of QWG Templates delivers AII Kickstart plugin (aii-ks) version 1.1.33. This new version contains significant improvements to the default disk configuration in the Kickstart configuration file generated by aii-shellfe from a machine profile. See the AII documentation for details.

gLite-3.2.0-3: CREAM CE support added

Release 3.2.0-3 of QWG Templates introduces support for the CREAM CE. See the documentation for more information about how to configure it with QWG Templates.

During the initial installation, you may experience a few issues getting the CE to start. See known problems for more information.

If you installed version 1.5 of the CREAM CE and would like to update it to version 1.6, be sure to read the specific instructions about the manual steps involved.

gLite-3.2.0-3: ncm-named v2 requires explicit definition of start property

Release 3.2.0-3 of QWG Templates delivers the latest version of ncm-named, v2. This version is a complete rewrite of the previous one, fixing several issues and unexpected behaviours. It is intended to be backward-compatible, with one exception concerning the behaviour of the component when the start property is undefined.

In v1.x, start was true by default, with the potentially unexpected behaviour that named was started (with the existing configuration, whatever it was) as soon as the RPM was installed. An undefined value was treated as false, in which case an enabled/running named was disabled/stopped.

In the new version, start is undefined by default, and when it is undefined, ncm-named does nothing to a currently enabled/running named (managed by some other means).

If you happen to rely on the old, unexpected behaviour, please update your ncm-named configuration to add:

'/software/components/named/start' = true;

gLite-3.2.0-3: VO configuration rewritten

Release 3.2.0-3 of QWG Templates introduces a new version of the functions used to configure VOs on a machine. This was necessary to overcome a few problems in the previous version, in particular to control which accounts are created based on the enabled FQANs, and to improve support for pool accounts for specific FQANs. See the documentation on VO configuration for more details.

In addition, the UID allocation for specific FQANs has changed. Previously, it was based only on the order of declaration in the voms_roles resource. In the new version, the SW manager and production user are assigned a fixed UID, whatever their rank in the list: the SW manager is assigned the first UID in the VO range, the production user the second. This should reduce UID changes when the list of declared roles changes. Unfortunately, the upgrade changes the UIDs of some roles for a few VOs, mainly: biomed, compchem, dzero, esr, fusion, flast.org, planck. For these VOs, the existing potentially conflicting accounts must be deleted. See known problems for more details on possible solutions.

gLite-3.2.0-3: VOBOX support

Release 3.2.0-3 of QWG Templates introduces support for the VOBOX machine type. Unlike the VOBOX support in gLite 3.1, in gLite 3.2 QWG templates configure both the machine and the VOBOX-specific services, including the creation and configuration of the required accounts. The documentation has been updated. When upgrading from gLite 3.1, it may be necessary to edit the site-specific configuration of the VOBOX to remove basically everything except the definition of the supported VO.

Another change compared to the gLite 3.1 support is that the number of VOs configured on the VOBOX is checked, and an error is thrown if more than one VO is configured (excluding operations VOs, ops by default).

gLite-3.2.0-3: Publishing GlueSite object in a non-standard BDII branch

Starting with release 3.2.0-3 of QWG Templates, it is possible to publish the GlueSite object in a non-standard branch rather than in mds-vo-name=resource. This is useful if you use several site BDIIs to increase service reliability. With the previous configuration, the GlueSite object could be published by several site BDIIs. This was harmless if the objects were identical (same DN); it was mainly a problem when using an internal hierarchy of site BDIIs (subsite BDIIs) inside the site.

The new feature keeps the GlueSite object separate from the other objects published by the resource BDII of the site BDII, so that it is published to the top BDII only by the active site BDII (the one used by the DNS alias associated with the service). See the BDII configuration documentation for more details.

gLite-3.2.0-3: Profile cloning improved

Profile cloning, formerly known as dummy WN, has been rewritten in release 3.2.0-3 of QWG Templates to improve its configuration flexibility and consistency. The configuration method is intended to be backward compatible and is described in gLite-related documentation.

Examples have been added to the examples provided with QWG templates.

gLite-3.2.0-3: New version of DPM/xrootd

A new version of xrootd for DPM, 20090729.0855-2, is provided starting with gLite update 06 in release 3.2.0-3 of QWG Templates. This new release fixes all the known issues with the previous, very old, release of xrootd shipped with DPM.

Some configuration inconsistencies have been fixed in DPM/xrootd, and this requires removing the following options from your DPM configuration if they were defined:

  • /software/components/dpmlfc/options/dpm/xroot/config
  • /software/components/dpmlfc/options/dpm/xroot/ofsPlugin

The default value for these options is now appropriate.

Due to a daemon name change, the upgrade may involve a manual step. See known problems.

gLite-3.2.0-2: OS_VERSION_PARAMS added

The standard configuration of a gLite machine now defines, as part of the OS selection, a new variable OS_VERSION_PARAMS to ease further processing based on the OS version (major or minor) or architecture. See OS configuration for more details.

gLite-3.2.0-2: Torque directPaths don't depend on shared home directories

In previous QWG releases, WN_SHARED_AREAS entries were not configured as Torque direct paths (configured with the $usecp directive in the MOM client configuration) when the CE was not configured to use shared home directories. This has been fixed: every entry in WN_SHARED_AREAS is now configured as a Torque direct path unconditionally. Existing WN configurations may change, but this should be harmless.

gLite-3.2.0-2: improved OS errata management

Release 3.2.0-2 of QWG Templates has improved support for OS errata, offering increased flexibility in large environments where you want to use a staged deployment. See the documentation for more details.

gLite-3.2.0-2: disk partitioning enhancements and fixes

The template that configures disk partitions based on a layout template has been fixed to properly handle configurations not based on LVM (extended partitions and software RAID). It has also been improved to perform many more validation checks on the layout description and to renumber partitions as needed, ensuring that numbers are consecutive and that partitions are created in the appropriate order (for example, partitions without an explicit size are created last).

Layout examples are now provided for the 3 major configurations: LVM-based, extended partitions and software raid.

As a result of these stricter checks, your existing layout may no longer compile successfully. See known problems.

gLite-3.2.0-2: SL4 32-bit compatibility added to WN

SL4 32-bit compatibility, as required by WLCG VOs (and probably many others with SL4 32-bit applications), has been added by default to the gLite 3.2 WN. This can be disabled by defining the variable WN_WLCG_SL4_32BIT_COMPAT to false.

Change Log

ChangeLog built from repository commit messages...

Changelog not available

Last modified on Feb 27, 2011, 2:24:56 PM