Release Notes for gLite 3.2 Templates
Table of Contents
- QWG Releases
- gLite Updates
- Upgrading from gLite 3.1
- Known Problems of Releases
- gLite-3.2.0-4: gLite 3.2 update 21 contains a major DPMLFC upgrade (1.8)
- gLite-3.2.0-3: CREAM 1.6 upgrade requires manual steps
- gLite-3.2.0-3: New ncm-cups version requires configuration changes
- gLite-3.2.0-3: GIP FCR plugin not working properly
- gLite-3.2.0-3: possible problems when first installing a CREAM CE
- gLite-3.2.0-3: kernel errata deployment problem caused by a SPMA bug
- gLite-3.2.0-3: Account UID changes following deployment of new VO …
- gLite-3.2.0-3: New DPM/xroot migration requires to stop manually olbd
- gLite-3.2.0-2: stricter checks in filesystem/config.tpl may lead to …
- gLite 3.2 requires a 64-bit OS
- Main Changes
- gLite-3.2.0-5: AII Kickstart plugin default configuration improved
- gLite-3.2.0-3: CREAM CE support added
- gLite-3.2.0-3: ncm-named v2 requires explicit definition of start property
- gLite-3.2.0-3: VO configuration rewritten
- gLite-3.2.0-3: VOBOX support
- gLite-3.2.0-3: Publishing GlueSite object in a non-standard BDII branch
- gLite-3.2.0-3: Profile cloning improved
- gLite-3.2.0-3: New version of DPM/xrootd
- gLite-3.2.0-2: OS_VERSION_PARAMS added
- gLite-3.2.0-2: Torque directPaths don't depend on shared home directories
- gLite-3.2.0-2: improved OS errata management
- gLite-3.2.0-2: disk partitioning enhancements and fixes
- gLite 3.2.0-2: SL4 32-bit compatibility added to WN
- Change Log
This page contains information about each release of QWG Templates for gLite 3.2, in particular new or changed features and known problems. To learn how to configure the templates, refer to the page on gLite templates customization.
QWG Releases
Note: you can follow ongoing developments and the progress of the upcoming release in the roadmap, the last entries in the gLite-3.2 branch ChangeLog or the full log of the trunk branch.
Date | Release | Description |
19/4/2009 | Creation of branch gLite-3.1 | |
10/7/2009 | gLite-3.2.0-1 | WN, UI, BDII |
18/10/2009 | gLite-3.2.0-2 | FroNTieR, NFS, Torque server, gLite 3.2 update 04 and 05 |
09/08/2010 | gLite-3.2.0-3 | gLite updates up to 15, CA 1.31-36, CREAM CE, VOBOX, DPM, LFC, OS errata |
09/02/2011 | gLite-3.2.0-4 | gLite updates up to 21, CA 1.37, ARGUS, DPM/LFC 1.8, OS errata |
gLite Updates
QWG templates releases deliver the last gLite updates available at the time of the release. There is no match between the QWG release number (-n) and gLite update numbers. Sometimes one QWG templates release delivers several gLite updates. In each QWG release, there is a default associated gLite update (generally the last one).
QWG releases provide a standard mechanism for selecting the gLite update you want to deploy on a per node, per cluster or per site basis.
Content of gLite updates and associated release notes can be viewed at http://glite.web.cern.ch/glite/packages/R3.2/updates.asp.
Upgrading from gLite 3.1
QWG releases for gLite 3.2 support updating existing gLite 3.1 nodes managed by QWG releases of the gLite-3.1.0 series. The only restriction is that gLite 3.2 is supported only on 64-bit machines, thus gLite 3.1 nodes running a 32-bit OS must be reinstalled.
Note: although gLite 3.1 was mainly 32-bit, all the gLite 3.1 machine types and services in QWG templates were supported on 64-bit Linux.
gLite versions cannot be mixed inside the same SCDB cluster. Updating existing nodes involves:
- Creating a new cluster using the following steps:
  - Copy the existing cluster with svn cp.
  - Edit cluster.build.properties of the new cluster to reflect the use of gLite 3.2 templates instead of gLite 3.1.
  - Review other cluster-specific templates, in particular site/cluster_info.tpl, for any change that could be necessary (none required).
- Moving node profiles from one cluster to the other with the svn mv command.
Note: nodes that are part of the same grid configuration (e.g. CE and WNs) can belong to 2 different clusters without any problem. It is recommended to put site-specific gLite parameters, site/glite/config.tpl, in your site directory.
Known Problems of Releases
This section describes severe issues that have been discovered after a release of QWG templates. They are generally fixed in the preview of the next release, available in the gLite 3.2 branch, which is the most up-to-date version recommended for use in production.
The version mentioned in each title is the release having the problem.
gLite-3.2.0-4: gLite 3.2 update 21 contains a major DPMLFC upgrade (1.8)
gLite 3.2 update 21, delivered as part of version 3.2.0-4 of QWG templates, brings DPM and LFC 1.8 which are major DPM and LFC upgrades requiring a database schema change. The schema change is not deployed by QWG templates. The schema upgrade script must be run manually with the following commands:
# For LFC, the script is in /opt/lcg/share/LFC/cns-db-300-to-310
cd /opt/lcg/share/DPM/cns-db-300-to-310
# First check the current schema version (in case of failure, see comment below)
./cns_db_300_to_310 --db-vendor MySQL --user root --pwd-file /tmp/pwd --db-host localhost --db cns_db --show-version-only
# If less than `3 1 0`, upgrade the schema
./cns_db_300_to_310 --db-vendor MySQL --user root --pwd-file /tmp/pwd --db-host localhost --db cns_db
Note: the MySQL password must be stored in the file passed as option --pwd-file before running the command. It is harmless to run the script twice.
Note that the upgraded database is still compatible with DPM/LFC 1.7. This makes it possible to upgrade the schema before the DPM/LFC upgrade and to roll back to 1.7 in case of severe issues with 1.8 (none known). The schema upgrade does not require stopping DPM/LFC. To apply the schema upgrade before updating DPM/LFC, you need to retrieve the upgrade script (directory) from the appropriate 1.8 RPM:
- DPM: DPM-name-server-mysql
- LFC: DPM-name-server-mysql
Also note that, both for DPM and LFC, the column names indicating the schema version may not match what is expected. In this case, running the script with --show-version-only fails to query the version. To fix it, connect to the MySQL database and enter the following commands:
use cns_db;
ALTER TABLE schema_version CHANGE major_number major INTEGER(11);
ALTER TABLE schema_version CHANGE minor_number minor INTEGER(11);
ALTER TABLE schema_version CHANGE patch_number patch INTEGER(11);
Then rerun the upgrade script.
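The decision "upgrade only if the version is less than 3 1 0" can be scripted. The following is a minimal sketch, assuming the version is available as three whitespace-separated integers (major, minor, patch); the exact output format of --show-version-only is not described here, and the function name is illustrative.

```shell
# Return 0 (true) when the reported schema version is lower than 3.1.0.
# Assumption: the version is passed as one string of three
# whitespace-separated integers, e.g. "3 0 0".
schema_needs_upgrade() (
    set -- $1                      # split "major minor patch" into $1 $2 $3
    major=${1:-0}
    minor=${2:-0}
    [ "$major" -lt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -lt 1 ]; }
)

# Example: decide whether the schema upgrade script has to be run.
if schema_needs_upgrade "3 0 0"; then
    echo "schema upgrade required"
fi
```

Since the upgrade script is harmless to run twice, such a check is only a convenience to avoid an unnecessary run.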
gLite-3.2.0-3: CREAM 1.6 upgrade requires manual steps
Version 3.2.0-3 of QWG templates delivers new CREAM CE version 1.6 (released as part of gLite update 12). This is a major update of CREAM CE. When updating an existing CREAM CE (delivered as part of an early version of QWG templates 3.2.0-3), the following manual steps must be done after deploying the new version to restore the service:
- Stop the CREAM CE:
  service tomcat5 stop
- Delete contents of /usr/share/tomcat5/webapps directory. This is a Tomcat requirement and if this is not done, Tomcat will continue to use the old CREAM CE version:
  rm -Rf /usr/share/tomcat5/webapps/*
- Delete the existing Tomcat application configurations:
  rm /usr/share/tomcat5/conf/Catalina/localhost/*
- Reconfigure ncm-filecopy and ncm-symlinks to ensure that all the configuration files are present as some of them may have been deleted by the RPM upgrade:
  ncm-ncd --configure filecopy symlinks
- Update the CREAM CE databases and restart Tomcat:
  /root/sbin/cream_db_update.sh
Note that after the upgrade of the databases, it is not possible to roll back to previous version.
gLite-3.2.0-3: New ncm-cups version requires configuration changes
The new version of ncm-cups delivered with version 3.2.0-3 of QWG templates changed the printer list format from a list to an nlist. To use this new version, you need to update your configuration. The change is fairly simple: the name property of the printer entries must be removed and its former value must be used as the nlist key.
For example, if you had a printer definition like:
"/software/components/cups/printers" = push(
    nlist("name", "myprinter",
          "protocol", "LPD",
          "server", "printsrv.example.org",
          "location", "Building unknown",
          "description", "My preferred printer type",
    ),
);
you will need to change it to:
"/software/components/cups/printers" = {
    SELF['myprinter'] = nlist("protocol", "LPD",
                              "server", "printsrv.example.org",
                              "location", "Building unknown",
                              "description", "My preferred printer type",
    );
    SELF
};
gLite-3.2.0-3: GIP FCR plugin not working properly
FCR does not work properly on a top BDII configured by QWG templates without a manual intervention. This is due to a bug in the FCR GIP plugin (glite-info-plugin-fcr v2.0.0-2), tracked in Savannah. In the meantime, you need to manually create the directory /opt/glite/var/cache/gip/plugin/fcr and set its ownership to edguser:infosys.
gLite-3.2.0-3: possible problems when first installing a CREAM CE
After installing a CREAM CE, you may encounter a few errors requiring a manual intervention to clean them up.
- If the machine certificate is not installed during machine installation, you need to copy the certificate into /etc/grid-security and run the filecopy component again with the following command:
  ncm-ncd --configure filecopy
- After installing the certificate, you may also have to run fetch-crl manually if you want the CE to be immediately operational. Look at the command to run in the cron file /etc/cron.d/fetch-crl-cron.ncm-cron.cron. After updating CRLs, you need to restart Tomcat with the following command:
  service tomcat5 restart
- Generally, for some unidentified reason, the MySQL database is not initialized properly at the first run of MySQL. This can be checked by looking at /var/log/ncm-cdispd. It can be fixed by running the Quattor configuration module mysql again with the following command:
  ncm-ncd --configure mysql
In addition, there is sometimes a chicken and egg problem with a Tomcat application, ce-monitor. If the directory /usr/share/tomcat5/webapps/ce-monitor/WEB-INF contains only the classes sub-directory, you need to remove the ce-monitor directory and restart Tomcat:
rm -Rf /usr/share/tomcat5/webapps/ce-monitor
service tomcat5 restart
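The ce-monitor condition described above (WEB-INF containing only the classes sub-directory) can be tested before removing anything. This is a minimal sketch: the function name is illustrative and not part of QWG templates or CREAM, and the default path is the one quoted above.

```shell
# Return 0 when <webapp>/WEB-INF exists and its only entry is "classes".
# The webapp directory can be passed as an argument; the default is the
# ce-monitor location mentioned in the text.
ce_monitor_incomplete() (
    webinf="${1:-/usr/share/tomcat5/webapps/ce-monitor}/WEB-INF"
    [ -d "$webinf" ] || exit 1
    [ "$(ls -A "$webinf")" = "classes" ]
)

# Possible usage on the CE (left commented out on purpose):
# if ce_monitor_incomplete; then
#     rm -Rf /usr/share/tomcat5/webapps/ce-monitor
#     service tomcat5 restart
# fi
```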
gLite-3.2.0-3: kernel errata deployment problem caused by a SPMA bug
This problem is not strictly related to gLite templates. It may affect any attempt to deploy kernel errata with SPMA versions 1.10.34 to 1.11.1.
SPMA version 1.10.34 introduces a new feature, triggered by the option protectkernel, which is true by default, allowing a smooth upgrade of kernels and related modules. Kernels and related modules are not uninstalled as long as they are active, even though they are no longer part of the configuration. To implement this feature, SPMA tries to guess the kernel name and variant (smp, largesmp...). Unfortunately, the regexp used for this is buggy in versions before SPMA 1.11.2. Depending on the SPMA version, it affects either the largesmp variant or the other variants.
A fixed SPMA version is delivered with the gLite-3.2.0-3 release of QWG templates (since r4946 of QWG trunk). But depending on the SPMA version you use when you upgrade to this version of the QWG templates, you may end up in a chicken and egg problem if you tried to deploy errata with a buggy version: because of the SPMA bug, a dependency problem causes SPMA to fail, and this prevents the upgrade of SPMA itself.
The suggested workaround is:
- Roll back errata on the affected machines to a version matching what is currently deployed on the machine, so that SPMA can execute successfully.
- Deploy the fixed SPMA version as provided by the last version of the templates.
- Redeploy the OS errata.
gLite-3.2.0-3: Account UID changes following deployment of new VO configuration functions
As explained in the main changes section, the new VO configuration functions ensure that the SW manager and production user will always be assigned the same UID. This may lead to a change in the account UID numbering when first deploying the new version. The main VOs affected are: biomed, compchem, dzero, esr, fusion, flast.org, planck.
ncm-accounts has difficulty carrying out these renumbering operations. It is thus necessary to remove the potentially conflicting accounts and re-run ncm-accounts. One possibility is to use a pair of scripts installed and run by ncm-filecopy. The first one removes the accounts. The second one recreates them and changes the ownership of home directories. The two scripts must be deployed one after the other. Examples can be found in QWG examples:
- site/misc/fix_fqan_accounts.tpl: script to remove the potentially conflicting accounts.
- site/misc/fix_fqan_dirs.tpl: script to fix home directory permissions for modified accounts after recreating them.
After running those scripts, it may also be necessary to fix SW area ownership for VOs whose SW manager UID changed. This has to be done manually, but it only needs to be done once because the SW area generally resides in a shared area.
If you are running a CE without shared home directories, there are some specific constraints when updating the VO accounts:
- You have to ensure that the SSH keys for the new VO accounts are properly generated, if they are used.
- When running the suggested script to remove some accounts before running ncm-accounts, you need to remove the definition of the users variable to avoid having 2 accounts with the same UID, as this causes an error with the scp command used to copy files between the CE and the WN. Because of this, you need to ensure that you have no jobs running under the xxxhs accounts or you need to drain the corresponding queue.
gLite-3.2.0-3: New DPM/xroot migration requires to stop manually olbd
The new DPM/xroot provided in this QWG release uses the new Cluster Management Services daemon, cmsd, instead of the legacy one, olbd. Because of a chicken and egg problem, it is not possible to stop the olbd daemon before removing the startup scripts. Unfortunately, if olbd runs, cmsd cannot start. It is thus necessary to stop olbd manually on all DPM nodes and then restart cmsd.
This can be done with the following commands:
# List the olbd processes (there may be 2 on the head node)
ps -e | grep olbd
# Run for each olbd process listed above
kill -TERM <pid>
service dpm-cms restart
# On the head node only
service dpm-manager-cmsd restart
gLite-3.2.0-2: stricter checks in filesystem/config.tpl may lead to compilation errors
As described in the release notes, the standard configuration template in charge of producing the disk partition configuration from a layout description now implements much stricter validation checks on the configuration. As a result, existing layouts may need to be fixed. Two usual problems are:
- The disk partition referred to in a device or devices attribute is not on the right disk or is a partition already used by another file system or block device.
- The variable in your site layout equivalent to DISK_GLITE_PARTS in the layout examples does not contain an entry for the extended partition in fourth position, although a logical partition must be created to implement your layout.
See the documentation for more information about producing a layout template.
gLite 3.2 requires a 64-bit OS
Machines hosting gLite services must be installed with a 64-bit OS. When upgrading from a previous gLite version, it is necessary to reinstall the machine if it was running a 32-bit OS (OS upgrade from 32-bit to 64-bit is not supported).
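To find out which existing nodes need a reinstallation, the architecture reported by uname -m can be checked. This is a minimal sketch; the function name is illustrative and it assumes x86_64 is the only 64-bit architecture of interest.

```shell
# Return 0 when the given architecture string (default: output of `uname -m`)
# denotes a 64-bit x86 system.
# Assumption: only x86_64 is accepted; other 64-bit platforms are not considered.
is_64bit_os() {
    case "${1:-$(uname -m)}" in
        x86_64) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example: report whether the local machine can be upgraded in place.
if is_64bit_os; then
    echo "64-bit OS: upgrade possible"
else
    echo "32-bit OS: reinstallation required"
fi
```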
Main Changes
Note: information in this section may refer to a release not yet announced. It relates to changes already present in the development trunk and about to be available, or already available, in the gLite-3.2 branch.
gLite-3.2.0-5: AII Kickstart plugin default configuration improved
Release 3.2.0-5 of QWG Templates delivers AII Kickstart plugin (aii-ks) version 1.1.33. This new version contains significant improvements to the default configuration of disks in the Kickstart configuration file generated by aii-shellfe from a machine profile. See AII documentation for details.
gLite-3.2.0-3: CREAM CE support added
Release 3.2.0-3 of QWG Templates introduces support of the CREAM CE. See the documentation for more information about how to configure it with QWG Templates.
During the initial installation, you may experience a few issues getting the CE to start. See the known problems for more information.
If you installed version 1.5 of the CREAM CE and would like to update it to version 1.6, be sure to read the specific instructions about the manual steps involved.
gLite-3.2.0-3: ncm-named v2 requires explicit definition of start property
Release 3.2.0-3 of QWG Templates delivers the last version of ncm-named, v2. This version is a complete rewrite of the previous version to fix several issues and unexpected behaviours. It is intended to be backward-compatible with one exception concerning the behaviour of the component if the start property is undefined.
In v1.x, it was true by default, with the potentially unexpected behaviour that named was started (with the existing configuration, whatever it was) as soon as the RPM was installed. undefined was considered the same as false and in this case, an enabled/running named was disabled/stopped.
In the new version, start is undefined by default. When it is undefined, ncm-named does nothing to a currently enabled/running named (managed by some other means).
If you happen to rely on the old unexpected behaviour, please update your ncm-named configuration to add:
'/software/components/named/start' = true;
gLite-3.2.0-3: VO configuration rewritten
Release 3.2.0-3 of QWG Templates introduces a new version of the functions used to configure VOs on a machine. This was necessary to overcome a few problems in the previous version, in particular in the ability to control the accounts created based on the FQAN enabled, and to improve the support for pool accounts for specific FQAN. See documentation on VO configuration for more details.
In addition, a change has been implemented in the UID allocation for specific FQANs. Previously, it was based only on the order of declaration in the voms_roles resource. In the new version, SW manager and production user are assigned a fixed UID, whatever their rank in the list: SW manager is assigned the first UID in the VO range, production user the second. This should reduce UID changes when there is a change in the list of roles declared. Unfortunately, during the upgrade, a few VOs have some roles changed. These are mainly: biomed, compchem, dzero, esr, fusion, flast.org, planck. For these VOs the existing potentially conflicting accounts must be deleted. See known issues for more details on possible solutions.
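The fixed UID rule can be illustrated as follows. This is a minimal sketch of the rule only, not the actual Pan function from QWG templates; the function name, the role labels swmanager and production, and the base UID are hypothetical.

```shell
# Print the UID assigned to a fixed role, given the first UID of the VO range.
# Assumption: roles are labelled "swmanager" and "production" (illustrative names).
# Other roles are not covered by the fixed rule and make the function fail.
fixed_role_uid() {
    case "$2" in
        swmanager)  echo "$1" ;;           # first UID in the VO range
        production) echo $(($1 + 1)) ;;    # second UID in the VO range
        *)          return 1 ;;
    esac
}

# Example with a hypothetical VO range starting at 50000.
fixed_role_uid 50000 swmanager
fixed_role_uid 50000 production
```

With such a rule, reordering or adding roles in the declaration list no longer shifts the UID of these two accounts.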
gLite-3.2.0-3: VOBOX support
Release 3.2.0-3 of QWG Templates introduces support of the VOBOX machine type. Unlike the support for VOBOX in gLite 3.1, in gLite 3.2 QWG templates configure both the machine and the VOBOX-specific services, including the creation and configuration of the required accounts. Documentation has been updated. When upgrading from gLite 3.1, it may be necessary to edit the site-specific configuration of the VOBOX and to remove basically everything except the definition of the supported VO.
Another change compared to gLite 3.1 support is that the number of VOs configured on the VOBOX is checked and an error is thrown if there is more than one VO configured (except operation VOs, ops by default).
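The logic of this check can be sketched as follows. This is an illustration in shell, not the Pan code used by the templates; the function name is hypothetical and ops is assumed to be the only operation VO.

```shell
# Succeed only if at most one non-operation VO is listed in the arguments.
# Assumption: "ops" is the only operation VO (the default mentioned above).
check_vobox_vos() (
    count=0
    for vo in "$@"; do
        [ "$vo" = "ops" ] || count=$((count + 1))
    done
    if [ "$count" -gt 1 ]; then
        echo "error: $count VOs configured on the VOBOX, only one is allowed besides ops" >&2
        exit 1
    fi
)

# Example: one VO plus ops is accepted.
check_vobox_vos atlas ops && echo "VOBOX VO list accepted"
```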
gLite-3.2.0-3: Publishing GlueSite object in a non-standard BDII branch
Starting with release 3.2.0-3 of QWG Templates, it is possible to publish the GlueSite object in a non-standard branch, rather than in mds-vo-name=resource. This is useful if you use several site BDIIs to increase service reliability. With the previous configuration, it was possible to have the GlueSite object published by several site BDIIs. This was harmless if the object was the same with the same DN. It was mainly a problem when using an internal hierarchy of site BDIIs (subsite BDIIs) inside the site.
The new feature keeps the GlueSite object separate from the other objects published by the resource BDII of the site BDII, so that it is published to the top BDII only by the active site BDII (the one used by the DNS alias associated with the service). See BDII configuration documentation for more details.
gLite-3.2.0-3: Profile cloning improved
Profile cloning, formerly known as dummy WN, has been rewritten in release 3.2.0-3 of QWG Templates to improve its configuration flexibility and consistency. The configuration method is intended to be backward compatible and is described in gLite-related documentation.
Examples have been added into examples provided with QWG templates.
gLite-3.2.0-3: New version of DPM/xrootd
A new version of xrootd for DPM, 20090729.0855-2, is provided starting with gLite update 06 in release 3.2.0-3 of QWG Templates. This new release fixes all the known issues with the previous, very old, release of xrootd shipped with DPM.
Some configuration inconsistencies have been fixed in DPM/xrootd. As a result, the following options must be removed from your DPM configuration if they were defined:
- /software/components/dpmlfc/options/dpm/xroot/config
- /software/components/dpmlfc/options/dpm/xroot/ofsPlugin
The default value for these options is now appropriate.
Due to a daemon name change, the upgrade may involve a manual step. See known problems.
gLite-3.2.0-2: OS_VERSION_PARAMS added
The standard configuration of a gLite machine now defines, as part of the OS selection, a new variable OS_VERSION_PARAMS to ease further processing based on OS version (major or minor) or architecture. See OS configuration for more details.
gLite-3.2.0-2: Torque directPaths don't depend on shared home directories
In previous QWG releases, WN_SHARED_AREAS entries were not configured as Torque direct paths (configured with $usecp directive in MOM client configuration) when the CE was not configured to use shared home directories. This has been fixed and now every entry in WN_SHARED_AREAS is configured as a Torque direct path unconditionally. Existing WN configuration may change but this should be harmless.
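For illustration, the following sketch prints the kind of $usecp lines that correspond to a list of shared areas. It is not the code used by the templates: the function name and the host pattern *.example.org are hypothetical, and each area is assumed to be mounted at the same path on the CE and the WNs.

```shell
# Print one Torque MOM "$usecp" directive per shared area.
# Arguments: a host pattern, then the shared area paths.
# Assumption: each area has the same path on the remote host and locally.
usecp_lines() (
    host_pattern=$1
    shift
    for area in "$@"; do
        printf '$usecp %s:%s %s\n' "$host_pattern" "$area" "$area"
    done
)

# Example with two hypothetical shared areas.
usecp_lines '*.example.org' /home /swareas
```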
gLite-3.2.0-2: improved OS errata management
Release 3.2.0-2 of QWG Templates has improved support for OS errata, offering increased flexibility in large environments where you want to use a staged deployment. See the documentation for more details.
gLite-3.2.0-2: disk partitioning enhancements and fixes
The template that allows configuration of disk partitions based on a layout template has been fixed to handle properly configurations not based on LVM (extended partitions and software raid). It has also been improved to do much more validation checks on layout description and do the necessary renumbering of partitions to ensure numbers are consecutive and that partitions are created in the appropriate order (for example, partitions without an explicit size created last).
Layout examples are now provided for the 3 major configurations: LVM-based, extended partitions and software raid.
As a result of these stricter checks, your existing layout may not compile successfully anymore. See known problems.
gLite 3.2.0-2: SL4 32-bit compatibility added to WN
SL4 32-bit compatibility, as required by WLCG VOs (and probably many others with SL4 32-bit applications), has been added by default to the gLite 3.2 WN. This can be disabled by defining variable WN_WLCG_SL4_32BIT_COMPAT to false.
Change Log
ChangeLog build from repository commit messages...
Changelog not available