= Release Notes for gLite 3.2 Templates =

[[TracNav]]

[[TOC(inline)]]

This page contains information about each release of QWG Templates for gLite 3.2, in particular new or changed features and known problems. To learn how to configure the templates, refer to the page on gLite templates [wiki:Doc/gLite/TemplateCustomization customization].

== QWG Releases ==

''Note: you can follow ongoing developments and the progress of the upcoming release in the [/roadmap roadmap], the last entries in the gLite-3.2 branch [wiki:ReleaseNotes/gLite-3.1#ChangeLog ChangeLog] or the [log:templates/trunk full log] of the trunk branch.''

|| Date || Release || Description ||
|| 19/4/2009 || || Creation of branch gLite-3.1 [[BR]] ||
|| 10/7/2009 || [milestone:gLite-3.2.0-1 gLite-3.2.0-1] || WN, UI, BDII ||
|| 18/10/2009 || [milestone:gLite-3.2.0-2 gLite-3.2.0-2] || FroNTieR, NFS, Torque server, gLite 3.2 update 04 and 05 ||
|| 09/08/2010 || [milestone:gLite-3.2.0-3 gLite-3.2.0-3] || gLite updates up to 15, CA 1.31-36, CREAM CE, VOBOX, DPM, LFC OS errata ||
|| 09/02/2011 || [milestone:gLite-3.2.0-4 gLite-3.2.0-4] || gLite updates up to 21, CA 1.37, ARGUS, DPM/LFC 1.8, OS errata ||

== gLite Updates ==

QWG templates releases deliver the latest gLite updates available at the time of the release. There is no correspondence between the QWG release number (`-n`) and gLite update numbers. Sometimes one QWG templates release delivers several gLite updates. Each QWG release has a default associated gLite update (generally the latest one). QWG releases provide a standard mechanism for [wiki:Download/QWGTemplates/Install#ControllinggLiteUpdatesInstallation selecting] the gLite update you want to deploy on a per node, per cluster or per site basis.

Content of gLite updates and associated release notes can be viewed at http://glite.web.cern.ch/glite/packages/R3.2/updates.asp.

== Upgrading from gLite 3.1 ==

QWG releases for gLite 3.2 support updating existing gLite 3.1 nodes managed by QWG releases of the gLite-3.1.0 series.
The only restriction is that gLite 3.2 is supported '''only''' on 64-bit machines, thus gLite 3.1 nodes running a 32-bit OS must be reinstalled. ''Note: although gLite 3.1 was mainly 32-bit, all the gLite 3.1 machine types and services in QWG templates were supported on 64-bit Linux.''

gLite versions cannot be mixed inside the same SCDB cluster. Updating an existing node involves:
 * Creating a new cluster using the following steps:
   * Copy the existing cluster with `svn cp`.
   * Edit `cluster.build.properties` of the new cluster to reflect the use of gLite 3.2 templates instead of gLite 3.1.
   * Review other cluster specific templates, in particular `site/cluster_info.tpl`, for any change that could be necessary (none required).
 * Moving node profiles from one cluster to the other with the `svn mv` command.

''Note: there is no problem having nodes that are part of the same grid configuration (e.g. CE and WNs) belong to 2 different clusters. It is recommended to put site specific gLite parameters, [source:templates/branches/gLite-3.2/sites/example/site/glite/config.tpl site/glite/config.tpl], in your site directory.''

== Known Problems of Releases ==

This section describes severe issues that have been discovered after a release of QWG templates. They are generally fixed in the preview of the next release, available in the [source:templates/branches/gLite-3.2 gLite 3.2 branch], which is the '''most up-to-date version recommended for use in production'''. The version mentioned in each title is the release having the problem.

=== gLite-3.2.0-4: gLite 3.2 update 21 contains a major DPM/LFC upgrade (1.8) === #KI3204DPM18

gLite 3.2 update 21, delivered as part of version [milestone:gLite-3.2.0-4 3.2.0-4] of QWG templates, brings DPM and LFC 1.8, which are major DPM and LFC upgrades requiring a database schema change. The schema change is not deployed by QWG templates.
'''The schema upgrade script must be run manually''' with the following commands:
{{{
# For LFC, the script is in /opt/lcg/share/LFC/cns-db-300-to-310
cd /opt/lcg/share/DPM/cns-db-300-to-310

# First check the current schema version (in case of failure, see comment below)
./cns_db_300_to_310 --db-vendor MySQL --user root --pwd-file /tmp/pwd --db-host localhost --db cns_db --show-version-only

# If less than `3 1 0`, upgrade the schema
./cns_db_300_to_310 --db-vendor MySQL --user root --pwd-file /tmp/pwd --db-host localhost --db cns_db
}}}

''Note: the MySQL password must be stored in the file passed as option `--pwd-file` before running the command. It is harmless to run the script twice.''

Note that the upgraded database is still compatible with DPM/LFC 1.7. This makes it possible to upgrade the schema before the DPM/LFC upgrade and to roll back to 1.7 in case of severe issues with 1.8 (none known). The schema upgrade does not require stopping DPM/LFC. To apply the schema upgrade before updating DPM/LFC, you need to retrieve the upgrade script (directory) from the appropriate 1.8 RPM:
 * DPM : `DPM-name-server-mysql`
 * LFC : `DPM-name-server-mysql`

Also note that, both for DPM and LFC, the column names holding the schema version may not match what the script expects. In this case, running the script with `--show-version-only` fails to query the version. To fix it, connect to the MySQL database and enter the following commands:
{{{
use cns_db;
ALTER TABLE schema_version CHANGE major_number major INTEGER(11);
ALTER TABLE schema_version CHANGE minor_number minor INTEGER(11);
ALTER TABLE schema_version CHANGE patch_number patch INTEGER(11);
}}}

Then rerun the upgrade script.

=== gLite-3.2.0-3: CREAM 1.6 upgrade requires manual steps === #KI3203CREAM16

Version [milestone:gLite-3.2.0-3 3.2.0-3] of QWG templates delivers the new CREAM CE version 1.6 (released as part of gLite update 12). This is a major update of CREAM CE.
When updating an existing CREAM CE (delivered as part of an early version of QWG templates [milestone:gLite-3.2.0-3 3.2.0-3]), the following manual steps must be done after deploying the new version '''to restore the service''':
 1. Stop the CREAM CE:
 {{{
 service tomcat5 stop
 }}}
 1. Delete the contents of the `/usr/share/tomcat5/webapps` directory. This is a Tomcat requirement: if this is not done, Tomcat will continue to use the old CREAM CE version:
 {{{
 rm -Rf /usr/share/tomcat5/webapps/*
 }}}
 1. Delete the existing Tomcat application configurations:
 {{{
 rm /usr/share/tomcat5/conf/Catalina/localhost/*
 }}}
 1. Reconfigure `ncm-filecopy` and `ncm-symlinks` to ensure that all the configuration files are present, as some of them may have been deleted by the RPM upgrade:
 {{{
 ncm-ncd --configure filecopy symlinks
 }}}
 1. Update the CREAM CE databases and restart Tomcat:
 {{{
 /root/sbin/cream_db_update.sh
 }}}

Note that after the upgrade of the databases, it is not possible to roll back to the previous version.

=== gLite-3.2.0-3: New ncm-cups version requires configuration changes === #KI3203cups

The new version of `ncm-cups` delivered with version [milestone:gLite-3.2.0-3 3.2.0-3] of QWG templates changed the printer list format from a list to an nlist. To use this new version, you need to update your configuration. The change is fairly simple: the `name` property of the printer entries must be removed and its former value must be used as the nlist key.
For example, if you had a printer definition like:
{{{
"/software/components/cups/printers" = push(
    nlist("name", "myprinter",
          "protocol", "LPD",
          "server", "printsrv.example.org",
          "location", "Building unknown",
          "description", "My preferred printer type"
    )
);
}}}

you will need to change it to:
{{{
"/software/components/cups/printers" = {
    SELF['myprinter'] = nlist("protocol", "LPD",
                              "server", "printsrv.example.org",
                              "location", "Building unknown",
                              "description", "My preferred printer type"
    );
    SELF;
};
}}}

=== gLite-3.2.0-3: GIP FCR plugin not working properly ===

FCR does not work properly on a top BDII configured by QWG templates without a manual intervention. This is due to a bug in the FCR GIP plugin (`glite-info-plugin-fcr` v2.0.0-2), tracked in [https://savannah.cern.ch/bugs/index.php?59649 Savannah]. In the meantime, you need to manually create the directory `/opt/glite/var/cache/gip/plugin/fcr` and set its ownership to `edguser:infosys`.

=== gLite-3.2.0-3: possible problems when first installing a CREAM CE === #KI3203CREAMCE

After installing a CREAM CE, you may encounter a few errors that require a manual intervention to clean them up.
 * If the machine certificate is not installed during machine installation, you need to copy the certificate into `/etc/grid-security` and run the `filecopy` component again with the following command:
 {{{
 ncm-ncd --configure filecopy
 }}}
 * After installing the certificate, you may also have to run `fetch-crl` manually if you want the CE to be immediately operational. Look at the command to run in the cron file `/etc/cron.d/fetch-crl-cron.ncm-cron.cron`. After updating CRLs, you need to restart Tomcat with the following command:
 {{{
 service tomcat5 restart
 }}}
 * Generally, for some unidentified reason, the MySQL database is not initialized properly at the first run of MySQL. This can be checked by looking at `/var/log/ncm-cdispd`.
   This can be fixed by running the Quattor configuration module `mysql` again with the following command:
 {{{
 ncm-ncd --configure mysql
 }}}

In addition, there is sometimes a chicken and egg problem with a Tomcat application, `ce-monitor`. If the directory `/usr/share/tomcat5/webapps/ce-monitor/WEB-INF` contains only the `classes` sub-directory, you need to remove the `ce-monitor` directory and restart Tomcat:
{{{
rm -Rf /usr/share/tomcat5/webapps/ce-monitor
service tomcat5 restart
}}}

=== gLite-3.2.0-3: kernel errata deployment problem caused by a SPMA bug ===

This problem is not strictly related to gLite templates. It may affect any attempt to deploy kernel errata with SPMA versions 1.10.34 to 1.11.1.

SPMA version 1.10.34 introduces a new feature, triggered by the option `protectkernel` which is true by default, allowing a smooth upgrade of kernels and related modules. Kernel and related modules are not uninstalled as long as they are active, even though they are no longer part of the configuration. To implement this feature, SPMA tries to guess the kernel name and variant (`smp`, `largesmp`...). Unfortunately, the regexp used for this is buggy until SPMA 1.11.2. Depending on the SPMA version, it affects either the `largesmp` variant or the other variants.

A fixed SPMA version is delivered with the [milestone:gLite-3.2.0-3 gLite-3.2.0-3] release of QWG templates (since r4946 of QWG trunk). But depending on the SPMA version you use when you upgrade to this version of the QWG templates, you may end up with a chicken and egg problem if you tried to deploy errata with a buggy version. Because of the SPMA bug, there is a dependency problem that causes SPMA to fail, and this prevents the upgrade of SPMA itself. The suggested workaround is:
 * Roll back errata on the affected machines to a version matching what is currently deployed on the machine, so that SPMA can execute successfully.
 * Deploy the fixed SPMA version as provided by the latest version of the templates.
 * Redeploy the OS errata.
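
To illustrate why guessing the kernel variant from the release string is error-prone, here is a minimal shell sketch. It is not the SPMA code: the helper name `kernel_variant` and the sample release strings are assumptions made for this example. It only shows that `largesmp` has to be tested before `smp`, since any string ending in `largesmp` also ends in `smp`.
{{{
#!sh
# Illustrative sketch only, NOT the SPMA implementation.
# kernel_variant is a hypothetical helper: it prints the variant suffix
# of a kernel release string, or an empty line for a plain kernel.
kernel_variant() {
    case "$1" in
        *largesmp) echo "largesmp" ;;   # must be tested before *smp
        *hugemem)  echo "hugemem" ;;
        *smp)      echo "smp" ;;
        *)         echo "" ;;
    esac
}
}}}

The variant of the running kernel can then be obtained with `kernel_variant "$(uname -r)"`. Swapping the `*largesmp` and `*smp` branches would report every `largesmp` kernel as `smp`, which is the kind of mismatch that can affect one variant but not the others.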
=== gLite-3.2.0-3: Account UID changes following deployment of new VO configuration functions === #KI3203VOConfig

As explained in the [#MC3203VOConfig main changes] section, the new VO configuration functions ensure that the SW manager and production user are always assigned the same UID. This may lead to a change in the account UID numbering when first deploying the new version. The main VOs affected are: `biomed`, `compchem`, `dzero`, `esr`, `fusion`, `flast.org`, `planck`.

`ncm-accounts` has difficulty carrying out these renumbering operations. It is thus necessary to remove the potentially conflicting accounts and re-run `ncm-accounts`. One possibility is to use a pair of scripts installed and run by `ncm-filecopy`. The first one removes the accounts. The second one recreates them and changes the ownership of home directories. The two scripts must be deployed one after the other. Examples can be found in [source:templates/trunk/sites/example/site/misc QWG examples]:
 * [source:templates/trunk/sites/example/site/misc/fix_fqan_accounts.tpl site/misc/fix_fqan_accounts.tpl]: script to remove the potentially conflicting accounts.
 * [source:templates/trunk/sites/example/site/misc/fix_fqan_dirs.tpl site/misc/fix_fqan_dirs.tpl]: script to fix home directory permissions for modified accounts after recreating them.

After running these scripts, it may also be necessary to fix the SW area ownership for VOs whose SW manager UID changed. This has to be done manually, but it is generally not a problem as the SW area resides in a shared area.

If you are running a CE without shared home directories, there are some specific constraints when updating the VO accounts:
 * You have to ensure that the SSH keys for the new VO accounts are properly generated, if they are used.
 * When running the [source:templates/trunk/sites/example/site/misc/fix_fqan_accounts.tpl suggested script] to remove some accounts before running `ncm-accounts`, you need to remove the definition of the `users` variable to avoid having 2 accounts with the same UID, as this causes an error with the `scp` command used to copy files between the CE and the WN. Because of this, you need to ensure that you have no jobs running under the `xxxhs` accounts, or you need to drain the corresponding queue.

=== gLite-3.2.0-3: New DPM/xroot migration requires to stop manually olbd === #KI3203DPMxrootd

The new [#MC3203DPMxrootd DPM/xroot] provided in this QWG [milestone:gLite-3.2.0-3 release] uses the new Cluster Management Services daemon, `cmsd`, instead of the legacy one, `olbd`. Because of a chicken and egg problem, it is not possible to stop the `olbd` daemon before removing the startup scripts. Unfortunately, if `olbd` runs, `cmsd` cannot start. It is thus necessary to stop `olbd` manually on all DPM nodes and then restart `cmsd`. This can be done with the following commands:
{{{
# List the running olbd processes (there may be 2 on the head node)
ps -e | grep olbd
# Send TERM to each olbd process
pkill -TERM olbd
# Restart cmsd
service dpm-cms restart
# On the head node only
service dpm-manager-cmsd restart
}}}

=== gLite-3.2.0-2: stricter checks in filesystem/config.tpl may lead to compilation errors ===

As described in the [#gLite-3.2.0-2:diskpartitionningenhancementsandfixes release notes], the standard configuration template in charge of producing the disk partition configuration based on a [/wiki/Doc/OS/AII#ConfigurationofFilesystems:TheRecommendedWay layout description] now implements much stricter validation checks on the configuration. As a result, existing layouts may need to be fixed. Two usual problems are:
 * The disk partition referred to in a `device` or `devices` attribute is not on the right disk or is a partition already used by another file system or block device.
 * The variable in your site layout equivalent to `DISK_GLITE_PARTS` in the [/wiki/Doc/OS/AII#CustomizingFileSystemList layout examples] does not contain an entry for the extended partition in fourth position, although a logical partition is needed to implement your layout.

See the [/wiki/Doc/OS/AII#CustomizingFileSystemList documentation] for more information about producing a layout template.

=== gLite 3.2 requires a 64-bit OS ===

Machines hosting gLite services must be installed with a 64-bit OS. When upgrading from a previous gLite version, it is necessary to reinstall the machine if it was running a 32-bit OS (OS upgrade from 32-bit to 64-bit is not supported).

== Main Changes ==

''Note: information in this section may refer to a release that has not been announced yet. It relates to changes already present in the [source:templates/trunk development trunk] and about to be available, or already available, in the [source:templates/branches/gLite-3.2 gLite-3.2 branch].''

=== gLite-3.2.0-5: AII Kickstart plugin default configuration improved ===

Release [milestone:gLite-3.2.0-5 3.2.0-5] of QWG Templates delivers AII Kickstart plugin (`aii-ks`) version 1.1.33. This new version contains significant improvements to the default configuration of disks in the Kickstart configuration file generated by `aii-shellfe` from a machine profile. See the [/wiki/Doc/OS/AII#AIISiteConfiguration AII documentation] for details.

=== gLite-3.2.0-3: CREAM CE support added ===

Release [milestone:gLite-3.2.0-3 3.2.0-3] of QWG Templates introduces support for the CREAM CE. See the [/wiki/Doc/gLite/TemplateCustomization#CREAMConfig documentation] for more information about how to configure it with QWG Templates.

During the initial installation, you may experience a few issues getting the CE to start. See the [#KI3203CREAMCE known problems] for more information.
If you installed version 1.5 of the CREAM CE and would like to update it to version 1.6, be sure to read the [#KI3203CREAM16 specific instructions] about the manual steps involved.

=== gLite-3.2.0-3: ncm-named v2 requires explicit definition of start property ===

Release [milestone:gLite-3.2.0-3 3.2.0-3] of QWG Templates delivers the latest version of `ncm-named`, v2. This version is a complete rewrite of the previous version to fix several issues and unexpected behaviours. It is intended to be backward-compatible, with one exception concerning the behaviour of the component when the `start` property is undefined. In v1.x, it was `true` by default, with the potentially unexpected behaviour that `named` was started (with the existing configuration, whatever it was) as soon as the RPM was installed. `undefined` was considered the same as `false` and, in this case, an enabled/running `named` was disabled/stopped. In the new version, `start` is `undefined` by default, and when it is `undefined`, `ncm-named` does nothing to a currently enabled/running `named` (managed by some other means). If you happen to rely on the old behaviour, please update your `ncm-named` configuration to add:
{{{
'/software/components/named/start' = true;
}}}

=== gLite-3.2.0-3: VO configuration rewritten === #MC3203VOConfig

Release [milestone:gLite-3.2.0-3 3.2.0-3] of QWG Templates introduces a new version of the functions used to configure VOs on a machine. This was necessary to overcome a few problems in the previous version, in particular in the ability to control the accounts created based on the FQANs enabled, and to improve the support for pool accounts for specific FQANs. See the documentation on [/wiki/Doc/gLite/TemplateCustomization#VOConfiguration VO configuration] for more details.

In addition, a change has been implemented in the UID allocation for specific FQANs. Previously, it was based only on the order of declaration in the `voms_roles` resource.
In the new version, the SW manager and production user are assigned a fixed UID, whatever their rank in the list: the SW manager is assigned the first UID in the VO range, the production user the second. This should reduce UID changes when the list of declared roles changes. Unfortunately, during the upgrade, a few VOs have some roles changed. These are mainly: `biomed`, `compchem`, `dzero`, `esr`, `fusion`, `flast.org`, `planck`. For these VOs, the existing potentially conflicting accounts must be deleted. See the [#KI3203VOConfig known issues] for more details on possible solutions.

=== gLite-3.2.0-3: VOBOX support ===

Release [milestone:gLite-3.2.0-3 3.2.0-3] of QWG Templates introduces support for the [/wiki/Doc/gLite/TemplateCustomization#VOBOX VOBOX] machine type. Unlike the support for VOBOX in gLite 3.1, in gLite 3.2 QWG templates configure both the machine and the VOBOX specific services, including the creation and configuration of the required accounts. The [https://trac.lal.in2p3.fr/QWG/wiki/Doc/gLite/TemplateCustomization#VOBOX documentation] has been updated.

When upgrading from gLite 3.1, it may be necessary to edit the site-specific configuration of the VOBOX and to remove basically everything except the definition of the supported VO. Another change compared to gLite 3.1 support is that the number of VOs configured on the VOBOX is checked and an error is thrown if more than one VO is configured (except operation VOs, `ops` by default).

=== gLite-3.2.0-3: Publishing GlueSite object in a non-standard BDII branch ===

Starting with release [milestone:gLite-3.2.0-3 3.2.0-3] of QWG Templates, it is possible to publish the `GlueSite` object in a non-standard branch, rather than in `mds-vo-name=resource`. This is useful if you use several site BDIIs to increase service reliability. With the previous configuration, it was possible to have the `GlueSite` object published by several site BDIIs. This was harmless if the object was the same with the same DN.
It was mainly a problem when using an internal hierarchy of site BDIIs (subsite BDIIs) inside the site. The new feature makes it possible to keep the `GlueSite` object separate from the other objects published by the resource BDII of the site BDII, and thus to get it published to the top BDII only by the ''active'' site BDII (the one used by the DNS alias associated with the service). See the [/wiki/Doc/gLite/TemplateCustomization#subsiteBDII BDII configuration] documentation for more details.

=== gLite-3.2.0-3: Profile cloning improved ===

Profile cloning, formerly known as ''dummy WN'', has been rewritten in release [milestone:gLite-3.2.0-3 3.2.0-3] of QWG Templates to improve its configuration flexibility and consistency. The configuration method is intended to be backward compatible and is [/wiki/Doc/gLite/WNCloning described] in the gLite-related documentation. Examples have been added to the examples provided with QWG templates.

=== gLite-3.2.0-3: New version of DPM/xrootd === #MC3203DPMxrootd

A new version of xrootd for DPM, 20090729.0855-2, is provided starting with gLite update 06 in release [milestone:gLite-3.2.0-3 3.2.0-3] of QWG Templates. This new release fixes all the known issues with the previous, very old, release of xrootd shipped with DPM.

Some configuration inconsistencies have been fixed in DPM/xrootd and this '''requires removing''' the following options from your DPM configuration if they were defined:
 * `/software/components/dpmlfc/options/dpm/xroot/config`
 * `/software/components/dpmlfc/options/dpm/xroot/ofsPlugin`

The default value for these options is now appropriate.

Due to a daemon name change, the upgrade may involve a manual step. See the [#KI3203DPMxrootd known problems].

=== gLite-3.2.0-2: OS_VERSION_PARAMS added ===

The standard configuration of a gLite machine now defines, as part of the OS selection, a new variable `OS_VERSION_PARAMS` to ease further processing based on OS version (major or minor) or architecture.
See [/wiki/Doc/OS/Customization OS configuration] for more details.

=== gLite-3.2.0-2: Torque directPaths don't depend on shared home directories ===

In previous QWG releases, `WN_SHARED_AREAS` entries were not configured as Torque direct paths (configured with the `$usecp` directive in the MOM client configuration) when the CE was not configured to use shared home directories. This has been fixed and now every entry in `WN_SHARED_AREAS` is configured as a Torque direct path unconditionally. Existing WN configurations may change but this should be harmless.

=== gLite-3.2.0-2: improved OS errata management ===

Release [milestone:gLite-3.2.0-2 3.2.0-2] of QWG Templates has improved support for OS errata, offering increased flexibility in large environments where you want to use a staged deployment. See the [/wiki/DOC/OS/Errata documentation] for more details.

=== gLite-3.2.0-2: disk partitionning enhancements and fixes ===

The template that allows configuration of disk partitions based on a [/wiki/Doc/OS/AII#ConfigurationofFilesystems:TheRecommendedWay layout template] has been fixed to properly handle configurations not based on LVM (extended partitions and software raid). It has also been improved to do many more validation checks on the layout description and to do the necessary renumbering of partitions, ensuring that numbers are consecutive and that partitions are created in the appropriate order (for example, partitions without an explicit size are created last). [/wiki/Doc/OS/AII#CustomizingFileSystemList Layout examples] are now provided for the 3 major configurations: LVM-based, extended partitions and software raid.

As a result of these stricter checks, your existing layout may no longer compile successfully. See the [#gLite-3.2.0-2:stricterchecksinfilesystemconfig.tplmayleadtocompilationerrors known problems].
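
The ordering and renumbering rule mentioned above can be pictured with a small shell sketch. This is not the code of the QWG template (which is written in Pan). The helper name `order_partitions`, the `name size` input format and the use of a negative size to mean "no explicit size" are all assumptions made for this example.
{{{
#!sh
# Illustrative sketch only, NOT the code of the QWG filesystem template.
# order_partitions is a hypothetical helper. It reads "name size" pairs
# (one per line) on stdin; a negative size stands for "no explicit size"
# (a convention assumed for this example). It prints "number name" pairs
# with consecutive numbers, unsized partitions last.
order_partitions() {
    awk '
        $2 >= 0 { sized[++n] = $1; next }   # explicit size: keep declaration order
        { unsized[++m] = $1 }               # no explicit size: created last
        END {
            for (i = 1; i <= n; i++) print i, sized[i]
            for (j = 1; j <= m; j++) print n + j, unsized[j]
        }'
}
}}}

For example, `printf 'root 10240\nhome -1\nswap 4096\n' | order_partitions` prints `1 root`, `2 swap` and `3 home` on three lines: `home` has no explicit size, so it is moved to the end and the numbers stay consecutive.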
=== gLite 3.2.0-2: SL4 32-bit compatibility added to WN ===

SL4 32-bit compatibility, as required by WLCG VOs (and probably many others with SL4 32-bit applications), has been added by default to the gLite 3.2 WN. This can be disabled by defining the variable `WN_WLCG_SL4_32BIT_COMPAT` to `false`.

== Change Log ==

ChangeLog built from repository commit messages...

[[ChangeLog(templates/branches/gLite-3.2,20)]]