Version 80 (modified by /C=FR/O=CNRS/OU=UMR8607/CN=Michel Jouvin/emailAddress=jouvin@…, 17 years ago)

--

Release Notes for gLite3 Templates

Table of Contents

  1. QWG Releases
  2. gLite Updates
  3. Main Changes and Known Problems
    1. gLite-3.0.2-12 : DPM/LFC 1.6.4 issues with non VOMS proxies
    2. gLite-3.0.2-12 : DPM/LFC 1.6.4 configuration changes
    3. gLite-3.0.2-12 : default update set to 25
    4. gLite-3.0.2-12 : new BDII configuration options
    5. gLite-3.0.2-12 : VOMS groups/roles mapping changes
    6. gLite-3.0.2-11 : upgrade to panc v7 recommended
    7. gLite-3.0.2-11 : panc v6 restrictions
    8. gLite-3.0.2-11 : gLite update 24/25 doesn't work
    9. gLite-3.0.2-11 : new names of gLite repository templates
    10. gLite-3.0.2-11 : SE_HOSTS format change
    11. gLite-3.0.2-11 : migration to namespaced components and standard templates
    12. gLite-3.0.2-10 : SE_HOST_DEFAULT_SC3 required with panc v6
    13. gLite-3.0.2-10 : DPM and LFC 1.6.3 upgrade
    14. gLite-3.0.2-10 : Torque/MAUI restart required on CE and WNs
    15. gLite-3.0.2-10 : SE removed from NFS exports
    16. gLite-3.0.2-10 : SE_HOST_DEFAULT deprecated
    17. gLite-3.0.2-10 : SEDPM_DISK_HOSTS no longer used
    18. gLite-3.0.2-10 : DPM : support added for SRM v2.2
    19. gLite-3.0.2-10 : pro_software_component_dpmlfc included in standard …
    20. gLite-3.0.2-9 requires panc >= 6.0.3
    21. lfc/config.tpl compilation error
    22. Upgrading a LCG RB to update 13 and later
    23. Update of voms.cern.ch certificate
    24. AII : aii-shellfe error about bootloader
    25. AII : ncm-template required
    26. Change in how to run MPI jobs
    27. Shared working areas for MPI jobs (Torque v2)
    28. quattor/config not found
    29. LCMAPS error after upgrading from LCG 2.7.0
    30. DPM upgrade from LCG 2.6/2.7
    31. Condor RPM name not matching internal name
    32. LCG RB upgrade
    33. fetch-crl
    34. GlueHostApplicationSoftwareRunTimeEnvironment
  4. Change Log

Note : Information on this page, particularly in the Known Problems section, may refer to a release that has not yet been announced. Such information relates to an upcoming release and documents changes already in the gLite-3.0.0 branch.

QWG Releases

Note : you can follow ongoing development and the progress of the upcoming release through the Roadmap button, the latest entries in the gLite-3.0.0 branch ChangeLog, or the full log of the trunk branch.

Date Release Description
24/7/2006 Creation of branch gLite-3.0.0
26/7/2006 First release of QWG templates for gLite 3.0.0
26/7/2006 Second release of QWG templates for gLite 3.0.0
29/7/2006 Third release of QWG templates for gLite 3.0.0
17/8/2006 Fourth release of QWG templates for gLite 3.0
18/8/2006 Fifth release of QWG templates for gLite 3.0.0
13/9/2006 First release of QWG templates for gLite 3.0.2
15/9/2006 Second release of QWG templates for gLite 3.0.2 (CA 1.9)
20/10/2006 Third release of QWG templates for gLite 3.0.2 (CA 1.10, gLite update 7 including critical security fixes)
7/12/2006 Fourth release of QWG templates for gLite 3.0.2 (gLite update 9, LRMS configuration)
19/12/2006 Fifth release of QWG templates for gLite 3.0.2 (gLite update 10, new VO configuration)
21/12/2006 Sixth release of QWG templates for gLite 3.0.2 (gLite update 11)
12/01/2007 Seventh release of QWG templates for gLite 3.0.2 (CA RPMs 1.11)
03/02/2007 Eighth release of QWG templates for gLite 3.0.2 (gLite 3.0 update 12)
16/02/2007 Ninth release of QWG templates for gLite 3.0.2 (gLite 3.0 update 13, CA RPMs 1.12)
25/03/2007 Tenth release of QWG templates for gLite 3.0.2 (gLite 3.0 update 14-18, CA RPMs 1.13)
21/05/2007 Eleventh release of QWG templates for gLite 3.0.2 (gLite 3.0 update 19-24, dCache support)

gLite Updates

QWG templates releases deliver the latest gLite updates available at the time of the release. There is no correspondence between the QWG release number (-n) and gLite update numbers. Sometimes one QWG templates release delivers several gLite updates. Each QWG release has a default associated gLite update (generally the latest one).

Starting with QWG release 3.0.2-10, QWG releases provide a standard mechanism for selecting the gLite update you want to deploy on a per node, per cluster or per site basis.

For example, QWG release 3.0.2-10 delivers update 18 as the default update. If you want to stay with update 15 on your DPM server, you may define the following variable in the DPM server profile :

variable GLITE_UPDATE_VERSION = '15';

Content of gLite updates and associated release notes can be viewed at http://glite.web.cern.ch/glite/packages/R3.0/updates.asp.

Main Changes and Known Problems

gLite-3.0.2-12 : DPM/LFC 1.6.4 issues with non VOMS proxies

There is a potential issue with DPM/LFC 1.6.4 if a user has a proxy without VOMS extensions. Look at http://glite.web.cern.ch/glite/packages/R3.0/updates.asp (section about DPM/LFC 1.6.4) for more information on the workaround, if you need it.

gLite-3.0.2-12 : DPM/LFC 1.6.4 configuration changes

If you install update 24 or later, you'll get DPM/LFC 1.6.4 installed. This new version requires a database schema upgrade. To complete this upgrade, you need to :

  1. Install the new version ; currently running daemons will be unaffected.
  2. On DPM head node, stop your DPM daemons.
  3. Back up your current database with mysqldump.
  4. Run the script to upgrade database schema, /opt/lcg/share/DPM/dpm-secondary-groups
  5. Restart your DPM daemons
  6. Restart daemons on DPM disk servers

This upgrade implies a DPM downtime of 10-15 minutes. It is recommended to declare a downtime of your CE during this period.

DPM 1.6.4 uses a resource BDII instead of Globus MDS to publish information into the BDII. After the upgrade, you need to change BDII_URL in your pro_lcg2_config_site.tpl for the upgraded SE. The changes required are :

  • Port number is BDII port (2170) instead of MDS port (2135)
  • DN base is mds-vo-name=resource,o=grid instead of mds-vo-name=local,o=grid.
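
As an illustration of the two changes above, the following shell sketch prints the endpoint of an SE before and after the upgrade. It assumes the entry uses the usual ldap://host:port/base form; the function name se_bdii_url and the host se.example.org are hypothetical and not part of the templates.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with QWG templates).
# Usage: se_bdii_url HOST [resource|mds]
#   resource : resource BDII, port 2170, base mds-vo-name=resource,o=grid
#   mds      : Globus MDS,    port 2135, base mds-vo-name=local,o=grid
se_bdii_url() {
    se_host=$1
    endpoint=${2:-resource}
    case "$endpoint" in
        resource) printf 'ldap://%s:2170/mds-vo-name=resource,o=grid\n' "$se_host" ;;
        mds)      printf 'ldap://%s:2135/mds-vo-name=local,o=grid\n' "$se_host" ;;
        *)        echo "unknown endpoint type: $endpoint" >&2; return 1 ;;
    esac
}
```

Calling se_bdii_url se.example.org mds shows the form used before DPM 1.6.4, and se_bdii_url se.example.org the form to use afterwards.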

gLite-3.0.2-12 : default update set to 25

Default gLite update has been set to 25. If upgrading from a previous version of QWG templates, be aware that this implies a DPM upgrade that requires a schema change. Be sure to stop your DPM server before updating QWG templates, or define GLITE_UPDATE_VERSION in the DPM node profiles to the currently running version to prevent a DPM upgrade during the QWG templates update.

gLite-3.0.2-12 : new BDII configuration options

BDII configuration has been enhanced to add flexibility (in particular support for Freedom of Choice) and support new resource BDII (replacement for Globus MDS). New configuration is backward compatible. To take advantage of new options, look at BDII configuration documentation.

gLite-3.0.2-12 : VOMS groups/roles mapping changes

Previously, mapping of VOMS roles was described in variable voms_roles of VO parameters. The role to map was given as a string in name key. To allow a more flexible mapping, this key can now be a list and each value can be a simple value interpreted as a role name or a group/role specification using '/GROUP=.../ROLE=...'. This fixes a problem with LHCb Software Manager. In addition, a few variable names have been changed : the old variable name is still used if the new one is not present. Look at documentation about gLite templates customization for more details about changes in variable names.

In addition, previous behaviour of mapping users to their role in grid-mapfile has been changed. By default, a user is always mapped as a normal user in grid-mapfile (grid-mapfile is used only if the user has no VOMS extensions in his proxy). Thus, to be mapped to an account corresponding to a specific role (e.g. SW manager), the user has to get a proxy using voms-proxy-init --voms. To revert to the previous behaviour, you need to define variable VO_GRIDMAPFILE_MAP_VOMS_ROLES to true in your machine profile or a site specific template.

gLite-3.0.2-11 : upgrade to panc v7 recommended

gLite-3.0.2-11 is the last version to support panc v6. See related note about restrictions.

New versions of the QWG templates will begin to take advantage of new features introduced in panc v7. Even if it is not required to use QWG templates release gLite-3.0.2-11, you are advised to upgrade to panc v7 after upgrading the QWG templates, in order to prepare for future releases.

If you are using SCDB, just update to the latest version of SCDB Tools : panc v7 is the default compiler since SCDB Tools v2. Follow the instructions about upgrading SCDB.

If you are not using SCDB, follow the instructions about installing PAN Compiler.

gLite-3.0.2-11 : panc v6 restrictions

There are a few restrictions if you want to use panc v6 with QWG templates gLite-3.0.2-11 :

  • SE_HOST_DEFAULT_SC3 variable must be defined in pro_lcg2_config_site.tpl, even if it is deprecated and no longer used by anybody.
  • At least one of the VOs you are supporting must have a SW area defined.
  • SE_HOSTS must be defined even if there is no SE in your configuration. In this case, define as an empty nlist :
    variable SE_HOSTS = nlist();
    

gLite-3.0.2-11 : gLite update 24/25 doesn't work

gLite update 24 is available in QWG release gLite-3.0.2-11 but is not the default update. It was discovered after the release that its support in this release is broken (dependency issues for WN, misconfiguration for DPM). Should you need to install this update before the next release, be sure to use at least r1828 of gLite-3.0.0. Also, be aware that the new DPM version requires a database schema change and some changes to site BDII configuration.

Note : The high priority part of update 24, the new certificate of lcg-voms.cern.ch, is part of gLite-3.0.2-11 independently of the actual update installed. You are advised to install gLite-3.0.2-11 before expiration of lcg-voms.cern.ch certificate, May 29th.

gLite-3.0.2-11 : new names of gLite repository templates

repository/glite.tpl shipped with gLite-3.0.2-11 uses namespaces to access RPM repository templates. The repository templates have been renamed without the repository_lal_ prefix.

If you want to ignore this change, you can just revert repository/glite.tpl to the version supplied with the previous version of QWG templates. This is recommended during the upgrade to 3.0.2-11. In this case, you probably need to edit your previous template and replace the line include pro_declaration_functions_general; by :

include pan/functions;

To update your configuration to use namespaced templates for repositories, you first need to upgrade SCDB Tools to 2.1.2 or later. Then execute the following steps :

  • Rename your repository templates (probably in your site template hierarchy).
  • Remove everything after the initial comments and execute ant update.rep.templates.
  • Check and if necessary update cluster.build.properties for each of your clusters : be sure to have the namespace form of your site directory in the include path. Look at cluster example.

If you are using SWrep to manage repositories and repository templates, upgrade to a version supporting repository namespaces (part of Quattor 1.3).

For more information about repository templates used by the default repository/glite.tpl, look at gLite templates customization.

Note : if, after updating your repository templates to use repository namespace, you get an error in SPMA functions, look at SCDB release notes.

gLite-3.0.2-11 : SE_HOSTS format change

SE_HOSTS variable format has changed. Previously, it used to be a list of SE host names with several "companion" variables (SE_TYPES, SE_ARCH, SE_ACCESS, STORAGE_DIRS). This was quite hard to maintain in sync.

SE_HOSTS is now a nlist with one entry per SE. The key is the SE host name, the value is a nlist describing SE parameters. Look at gLite templates customization for more details.

Old format is still accepted but you are advised to update your site configuration and change SE_HOSTS to conform to new format. All the previous SE_xxx variables can be removed. As part of this change, you may have to update how BDII_URLS is built in your pro_lcg2_config_site.tpl if you use the suggested loop over SE_HOSTS. Previously the SE host name was the value (third parameter from first()/next() functions), now this is the key (second parameter). Look at example for more information.

gLite-3.0.2-11 : migration to namespaced components and standard templates

QWG templates release gLite-3.0.2-11 introduces migration of PAN/Quattor standard templates and component templates to namespaced version. Namespaces are a PAN feature allowing a better organization of templates and improving ability to easily locate where a template sits.

This migration implies changes in the names of templates. This can lead to some backward compatibility problems. In order to minimize the impact on site specific templates, templates with the previous names are still maintained as wrappers to the new templates. To keep the distribution as clean as possible, these templates are not part of the release but can be downloaded from QWG repository trunk.

If, after installing this release, you cannot compile because some templates are missing, you are advised to fix your templates to use the new names. If this is not possible immediately, you can download the compatibility templates and install them in your site or cluster directory.

For components, the rule to convert from old name to new name is the following :

  • pro_software_component_xxx becomes components/xxx/config
  • pro_declaration_component_xxx becomes components/xxx/schema
  • pro_declaration_functions_xxx becomes components/xxx/functions

Look at the compatibility templates to find the exact new name.
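
The renaming rule for components can be sketched as a small shell function, for example to scan site templates for names that need updating. This is only an illustration: the function name qwg_new_component_name is hypothetical, and the rule covers component templates only (standard templates follow other names, e.g. pro_declaration_functions_general becomes pan/functions), so the compatibility templates remain the reference.

```shell
#!/bin/sh
# Hypothetical helper: map an old component template name to its
# namespaced name, following the three rules listed above.
# Names matching none of the rules are printed unchanged.
qwg_new_component_name() {
    case "$1" in
        pro_software_component_*)
            printf 'components/%s/config\n' "${1#pro_software_component_}" ;;
        pro_declaration_component_*)
            printf 'components/%s/schema\n' "${1#pro_declaration_component_}" ;;
        pro_declaration_functions_*)
            printf 'components/%s/functions\n' "${1#pro_declaration_functions_}" ;;
        *)
            printf '%s\n' "$1" ;;
    esac
}
```

For instance, qwg_new_component_name pro_software_component_dpmlfc prints components/dpmlfc/config.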

As a consequence of the migration of these templates to namespace, you can probably clean up cluster.build.properties in your clusters. For gLite template hierarchy the only required elements in include path are :

grid/glite-3.0.0 grid/glite-3.0.0/components

Look at cluster example for more information.

gLite-3.0.2-10 : SE_HOST_DEFAULT_SC3 required with panc v6

In QWG templates release gLite-3.0.2-10, SE_HOST_DEFAULT_SC3 has been made optional. Unfortunately, this causes a problem if you are using PAN Compiler v6. There are 2 possible workarounds :

  • Define this variable to your SE. This will have no impact as it is no longer used by any part of the middleware.
  • Upgrade to PAN Compiler v7. If you are using SCDB, upgrade SCDB Tools to the latest version.

gLite-3.0.2-10 : DPM and LFC 1.6.3 upgrade

QWG templates release gLite-3.0.2-10 provides the DPM and LFC version released as part of gLite update 16 (1.6.3). This version requires a schema upgrade for DPM and LFC databases. It is necessary to shut down the services and run the YAIM script to achieve this (/opt/glite/yaim/functions/config_DPM_upgrade for DPM or /opt/glite/yaim/functions/config_lfc_upgrade for LFC) or follow the instructions at https://twiki.cern.ch/twiki/bin/view/LCG/DpmSrmv2Support. This requires careful planning : to avoid causing job failures during the upgrade, the CE must be closed and a scheduled downtime must be defined in GOC DB.

Note : be aware that doing an unplanned upgrade of DPM can result in database corruption.

To allow more flexibility it is possible to deploy the QWG release on all nodes except DPM and LFC nodes by defining GLITE_UPDATE_VERSION variable in the profile of these nodes.

gLite-3.0.2-10 : Torque/MAUI restart required on CE and WNs

QWG release gLite-3.0.2-10 delivers a new version of Torque/MAUI. This version is a fixed version of what was released in gLite update 16 and should be released in an upcoming gLite update.

After installing QWG release gLite-3.0.2-10 with gLite update 16 or later (see above for information on gLite update selection), you need to restart Torque/MAUI on CE and WNs. This involves :

  • Logging in to the CE, stopping services pbs_server and maui (maui must generally be stopped with kill -TERM), then starting services pbs_server and maui.
  • Defining LRMS_CLIENT_RESTART to force a Torque client restart on each WN.
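
The CE part of this procedure can be sketched as follows. This is a hedged example, not a script shipped with the templates: the function names and the DRY_RUN switch are hypothetical, and pkill is used here as one possible way to send TERM to the maui process. The WN restart is still driven by LRMS_CLIENT_RESTART in the templates.

```shell
#!/bin/sh
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Restart Torque server and MAUI on the CE (to be run as root).
restart_torque_maui() {
    run service pbs_server stop
    # maui generally has to be stopped with a TERM signal;
    # pkill -x matches the exact process name (assumed to be "maui").
    run pkill -TERM -x maui
    run service pbs_server start
    run service maui start
}
```

Running DRY_RUN=1 restart_torque_maui first lists the commands without touching the services.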

gLite-3.0.2-10 : SE removed from NFS exports

Until QWG release gLite-3.0.2-10, default export list for NFS served file systems contained an entry for the default SE. This has been changed. The only node added to the export list by default is the CE. All others must be added using variable SITE_WN_HOSTS whose value is typically a regexp matching name of nodes requiring access to the NFS file system. See site parameters example.

gLite-3.0.2-10 : SE_HOST_DEFAULT deprecated

To allow greater flexibility in definition of close and default SE, SE_HOST_DEFAULT variable has been replaced by 2 variables supporting per VO definitions. Look at section on SE configuration for more details.

For backward compatibility, if SE_HOST_DEFAULT variable is present and new variables are not defined, its value is used for both close and default SE of all VOs.

gLite-3.0.2-10 : SEDPM_DISK_HOSTS no longer used

SEDPM_DISK_HOSTS was used to configure GridIce on DPM disk servers. GridIce configuration is now based on DPM configuration. This variable is no longer used and can be safely removed.

gLite-3.0.2-10 : DPM : support added for SRM v2.2

QWG Templates and ncm-dpmlfc have been updated to allow management of the DPM SRM v2.2 service. To enable it, you need to edit your DPM site configuration template pointed to by variable SEDPM_CONFIG_SITE and add an entry for the SRM v2.2 service, similar to the entry for SRM v2. Look at the example of a DPM site configuration template.

gLite-3.0.2-10 : pro_software_component_dpmlfc included in standard DPM configuration

pro_software_component_dpmlfc is now included as part of the standard DPM configuration, before including the template defining DPM site configuration. Thus, it is no longer necessary to include it in the template defining the local DPM configuration.

It is recommended that you edit your template defining DPM site configuration to remove the include of pro_software_component_dpmlfc, as the name of this template will change in a future release as a consequence of the conversion to namespaces.

gLite-3.0.2-9 requires panc >= 6.0.3

As of QWG Templates release gLite-3.0.2-9, minimum required version of panc compiler is 6.0.3.

lfc/config.tpl compilation error

After installing QWG Templates release gLite-3.0.2-9, if you get an error compiling lfc/config.tpl, be sure to read the section on LFC site parameters. This happens because password defaults are no longer provided.

Upgrading a LCG RB to update 13 and later

If you want to upgrade a LCG RB from gLite 3.0 <= update 12 to gLite 3.0 >= update 13 (corresponding to QWG templates release >= gLite-3.0.2-9), be sure to read the release notes. Because of an internal change, all unfinished jobs submitted through the RB will be lost. Thus it is recommended to drain the RB at least 2 days before doing the upgrade.

To drain a RB, the easiest is to stop the network server with the following command :

service edg-wl-ns stop

When the RB is draining, no new job can be submitted and output of completed jobs cannot be retrieved. But users can get information about the status of their jobs.

It is a good idea to stop the Quattor client on the RB during this period using command :

service ncm-cdispd stop

Update of voms.cern.ch certificate

Release 3.0.2-6 of QWG templates provides an updated version of vo/certs/cern-alt.tpl (certificate of voms.cern.ch) named vo/certs/cern-alt.tpl.new, as provided by gLite 3.0 update 11. It cannot be activated right now as the certificate has not yet been updated on the server.

Once the server has been updated (should happen 9/1/07), you'll have to replace the current cern-alt.tpl with this new one by overwriting the existing certificate and then deploy as usual.

AII : aii-shellfe error about bootloader

Release 3.0.2-9 of QWG templates introduces the support for explicit specification of the disk to use to install the boot loader. This is required for systems with a very large number of disks.

Because of this change, it is necessary to update the Kickstart template you use. You can find an up-to-date working Kickstart template in the QWG repository. This template must be installed in the directory pointed to by templatedir in /etc/aii-osinstall.conf on your Quattor server (normally /usr/lib/aii/osinstall).

AII : ncm-template required

Release 3.0.2-5 of QWG templates upgrades component ncm-ncd. The new version requires ncm-template.

As this component is installed as part of the Kickstart initial installation during the post installation script, it is necessary to update the Kickstart configuration template. You can find a working Kickstart template in the QWG repository. This template must be installed in the directory pointed to by templatedir in /etc/aii-osinstall.conf on your Quattor server (normally /usr/lib/aii/osinstall).

Change in how to run MPI jobs

MPI integration into middleware changed substantially in release 3.0.2-5 of QWG templates. These changes are the result of an effort to make the MPI integration more efficient, more flexible and... more stable. The new design for MPI integration has been agreed upon by a large community in a meeting held in Dublin in December 2006.

More information on how to use MPI in grid jobs is available at url http://grid.ie/mpi/wiki/FrontPage.

Shared working areas for MPI jobs (Torque v2)

Because EDG_WL_SCRATCH is defined unconditionally to the directory created by Torque on the worker node for the job, MPI jobs have no shared working areas even if home directories are shared. An attempt to fix this was made in 3.0.2-6 but broke the normal behaviour for non MPI jobs in shared home directories configurations (which is to have the working area on the WN local directory). Thus the change was reverted in 3.0.2-7.

This problem should be fixed in 3.0.2-8. As a temporary workaround, you can keep common/torque2/client/config from 3.0.2-6 if it worked for you.

quattor/config not found

After upgrading to QWG template release gLite-3.0.2-5, if the PAN compiler complains it cannot find quattor/config, you need to add standard before standard/**/* in the cluster.build.properties of your clusters.

LCMAPS error after upgrading from LCG 2.7.0

This is caused by VOMS related libraries having been moved from /opt/edg to /opt/glite.

ncm-ldconf, run as part of the upgrade, updates the shared libraries cache (/etc/ld.so.cache) only if the contents of /etc/ld.so.conf have changed. Unfortunately this is not the case between LCG 2.7 and gLite 3.0. It just happens that some libraries have been moved from one path to another...

To fix this problem, log on the machine and run :

ldconfig

No service restart is needed.

DPM upgrade from LCG 2.6/2.7

gLite 3.0 DPM (1.5) includes integration with VOMS and requires a database schema upgrade. This must be done manually on the DPM master node. The following steps are needed :

  • Create a script to call the upgrade procedure (replace the values with those for your site) :
    #!/bin/sh
    
    requires () {
    echo "requires : nothing done"
    }
    
    # Edit to match your site
    export MY_DOMAIN='your.dom.ain'
    export DPM_HOST='dpm.your.dom.ain'
    export DPM_DB_USER='AdminDbUser'      # Generally root
    export DPM_DB_PASSWORD='AdminDBPwd'
    
    . /opt/glite/yaim/functions/config_DPM_upgrade
    
    config_DPM_upgrade
    
  • Check that AdminDbUser/AdminDBPwd has full privileges on your database server
  • Run the script
  • Restart all DPM daemons. The easiest is to delete /etc/shift.conf and run the following command :
    ncm-ncd --configure dpmlfc
    
  • Run the command in /etc/cron.d/lcgdm-mapfile-update.ncm-cron.cron

Condor RPM name not matching internal name

RB, VOBOX, WMS normally require Condor RPM condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm. Unfortunately the internal name of this RPM is condor-6.7.10-1.i386. This doesn't work with SPMA, which uses the internal name to know whether an RPM is already installed.

To work around this problem, templates load condor-6.7.10-1.i386.rpm, which also exists (but seems different) in the gLite 3.0 distribution (external packages). The following steps are required to load the right RPM :

  • In RPM repository for gLite external packages, rename condor-6.7.10-1.i386.rpm to something else.
  • Create a symlink called condor-6.7.10-1.i386.rpm to condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm.
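
The two steps above could be scripted along these lines. This is a minimal sketch under stated assumptions: the function name fix_condor_rpm and the .orig suffix are arbitrary choices, and the directory passed as argument is whatever holds the gLite external packages on your repository server.

```shell
#!/bin/sh
# Hypothetical helper. Usage: fix_condor_rpm REPO_DIR
# REPO_DIR is the RPM repository directory for gLite external packages.
fix_condor_rpm() {
    (
        cd "$1" || exit 1
        short=condor-6.7.10-1.i386.rpm
        long=condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm
        [ -f "$long" ] || { echo "missing $long in $1" >&2; exit 1; }
        # Step 1: move the unwanted RPM out of the way, unless the short
        # name is already a symlink created by a previous run.
        if [ -f "$short" ] && [ ! -L "$short" ]; then
            mv "$short" "$short.orig"
        fi
        # Step 2: make the short name point at the RPM the services need.
        ln -sf "$long" "$short"
    )
}
```

The function runs in a subshell so the caller's working directory is left unchanged, and it can be run again safely.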

This problem has been logged to GGUS, ticket 10567.

LCG RB upgrade

gLite 3.0 includes a new version of Condor that is no longer installed in /opt/condor but /opt/condor-version. Also default name for Condor configuration file is now condor_config instead of condor.conf.

Condor relies on CONDORG_INSTALL_PATH and CONDOR_CONFIG environment variables to know where it is installed and where the configuration file is. Unfortunately, the script starting Condor (/etc/init.d/edg-wl-jc) relies on /opt/edg/etc/profile.d/edg-wl-config.sh to get these variables defined in the context of the script (from /etc/sysconfig/globus, the actual place where they are defined). But this script doesn't take care of exporting these variables when they are defined in /etc/sysconfig/globus. As a consequence, Condor master doesn't see them. This has been logged into GGUS as ticket 10628.

In the meantime, before the problem is fixed, you need a patched version of edg-wl-config.sh. It is provided as part of LCG RB configuration, by QWG templates. But there is no way to ensure that a further reinstallation of the RPM will not overwrite this patched version.

If CondorG refuses to start, complaining that CONDOR_CONFIG is not defined, you should use the following command to reinstall the patched version :

ncm-ncd --configure filecopy

fetch-crl

gLite templates require the most recent fetch-crl version released by EUGRID PMA (2.6.0-1, spring 2007). Before gLite-3.0.0-3, the RPMs list provided only version 2.0-1, which does not work properly with the configuration set up by the templates. As a result, you quickly reach expiration of the CRLs and nothing works anymore...

Starting with gLite-3.0.0-3, the RPMs list requires the right version. But this version is not yet part of the gLite distribution, so you need to get it directly from the EUGRID PMA site and put it in the RPM repository for gLite 3.0 updates.

GlueHostApplicationSoftwareRunTimeEnvironment

This Glue attribute should normally contain a list of tags describing the software / middleware environment available on the CE. This list needs to be updated with each new release of the middleware. Previously it was the responsibility of the site to update variable CE_RUNTIMEENV. There is now a more flexible method described in the gLite3 customization page.

Change Log

ChangeLog built from repository commit messages...

Changelog not available