Version 39 (modified by /C=FR/O=CNRS/OU=UMR8607/CN=Michel Jouvin/emailAddress=jouvin@…, 17 years ago)

Release Notes for gLite3 Templates

Releases

Date Release Description
24/7/2006 Creation of branch gLite-3.0.0
26/7/2006 First release of QWG templates for gLite 3.0.0
26/7/2006 Second release of QWG templates for gLite 3.0.0
29/7/2006 Third release of QWG templates for gLite 3.0.0
17/8/2006 Fourth release of QWG templates for gLite 3.0
18/8/2006 Fifth release of QWG templates for gLite 3.0.0
13/9/2006 First release of QWG templates for gLite 3.0.2
15/9/2006 Second release of QWG templates for gLite 3.0.2 (CA 1.9)
20/10/2006 Third release of QWG templates for gLite 3.0.2 (CA 1.10, gLite update 7 including critical security fixes)
7/12/2006 Fourth release of QWG templates for gLite 3.0.2 (gLite update 9, LRMS configuration)
19/12/2006 Fifth release of QWG templates for gLite 3.0.2 (gLite update 10, new VO configuration)
21/12/2006 Sixth release of QWG templates for gLite 3.0.2 (gLite update 11)
12/01/2007 Seventh release of QWG templates for gLite 3.0.2 (CA RPMs 1.11)
03/02/2007 Eighth release of QWG templates for gLite 3.0.2 (gLite 3.0 update 12)

Known Problems

gLite-3.0.2-9 requires panc >= 6.0.3

As of QWG Templates release gLite-3.0.2-9, the minimum required version of the panc compiler is 6.0.3.
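
A script can check this requirement with a generic version comparison. The sketch below assumes GNU sort (for its -V option). The helper name version_ge and the PANC_VERSION variable are invented for illustration; how you obtain the installed panc version is left to you.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
# Hypothetical helper; relies on the version sort (-V) of GNU sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED=6.0.3                      # minimum panc version stated above
INSTALLED=${PANC_VERSION:-6.0.3}    # hypothetical variable: set it to your version

if version_ge "$INSTALLED" "$REQUIRED"; then
    echo "panc $INSTALLED is recent enough"
else
    echo "panc $INSTALLED is too old, need >= $REQUIRED"
fi
```

A plain string comparison would rank 6.0.10 below 6.0.3, which is why a version-aware sort is used here.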

lfc/config.tpl compilation error

After installing QWG Templates release gLite-3.0.2-9, if you get an error compiling lfc/config.tpl, be sure to read the section on LFC site parameters. The error occurs because password defaults are no longer provided.

Upgrading an LCG RB to update 13 and later

If you want to upgrade an LCG RB from gLite 3.0 <= update 12 to gLite 3.0 >= update 13 (corresponding to QWG templates release >= gLite-3.0.2-9), be sure to read the release notes. Because of an internal change, all unfinished jobs submitted through the RB will be lost. It is therefore recommended to start draining the RB at least 2 days before doing the upgrade.

To drain an RB, the easiest way is to stop the network server with the following command:

service edg-wl-ns stop

While the RB is draining, no new jobs can be submitted and the output of completed jobs cannot be retrieved. Users can still get information about the status of their jobs.

It is a good idea to stop the Quattor client on the RB during this period, using the command:

service ncm-cdispd stop
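
The two stop commands can be wrapped in a small helper so that they can be reviewed before being run. This is only a sketch: the function name stop_services and the DRY_RUN switch are invented for illustration and are not part of the QWG templates.

```shell
#!/bin/sh
# stop_services NAME...: stop each named service in the order given.
# DRY_RUN is a hypothetical switch; with its default value (1) the
# commands are printed instead of being executed.
stop_services() {
    for svc in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "would run: service $svc stop"
        else
            service "$svc" stop || return 1
        fi
    done
}
```

Call it with the two service names used in the commands above, first as is to check the output, then with DRY_RUN=0 as root on the RB.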

Update of voms.cern.ch certificate

Release 3.0.2-6 of QWG templates provides an updated version of vo/certs/cern-alt.tpl (the certificate of voms.cern.ch), named vo/certs/cern-alt.tpl.new, as provided by gLite 3.0 update 11. It cannot be activated yet because the certificate has not yet been updated on the server.

Once the server has been updated (expected on 9/1/07), replace the current cern-alt.tpl with the new one, overwriting the existing certificate, and then deploy as usual.
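
The replacement itself is a simple file copy. The sketch below keeps a backup of the current template first; the helper name activate_new_template and the .bak suffix are arbitrary choices made for this example, and the backup is unnecessary if your templates are under version control.

```shell
#!/bin/sh
# activate_new_template FILE: replace FILE with FILE.new, keeping a backup.
# Hypothetical helper; the .bak suffix is an arbitrary choice.
activate_new_template() {
    current=$1
    new=$1.new
    [ -f "$new" ] || { echo "missing $new" >&2; return 1; }
    if [ -f "$current" ]; then
        cp -p "$current" "$current.bak" || return 1
    fi
    cp -p "$new" "$current"
}
```

From the root of your templates this would be called as activate_new_template vo/certs/cern-alt.tpl, followed by the usual deployment.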

AII : ncm-template required

Release 3.0.2-5 of QWG templates upgrades component ncm-ncd. The new version requires ncm-template.

Because this component is installed by the post-installation script during the initial Kickstart installation, the Kickstart configuration template must be updated. A working Kickstart template is available either in Quattor CVS or in the SCDB src/aii directory. This template must be installed in the directory pointed to by templatedir in /etc/aii-osinstall.conf on your Quattor server (normally /usr/lib/aii/osinstall).
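
To find out where the template has to go on a given server, the templatedir setting can be read from the configuration file. The sketch below assumes that the file uses a simple "key = value" syntax, which should be checked against your own /etc/aii-osinstall.conf; the helper name aii_templatedir is invented for illustration.

```shell
#!/bin/sh
# aii_templatedir CONF: print the templatedir value found in CONF, or the
# usual default when the key is absent.
# Assumption: CONF contains lines of the form "templatedir = /some/dir".
aii_templatedir() {
    awk '
        /^[ \t]*templatedir[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")    # drop the key and the equals sign
            sub(/[ \t]+$/, "")          # drop trailing blanks
            value = $0
        }
        END { print (value != "" ? value : "/usr/lib/aii/osinstall") }
    ' "$1"
}
```

On the Quattor server it would be called as aii_templatedir /etc/aii-osinstall.conf, and the Kickstart template copied into the directory it prints.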

Change in how to run MPI jobs

MPI integration into the middleware changed substantially in release 3.0.2-5 of QWG templates. These changes are the result of an effort to make the MPI integration more efficient, more flexible and more stable. The new design for MPI integration was agreed upon by a large community at a meeting held in Dublin in December 2006.

More information on how to use MPI in grid jobs is available at url http://grid.ie/mpi/wiki/FrontPage.

Shared working areas for MPI jobs (Torque v2)

Because EDG_WL_SCRATCH is defined unconditionally to the directory created by Torque on the worker node for the job, MPI jobs have no shared working areas even if home directories are shared. An attempt to fix this was made in 3.0.2-6, but it broke the normal behaviour for non-MPI jobs in shared home directory configurations (which is to have the working area in a local directory on the WN). The change was therefore reverted in 3.0.2-7.

This problem should be fixed in 3.0.2-8. As a temporary workaround, you can keep common/torque2/client/config from 3.0.2-6 if it worked for you.

quattor/config not found

After upgrading to QWG templates release gLite-3.0.2-5, if the PAN compiler complains that it cannot find quattor/config, you need to add standard before standard/**/* in your cluster's cluster.build.properties.

LCMAPS error after upgrading from LCG 2.7.0

This is caused by VOMS related libraries having been moved from /opt/edg to /opt/glite.

ncm-ldconf, run as part of the upgrade, updates the shared library cache (/etc/ld.so.cache) only if the contents of /etc/ld.so.conf have changed. Unfortunately this is not the case between LCG 2.7 and gLite 3.0: some libraries have simply been moved from one path to another.

To fix this problem, log in to the machine and run:

ldconfig

No service restart is needed.
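
To confirm that the cache now references the new location, the output of ldconfig -p (which prints the cache contents) can be searched for the new prefix. This is a minimal sketch; the helper name cache_lists_prefix is invented for illustration.

```shell
#!/bin/sh
# cache_lists_prefix PREFIX: read the output of "ldconfig -p" on stdin and
# succeed if at least one cached library lives under PREFIX.
# Hypothetical helper name.
cache_lists_prefix() {
    grep -qF -- "=> $1/"
}
```

Usage: ldconfig -p | cache_lists_prefix /opt/glite && echo "cache references /opt/glite". The check says nothing about which libraries are listed, only that some library under that prefix is in the cache.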

DPM upgrade from LCG 2.6/2.7

gLite 3.0 DPM (1.5) includes integration with VOMS and requires a database schema upgrade. This must be done manually on the DPM master node. The following steps are needed :

  • Create a script to call the upgrade procedure (replace the values with those for your site):
    #!/bin/sh
    
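    # Stub for the "requires" helper, which the sourced YAIM function is
    # expected to call; here it only prints a message.
    requires () {
        echo "requires : nothing done"
    }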
    
    # Edit to match your site
    export MY_DOMAIN='your.dom.ain'
    export DPM_HOST='dpm.your.dom.ain'
    export DPM_DB_USER='AdminDbUser'      # Generally root
    export DPM_DB_PASSWORD='AdminDBPwd'
    
    . /opt/glite/yaim/functions/config_DPM_upgrade
    
    config_DPM_upgrade
    
  • Check that AdminDbUser/AdminDBPwd has full privileges on your database server
  • Run the script
  • Restart all DPM daemons. The easiest way is to delete /etc/shift.conf and run the following command:
    ncm-ncd --configure dpmlfc
    
  • Run the command in /etc/cron.d/lcgdm-mapfile-update.ncm-cron.cron
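
For the last step, the command has to be extracted from the cron file. Files in /etc/cron.d use the system crontab layout (five time fields, a user name, then the command), so a small filter can print just the command part. The helper name cron_command is invented for illustration, and the sketch does not handle @reboot-style shortcuts or % characters.

```shell
#!/bin/sh
# cron_command FILE: print the command part of each job line of a cron.d file.
# Hypothetical helper. Blank lines, comments and VAR=value lines are skipped;
# the five time fields and the user field are then removed.
cron_command() {
    awk '
        /^[ \t]*#/ { next }
        /^[ \t]*$/ { next }
        /^[ \t]*[A-Za-z_][A-Za-z0-9_]*[ \t]*=/ { next }
        {
            for (i = 1; i <= 6; i++)
                sub(/^[ \t]*[^ \t]+[ \t]+/, "")
            print
        }
    ' "$1"
}
```

Running cron_command /etc/cron.d/lcgdm-mapfile-update.ncm-cron.cron prints the command so that it can be reviewed and then run by hand as the appropriate user.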

Condor RPM name not matching internal name

RB, VOBOX and WMS normally require the Condor RPM condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm. Unfortunately the internal name of this RPM is condor-6.7.10-1.i386. This doesn't work with SPMA, which uses the internal name to know whether an RPM is already installed.

To work around this problem, the templates load condor-6.7.10-1.i386.rpm, which also exists (but seems to be different) in the gLite 3.0 distribution (external packages). The following steps are required to load the right RPM:

  • In RPM repository for gLite external packages, rename condor-6.7.10-1.i386.rpm to something else.
  • Create a symlink called condor-6.7.10-1.i386.rpm to condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm.
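
The two steps above can be sketched as follows. The sketch assumes that both RPM files sit in the same repository directory; the helper name relink_condor and the .orig suffix are arbitrary choices made for this example.

```shell
#!/bin/sh
# relink_condor REPO_DIR: apply the rename and symlink steps inside REPO_DIR.
# Hypothetical helper; assumes both RPM files are in REPO_DIR.
relink_condor() (
    short=condor-6.7.10-1.i386.rpm
    long=condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm
    cd "$1" || exit 1
    [ -f "$long" ] || { echo "missing $long in $1" >&2; exit 1; }
    # Step 1: move the conflicting package out of the way
    # (skipped when the short name is already a symlink).
    if [ -f "$short" ] && [ ! -L "$short" ]; then
        mv "$short" "$short.orig" || exit 1
    fi
    # Step 2: make the short name point to the package that is needed.
    ln -sf "$long" "$short"
)
```

The function body runs in a subshell, so the caller's working directory is not changed. The internal name of a package file can be displayed with rpm -qp --queryformat '%{NAME}-%{VERSION}-%{RELEASE}.%{ARCH}\n' followed by the file name.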

This problem has been logged to GGUS, ticket 10567.

LCG RB upgrade

gLite 3.0 includes a new version of Condor that is no longer installed in /opt/condor but in /opt/condor-version. Also, the default name of the Condor configuration file is now condor_config instead of condor.conf.

Condor relies on the CONDORG_INSTALL_PATH and CONDOR_CONFIG environment variables to know where it is installed and where its configuration file is. Unfortunately, the script starting Condor (/etc/init.d/edg-wl-jc) relies on /opt/edg/etc/profile.d/edg-wl-config.sh to get these variables defined in the context of the script (from /etc/sysconfig/globus, the actual place where they are defined). But this script doesn't export these variables when they are defined in /etc/sysconfig/globus. As a consequence, the Condor master doesn't see them. This has been logged in GGUS as ticket 10628.
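
The effect of the missing export can be reproduced in isolation. The demonstration below is self-contained and does not touch the real files: it uses a temporary file as a stand-in for /etc/sysconfig/globus, and the path it contains is purely illustrative.

```shell
#!/bin/sh
# Self-contained demonstration; the temporary file stands in for
# /etc/sysconfig/globus and the value it contains is only illustrative.
demo_export() (
    conf=$(mktemp) || exit 1
    echo 'CONDOR_CONFIG=/opt/condor-example/etc/condor_config' > "$conf"

    unset CONDOR_CONFIG
    . "$conf"          # defines the variable in this shell only

    # A child process does not inherit a variable that was never exported.
    echo "without export: $(sh -c 'echo "${CONDOR_CONFIG:-undefined}"')"

    export CONDOR_CONFIG
    # Once exported, the same kind of child process sees the value.
    echo "with export: $(sh -c 'echo "${CONDOR_CONFIG:-undefined}"')"

    rm -f "$conf"
)

demo_export
```

The first line reports the variable as undefined in the child process, which is the situation the Condor master is in; the second line shows the value once the variable has been exported.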

Until the problem is fixed, you need a patched version of edg-wl-config.sh. It is provided by QWG templates as part of the LCG RB configuration. But there is no way to ensure that a later reinstallation of the RPM will not overwrite this patched version.

If CondorG refuses to start, complaining that CONDOR_CONFIG is not defined, you should use the following command to reinstall the patched version :

ncm-ncd --configure filecopy

fetch-crl

gLite templates require the most recent fetch-crl version, released by EUGRID PMA in spring 2006 (2.6.0-1). Before gLite-3.0.0-3, the RPM list provided only version 2.0-1, which does not work properly with the configuration set up by the templates. As a result, the CRLs quickly expire and nothing works anymore.

Starting with gLite-3.0.0-3, the RPM list requires the right version. But this version is not yet part of the gLite distribution, so you need to get it directly from the EUGRID PMA site and put it in the RPM repository for gLite 3.0 updates.
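
To see whether CRLs are about to expire on a node, the next update date of each CRL can be printed with openssl. This is only a sketch: it assumes PEM-encoded CRL files named *.r0, the helper name crl_next_updates is invented for illustration, and the directory to pass is the one where your site stores CA certificates and CRLs.

```shell
#!/bin/sh
# crl_next_updates DIR: print the nextUpdate date of each CRL found in DIR.
# Hypothetical helper; assumes PEM-encoded CRL files named *.r0.
crl_next_updates() {
    for crl in "$1"/*.r0; do
        [ -f "$crl" ] || continue    # no CRL in DIR: nothing to print
        printf '%s: ' "$crl"
        openssl crl -in "$crl" -noout -nextupdate
    done
}
```

A date in the past in this output means the corresponding CRL has expired and fetch-crl is not refreshing it.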

GlueHostApplicationSoftwareRunTimeEnvironment

This Glue attribute should normally contain a list of tags describing the software / middleware environment available on the CE. This list needs to be updated with each new release of the middleware. Previously it was the responsibility of the site to update the variable CE_RUNTIMEENV. There is now a more flexible method, described in the gLite3 customization page.

Change Log

ChangeLog built from repository commit messages...

Changelog not available