= Release Notes for gLite3 Templates =

[[TracNav]]

[[TOC(inline)]]

== QWG Releases ==

|| Date || Release || Description ||
|| 24/7/2006 || || Creation of branch gLite-3.0.0 [[BR]] ||
|| 26/7/2006 || || [milestone:gLite-3.0.0-1 First release] of QWG templates for gLite 3.0.0 ||
|| 26/7/2006 || || [milestone:gLite-3.0.0-2 Second release] of QWG templates for gLite 3.0.0 ||
|| 29/7/2006 || || [milestone:gLite-3.0.0-3 Third release] of QWG templates for gLite 3.0.0 ||
|| 17/8/2006 || || [milestone:gLite-3.0.0-4 Fourth release] of QWG templates for gLite 3.0.0 ||
|| 18/8/2006 || || [milestone:gLite-3.0.0-5 Fifth release] of QWG templates for gLite 3.0.0 ||
|| 13/9/2006 || || [milestone:gLite-3.0.2-1 First release] of QWG templates for gLite 3.0.2 ||
|| 15/9/2006 || || [milestone:gLite-3.0.2-2 Second release] of QWG templates for gLite 3.0.2 (CA 1.9) ||
|| 20/10/2006 || || [milestone:gLite-3.0.2-3 Third release] of QWG templates for gLite 3.0.2 (CA 1.10, gLite update 7 including critical security fixes) ||
|| 7/12/2006 || || [milestone:gLite-3.0.2-4 Fourth release] of QWG templates for gLite 3.0.2 (gLite update 9, LRMS configuration) ||
|| 19/12/2006 || || [milestone:gLite-3.0.2-5 Fifth release] of QWG templates for gLite 3.0.2 (gLite update 10, new VO configuration) ||
|| 21/12/2006 || || [milestone:gLite-3.0.2-6 Sixth release] of QWG templates for gLite 3.0.2 (gLite update 11) ||
|| 12/01/2007 || || [milestone:gLite-3.0.2-7 Seventh release] of QWG templates for gLite 3.0.2 (CA RPMs 1.11) ||
|| 03/02/2007 || || [milestone:gLite-3.0.2-8 Eighth release] of QWG templates for gLite 3.0.2 (gLite 3.0 update 12) ||
|| 16/02/2007 || || [milestone:gLite-3.0.2-9 Ninth release] of QWG templates for gLite 3.0.2 (gLite 3.0 update 13, CA RPMs 1.12) ||

== gLite Updates ==

QWG templates releases deliver the latest gLite updates available at the time of the release. There is no direct correspondence between the QWG release number (`-n`) and gLite update numbers.
Sometimes a single QWG templates release delivers several gLite updates. Each QWG release has a default associated gLite update (generally the latest one).

To give sites as much flexibility as possible in selecting which gLite update to install on a specific node or cluster, starting with QWG release 3.0.2-10, a release provides each gLite update separately, not just the latest one. You can select the gLite update you want to deploy by defining variable `GLITE_UPDATE_VERSION` at the beginning of your node profile, in `pro_site_cluster_info.tpl` or in gLite site parameters `pro_lcg2_config_site`. The value specified must be a string corresponding to a directory name in [source:templates/branches/gLite-3.0.0/grid/glite-3.0.0/update glite-3.0.0/update].

For example, QWG release 3.0.2-10 delivers update 18 as the default update. If you want to stay with update 15 on your DPM server, you may define the following variable in the DPM server profile:
{{{
variable GLITE_UPDATE_VERSION = '15';
}}}

Content of gLite updates and associated release notes can be viewed at http://glite.web.cern.ch/glite/packages/R3.0/updates.asp.

== Known Problems ==

=== gLite-3.0.2-10 : DPM and LFC 1.6.3 upgrade ===

QWG templates release gLite-3.0.2-10 provides the DPM and LFC version released as part of gLite update 16 (1.6.3). This version requires a schema upgrade of the DPM and LFC databases: the services must be shut down and an upgrade script must be run. This requires careful planning: to avoid causing job failures during the upgrade, the CE must be closed and a scheduled downtime must be declared in [https://goc.grid-support.ac.uk/gridsite/gocdb2/ GOC DB].

'''Note: be aware that doing an unplanned upgrade of DPM can result in database corruption.'''

To allow more flexibility, it is possible to deploy the QWG release on all nodes except the DPM and LFC nodes by defining the [http://glite.web.cern.ch/glite/packages/R3.0/updates.asp GLITE_UPDATE_VERSION] variable in the profile of these nodes.
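
As an illustration only, a DPM or LFC node could be held at an update preceding the 1.6.3 upgrade while the other nodes move to the release default. The value `'15'` is an assumption borrowed from the example in the gLite Updates section; check that a directory of that name exists under `glite-3.0.0/update` before relying on it.

{{{
# Sketch: placed at the beginning of the DPM (or LFC) node profile.
# '15' is an assumed value; it must match a directory name under glite-3.0.0/update.
variable GLITE_UPDATE_VERSION = '15';
}}}

Once the downtime has been scheduled and the schema upgrade prepared, removing this definition should let the node pick up the release default update again.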

=== gLite-3.0.2-10 : Torque/MAUI restart required on CE and WNs ===

QWG release gLite-3.0.2-10 delivers a new version of Torque/MAUI. This is a fixed version of what was released in gLite update 16 and it should be released in an upcoming gLite update.

After installing QWG release gLite-3.0.2-10 with gLite update 16 or later (see [http://glite.web.cern.ch/glite/packages/R3.0/updates.asp above] for information on gLite update selection), you need to restart Torque/MAUI on the CE and WNs. This involves:
 * Logging in to the CE, stopping services `pbs_server` and `maui` (`maui` must generally be stopped with `kill -TERM`), then starting services `pbs_server` and `maui` again.
 * Defining `WN_LRMS_RESTART` to force a Torque client restart on each WN.

=== gLite-3.0.2-9 requires panc >= 6.0.3 ===

As of QWG Templates release gLite-3.0.2-9, the minimum required version of the panc compiler is 6.0.3.

=== lfc/config.tpl compilation error ===

After installing QWG Templates release gLite-3.0.2-9, if you get an error compiling lfc/config.tpl, be sure to read the section on [wiki:Doc/gLite/TemplateCustomization#LFCsiteparameters LFC site parameters]. This happens because password defaults are no longer provided.

=== Upgrading a LCG RB to update 13 and later ===

If you want to upgrade a LCG RB from gLite 3.0 <= update 12 to gLite 3.0 >= update 13 (corresponding to QWG templates release >= gLite-3.0.2-9), be sure to read the release notes. Because of an internal change, all unfinished jobs submitted through the RB will be forgotten. It is therefore recommended to drain the RB for at least 2 days before doing the upgrade.

The easiest way to drain a RB is to stop the network server with the following command:
{{{
service edg-wl-ns stop
}}}

While the RB is draining, no new job can be submitted and the output of completed jobs cannot be retrieved, but users can still get information about the status of their jobs.
It is a good idea to stop the Quattor client on the RB during this period using the command:
{{{
service ncm-cdispd stop
}}}

=== Update of voms.cern.ch certificate ===

Release 3.0.2-6 of QWG templates provides an updated version of vo/certs/cern-alt.tpl (certificate of voms.cern.ch), named vo/certs/cern-alt.tpl.new, as provided by gLite 3.0 update 11. It cannot be activated right now as the certificate has not yet been updated on the server. Once the server has been updated (should happen 9/1/07), you will have to replace the current cern-alt.tpl with this new one by overwriting the existing certificate, and then deploy as usual.

=== AII : aii-shellfe error about bootloader ===

Release 3.0.2-9 of QWG templates introduces support for explicitly specifying the disk on which to install the boot loader. This is required for systems with a very large number of disks. Because of this change, it is necessary to update the Kickstart template you use. You can find an up-to-date working Kickstart template in [source:AII/Templates QWG repository]. This template must be installed in the directory pointed to by `templatedir` in `/etc/aii-osinstall.conf` on your Quattor server (normally `/usr/lib/aii/osinstall`).

=== AII : ncm-template required ===

Release 3.0.2-5 of QWG templates upgrades component `ncm-ncd`. The new version requires `ncm-template`. As this component is installed by the post-installation script of the initial Kickstart installation, it is necessary to update the Kickstart configuration template. You can find a working Kickstart template in [source:AII/Templates QWG repository]. This template must be installed in the directory pointed to by `templatedir` in `/etc/aii-osinstall.conf` on your Quattor server (normally `/usr/lib/aii/osinstall`).

=== Change in how to run MPI jobs ===

MPI integration into the middleware changed substantially in release 3.0.2-5 of QWG templates.
These changes are the result of an effort to make the MPI integration more efficient, more flexible and more stable. The new design for MPI integration was agreed upon by a large community in a meeting held in Dublin in December 2006. More information on how to use MPI in grid jobs is available at http://grid.ie/mpi/wiki/FrontPage.

=== Shared working areas for MPI jobs (Torque v2) ===

Because `EDG_WL_SCRATCH` is unconditionally set to the directory created by Torque on the worker node for the job, MPI jobs have no shared working area even if home directories are shared. An attempt to fix this was made in 3.0.2-6, but it broke the normal behaviour for non-MPI jobs in configurations with shared home directories (which is to have the working area in a local directory on the WN). The change was therefore reverted in 3.0.2-7. This problem should be fixed in 3.0.2-8. As a temporary workaround, you can keep `common/torque2/client/config` from 3.0.2-6 if it worked for you.

=== quattor/config not found ===

After upgrading to QWG template release gLite-3.0.2-5, if the PAN compiler complains that it cannot find `quattor/config`, you need to add `standard` before `standard/**/*` in your cluster's `cluster.build.properties`.

=== LCMAPS error after upgrading from LCG 2.7.0 ===

This is caused by VOMS-related libraries having been moved from `/opt/edg` to `/opt/glite`. `ncm-ldconf`, run as part of the upgrade, updates the shared libraries cache (`/etc/ld.so.cache`) only if the contents of `/etc/ld.so.conf` have changed. Unfortunately, this is not the case between LCG 2.7 and gLite 3.0: some libraries have simply been moved from one path to another.

To fix this problem, log on to the machine and run:
{{{
ldconfig
}}}

No service restart is needed.

=== DPM upgrade from LCG 2.6/2.7 ===

gLite 3.0 DPM (1.5) includes integration with VOMS and requires a database schema upgrade. This must be done manually on the DPM master node.
The following steps are needed:
 * Create a script to call the upgrade procedure (replace the values with those for your site):
{{{
#!/bin/sh

# Stub for the requires() function expected by the YAIM function file
requires () {
  echo "requires : nothing done"
}

# Edit to match your site
export MY_DOMAIN='your.dom.ain'
export DPM_HOST='dpm.your.dom.ain'
export DPM_DB_USER='AdminDbUser'       # Generally root
export DPM_DB_PASSWORD='AdminDBPwd'

# Load the YAIM upgrade function, then run it
. /opt/glite/yaim/functions/config_DPM_upgrade
config_DPM_upgrade
}}}
 * Check that `AdminDbUser`/`AdminDBPwd` has full privileges on your database server.
 * Run the script.
 * Restart all DPM daemons. The easiest way is to delete `/etc/shift.conf` and run the following command:
{{{
ncm-ncd --configure dpmlfc
}}}
 * Run the command in `/etc/cron.d/lcgdm-mapfile-update.ncm-cron.cron`.

=== Condor RPM name not matching internal name ===

RB, VOBOX and WMS normally require Condor RPM `condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm`. Unfortunately, the internal name of this RPM is `condor-6.7.10-1.i386`. This doesn't work with SPMA, which uses the internal name to know whether a RPM is already installed. To work around this problem, templates load `condor-6.7.10-1.i386.rpm`, which also exists (but seems different) in the gLite 3.0 distribution (external packages). The following steps are required to load the right RPM:
 * In the RPM repository for gLite external packages, rename `condor-6.7.10-1.i386.rpm` to something else.
 * Create a symlink called `condor-6.7.10-1.i386.rpm` pointing to `condor-6.7.10-linux-x86-glibc23-dynamic-1.i386.rpm`.

This problem has been logged to GGUS, ticket 10567.

=== LCG RB upgrade ===

gLite 3.0 includes a new version of Condor that is no longer installed in `/opt/condor` but in `/opt/condor-version`. Also, the default name of the Condor configuration file is now `condor_config` instead of `condor.conf`. Condor relies on the `CONDORG_INSTALL_PATH` and `CONDOR_CONFIG` environment variables to know where it is installed and where its configuration file is.
Unfortunately, the script starting Condor (`/etc/init.d/edg-wl-jc`) relies on `/opt/edg/etc/profile.d/edg-wl-config.sh` to get these variables defined in the context of the script (from `/etc/sysconfig/globus`, the actual place where they are defined). But `edg-wl-config.sh` doesn't export these variables when they are defined in `/etc/sysconfig/globus`. As a consequence, the Condor master doesn't see them. This has been logged in GGUS as ticket 10628.

Until the problem is fixed, you need a patched version of `edg-wl-config.sh`. It is provided by QWG templates as part of the LCG RB configuration, but there is no way to ensure that a later reinstallation of the RPM will not overwrite this patched version. If CondorG refuses to start, complaining that `CONDOR_CONFIG` is not defined, use the following command to reinstall the patched version:
{{{
ncm-ncd --configure filecopy
}}}

=== fetch-crl ===

gLite templates require the most recent fetch-crl version released by EUGRID PMA in late spring 2007 (2.6.0-1). Before gLite-3.0.0-3, the RPMs list provided only version 2.0-1, which does not work properly with the configuration set up by the templates. As a result, the CRLs quickly expire and nothing works anymore.

Starting with gLite-3.0.0-3, the RPMs list requires the right version. But this version is not yet part of the gLite distribution, so you need to get it directly from the [http://www.eugridpma.org/distribution/util/fetch-crl EUGRID PMA site] and put it in the RPM repository for gLite 3.0 updates.

=== GlueHostApplicationSoftwareRunTimeEnvironment ===

This Glue attribute should normally contain a list of tags describing the software / middleware environment available on the CE. This list needs to be updated with each new release of the middleware. Previously it was the responsibility of the site to update variable `CE_RUNTIMEENV`.
There is now a more flexible method, described in the [wiki:Doc/gLite/TemplateCustomization#CEConfiguration gLite3 customization] page.

== Change Log ==

ChangeLog built from repository commit messages...

[[ChangeLog(templates/branches/gLite-3.0.0,20)]]