= Quattor Workshop - DESY - 18-20/10/06 =
[[TracNav]]

[[TOC(inline)]]

Agenda: https://indico.desy.de/conferenceTimeTable.py?confId=64&showDate=all&showSession=all&detailLevel=contribution&viewMode=parallel

== Site Reports ==

=== CERN ===

Mainly an increase in the number of nodes managed by Quattor: ~4000

Quattor Solaris support dropped

CERN-specific activities:
 * CDB profile authentication/encryption: not yet deployed, need to solve the issue of a host cert expiring or appearing in a CRL, to avoid a deadlock. The idea is to use 2 URLs for downloading the config, with an automatic failover if the first one fails.
 * Manage Xen and VirtualPC with Quattor
 * Namespaces: urgently needed, but want to agree on the namespace layout first. Will take time to implement in 20K templates
 * test, new and pro areas to be provided using loadpaths and ACLs
 * Integration of Quattor with SLS (Service Level Status)
 * Impact of SL5 (using FC5/6) on ncm-components
 * SINDES maintenance: not really part of Quattor but mainly used in this context.


=== DESY ===

200+ systems in production in the GRID infrastructure
 * Local HERA experiments
 * CMS and Atlas (DESY T2)
 * VOs that are part of EGEE

Still using Quattor 1.1
 * CDB, AII, SWREP, SPMA
 * Still interested in SCDB but no time yet...
 * Middleware installation is done using YAIM (yaimexec?)

Current issues:
 * How to keep CA and middleware RPMs up to date? How to use YAIM and SPMA in conjunction?
 * PAN compile time increasing with the number of machines (currently 5-6 minutes for 200 nodes)
 * Time to get fixes for YAIM bugs reported in Savannah
 * Infrequent cdispd aborts: seems to be known at CERN, related to CCM


=== BEGrid ===

Central SCDB + SWrep
 * SCDB: certs + ACLs. Not everybody is allowed to edit everything.
 * Goal :

Current configuration:
 * 4 sites with LCG, 2 with gLite
 * Restarted from scratch (new repository) with gLite

Additions to QWG templates:
 * SE_dCache
 * Lemon server with Oracle
 * Use of IPMI for /system/hardware
 * Ganglia server and client
 * All passwords and sensitive information in one template

Problems using SLC with Quattor/QWG

Tests done with WNs in VMware under Windows XP, managed by Quattor.

Changes to AII:
 * SINDES/AII integration: some templates needed to be changed
 * Install the kernel in ks rather than ks-post-install: easier to use alternative kernels

Work on improving bulk compile of a large number of worker nodes
 * Compile a dummy WN
 * In real WNs, include the compiled profile and redo part of the configuration
 * Doesn't fit well with QWG templates; difficult to say which part to reinclude and which part to ignore, probably needs something inside PANC


=== CNAF ===

Early adopter of Quattor: has experimented with both QWG templates and YAIM

Currently using Quattor for initial installation (AII)
 * Use HTTP repositories instead of SWrep
 * Since LCG 2.7, moved to YAIM with ncm-yaim

Pretty well accepted by farm managers
 * Especially storage guys

Started to implement our own namespaces
 * Mainly to achieve machine category segmentation, in particular to control access to templates.
 * Would like some discussion on the use of namespaces in standard templates

Would be useful to have all templates needed to install a basic SL system.

More documentation on basic Quattor components (PAN, LC libs...) would help dissemination.


=== NIKHEF ===

T1 for LHC and involved in several national projects (BIG GRID, VL-e)

2 "sites":
 * Production: ~180 nodes. Significant increase expected.
 * Installation testbed: ~15 nodes

Quattor usage: CVS + ant/panc
 * OS: CentOS 3
 * panc 5.0.6
 * Only generic components used

gLite: moved to ncm-yaim
 * Initial installation via Quattor

Issues:
 * 64-bit SW installation
 * Compiler performance


=== Philips Eindhoven ===

Part of the research division uses the grid
 * A few test systems installed with Quattor
 * Links with NIKHEF


=== PIC ===

250 nodes running SL3 + 2 CASTOR nodes

Use ncm-yaim.
 * Problem with the hardcoded list of variables that can be configured with ncm-yaim. Made some changes, not filed in Savannah. Problem fixed 6 months ago by Ronald.

Deploying Quattor 1.2 + SCDB


=== Irish Grid ===

Entire Irish grid managed with Quattor
 * 18 sites, 200 nodes: all the nodes centrally managed from 1 site (Dublin)
 * 1 Quattor database
 * CVS SCDB, HTTP RPMs
 * 95% of nodes are Xen VMs
 * Had our own hierarchical model: EGEE->GI->Site. Will start moving to the standard QWG layout/hierarchy soon, with gLite 3.0.2.
 * Expertise is spreading throughout the group

Moving non-Grid servers to Quattor: 64-bit, SL4, Xen...

Integration with an automatic VM creation tool for building testbeds.

Have spent a lot of time keeping up to date with changes in the QWG structure.


=== UAM ===

Quattor used for installation + configuration of 3 clusters, using 3 Quattor servers.
 * Use CDB with the latest QWG templates

Cluster UAM-LCG2
 * Part of a distributed T2
 * 130 WNs
 * QWG LCG templates
 * Issue with update sync with other sites

Cluster GVMUAM-LCG2
 * 500 nodes, mainly PCs for student lectures
 * Used for different topics
 * Installed with QWG templates
 * Must preserve existing partitions
 * No full control of DHCP
 * Network pretty slow

Cluster WS
 * User desktops: ~40
 * Template layout based on organization: department, group...
 * Home-made templates
 * Software for desktops
 * Components to configure desktop services like printers, X11


== Experience with ncm-yaim - R. Starink ==

QWG templates until 2.6.0: several difficulties, mainly due to the lack of genericity
 * Took 4-8 weeks to incorporate local changes into a new QWG release
 * Backward compatibility between releases
 * Complex structure

Move to YAIM with ncm-yaim with LCG 2.7
 * YAIM used only for config, install with SPMA
 * YAIM variables created from templates
 * Activate YAIM on each machine

Setting YAIM variables in templates is very similar to writing a pro_lcg2_config_site.tpl
 * Issue with new versions of YAIM requiring variables not supported by ncm-yaim
 * The explicit list of supported variables comes from the ncm-yaim schema, which brings the advantage of validation (instead of using a plain filecopy).
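
A minimal sketch of what "YAIM variables created from templates" can look like. The component paths and variable names below are illustrative assumptions, not the actual ncm-yaim schema; check the ncm-yaim declaration template for the real layout:

```pan
template site/config_yaim;  # hypothetical template name

# Assumed ncm-yaim paths; the real schema defines the supported variables.
"/software/components/yaim/active" = true;
"/software/components/yaim/conf/SITE_NAME" = "MY-SITE";
"/software/components/yaim/conf/CE_HOST" = "ce.example.org";
# A variable absent from the schema fails validation at compile time,
# which is the advantage (and the limitation) of the schema approach
# over a plain filecopy.
```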

Building the RPM list. Several solutions attempted:
 * Dependencies from gLite meta packages
 * From an APT repository

Local YAIM functions required:
 * Security: no shared pool accounts for SGM users
 * Central DB for DPM and LFC
 * Shared gridmapdir, shared home dirs, accounts in LDAP: not supported by YAIM
 * Developed nikhef-yaim-local, to be installed (via SPMA) or updated after gLite-yaim. Close contacts with YAIM developers.

Satisfied by the change:
 * Easier to maintain, less overhead, less dependent on an external party (QWG)
 * Still some surprises with YAIM...
 * Shorter time for deployment of a new release (1 week)
 * No experience in changing a node type without reinstalling


== RPM Dependency Hell - Stijn Weirdt ==

RPM management by Quattor involves several steps:
 * Create/update repository with tools in cvs/utils
 * Create base repository contents with cvs/utils
 * Keep repository up to date
 * Test deployment: nothing to help here

Other existing tools:
 * apt: no bi-arch support
 * yum: the best presently, not supporting all RPM options
 * smart: tries to support everything, still buggy
 * All these tools share the idea of RPM metadata stored in a DB that can be accessed without fetching the RPMs.

SPMA: works well but has some limitations, in particular the inability to test a deployment without installing RPMs, as there is no metadata.

RPM repositories:
 * SWrep: should have an "rsync url" option to keep a local repository up to date.
 * OS distros: could use existing mirrors to avoid duplicating them locally. Would require some kind of metadata to produce the required local templates.
 * Initial loading of OS templates: may rely on comps.xml to find what is needed, or on some metadata.
 * Keeping the repository up to date: OS update metadata parsing?

RPM testing before deployment: test all the dependencies from the administrator machine before deployment
 * Should be very fast (30-60s per machine): requires metadata
 * Fake installation not fast enough
 * Injection of all RPMs into an rpmdb + rpm -i
 * Problems with first tests on 64-bit: yum fails for unknown reasons, rpmdb doesn't correctly support bi-arch.


== Specific Use Cases ==

=== Diskless Systems - M. Shroeder ===

2 possible setups:
 * RAM disk: the whole system in a large file loaded at boot time
 * NFS mount: a small image loaded from the network at boot, other filesystems through NFS, mainly read-only (and shareable)

Red Hat's way
 * PXE + NFS mount
 * Clone the server system
 * One snapshot for each client: non-shared files, writable files

Quattor usage in this context:
 * Configure RH tools (pxeos, pxeboot) via Quattor templates and components: ncm-diskless_server
 * Kickstart for server install and cloning: only install the base system
 * Server and its clone configured separately: chrooted for clients
 * Client configuration cloned in 2 parts: 1 common to all clients (done on the clone), 1 specific to each client (1 profile / client)

The clone is not really a real machine: it cannot receive CDB modification notifications
 * Has to fetch new profiles via cron
 * ncm components run on the server but must not impact the server
 * Client filesystem is read-only, but to run a component on a client we need to create some files and the ability to modify existing ones
 * Not clear if we want to support several clones (several configurations) per server

Current experience: 2 test clusters (2 and 8 clients)
 * Clients in a private network without access to CDB/SWrep
 * SPMA cannot be run on the client: establishing a matrix of components that can run on the clients


=== Quattor and XEN - S. Childs ===

Main problem is the grub component:
 * Need support for multiboot
 * Xen is the kernel; the Linux kernel and initrd are "modules"
 * New version with this support now checked in, but a problem was found at CERN?

Started ncm-xen:
 * Writes configuration files for individual VMs
 * Should also write the base Xen configuration
 * Will set up links for automatic start of domains
 * Will check in 0.1 soon... Still not mature!

GridBuilder: web-based interface (Ajax-based) for creating and managing VMs
 * http://gridbuilder.sourceforge.net, developed by Trinity College Dublin (author: S. Childs)
 * LVM allows fast creation of COW filesystem images
 * Database of VMs and images
 * Quattor used for configuration. Still a small amount of pre-configuration:
   * Configure network on filesystem images
   * Fetch Quattor profile

Possible improvements in Quattor integration
 * Automatic generation of node profiles from user-supplied descriptions in GridBuilder
 * Support for coLinux, a version cooperating with Windows to share HW resources (memory...)
 * Condor pool


== PAN Compiler Update ==

C version: implementation frozen, only major bugfixes, v6.0.3
 * Performance improvements in the last version: compression removed, defaults processing improved, speed and memory consumption as good as or better than before
 * Added "session" directory to improve the interface for CDB

Java version: still in development, all major parts functioning
 * Limited alpha available, first beta mid-December, production January
 * Main part missing: built-in functions
 * Validation suite is complete
 * License: probably Apache2, to be consistent with EGEE-II
 * Source in QWG SVN repository: https://svn.lal.in2p3.fr/LCG/QWG
 * Backward compatibility: as much as possible, there may be some incompatibilities for rarely used features
 * Requires Java 1.5+
 * Compilation and packaging: ant
 * Parser (build): JavaCC 4.0
 * Unit testing: JUnit 4.1
 * Base64 encoding/decoding (build & run): classes available from Apache/W3C, probably incorporate them directly into the code base.

Syntax changes:
 * Bit operators
 * Unary plus for symmetry with unary minus
 * Octal, hex accepted everywhere: ranges, paths...
 * Limits allowed on record statements
 * 'bind' statement added for binding a path to a type, replacing one form of 'type' (this 'type' usage will be deprecated)
 * 'return' allowed wherever functions are (not very useful, grammar simplification)
 * Warnings could be issued for deprecated usage
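
The new 'bind' statement can be sketched as follows (the type definition is illustrative, not from the talk):

```pan
template example;

# Illustrative record type
type network_t = {
    "hostname" : string
    "domain"   : string
};

# Deprecated form: type "/system/network" = network_t;
# New, equivalent form:
bind "/system/network" = network_t;
```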

Other changes:
 * Stricter syntax checking at compile time
 * Generation of "object" files (binary form of a syntax-checked template) to avoid recompilation of an unchanged template
 * ant tasks will be the primary interface to the compiler (no "binary"), but wrapper scripts will be provided for the command line

Incompatibilities:
 * 'bind' is now a keyword
 * OBJECT, SELF, ARGV, ARGC defined to conform to best practices for global variables and to avoid conflict (at grammar level) between the object keyword and the object variable
 * No pointers to properties: 'x = y[0] = 0; y[0] = 1' now leaves x == 0 (currently x is also set to 1)

Emphasis for the first release: verifying functionality and measuring performance
 * In particular evaluate the cost/benefit of object files

Future changes after the initial release:
 * Parallelization: compilation of templates, building of configuration trees
 * Remove deprecated features: lowercase global variables, deprecated form of 'type', 'define' keyword
 * Addition of string functions: uppercasing, lowercasing, push, pop
 * More default types: XMLSchema, port...

Missing features/pending bug reports:
 * Unescape strings in the traceback produced in error messages
 * Add a file existence test operator
 * Add an argument to matches() to allow passing global options (like in Perl)

== QWG Templates ==

From discussion: need to think about explicit support for CDB
 * Probably mainly a matter of defining load paths in an optional template as a replacement for cluster.build.properties


== CDB/SCDB Update ==

=== CDB Update ===

New features since the last workshop:
 * Namespaces
 * X509 and Krb authentication
 * ACLs with namespace support
 * Client/server improvements through session metadata

State management through metadata:
 * The problem was that session directories were used for both data and state
 * Clear separation with specific metadata for state control: better and earlier detection of commits

Parallel compilation of templates:
 * Compiling all profiles with one command doesn't scale: too long, too much memory
 * The whole set of templates is divided into several subsets (without dependencies), compiled separately and in parallel on several processors/machines

Smarter rollback and commit:
 * Currently a rollback requires rolling back all the mods in the session; a commit can screw up previous modifications committed but not in the session directory
 * Now allows selective rollback and interactive commit

Handling of dead revisions: problem related to the CVS backend
 * A removed template is no longer in CVS; restoring from backup requires a lot of manual cleanup. Look at SVN as a new backend?

New authentication for CDB moved to a separate library, now used in all components (SWrep in particular)
 * Doesn't require installing CDB to use the other components, only the library

Other project status:
 * CDB as a web service: still any interest? Not sure...
 * Fine-grained CDB locking with fair queuing: really required
 * Concurrent compilation of (non-object) templates: too complex, wait for the new compiler...
 * mod_perl: no further investigation, mod_fastcgi probably a better solution.

Open issues:
 * CVS doesn't scale: a problem with 24k templates. Possible solutions: Perl-based CVS? Subversion? XML database?
 * Relocatability: difficult to port to other systems, testing requires a full installation or specific privileges
=== SCDB Update ===

See slides.

Suggestion:
 * To avoid a full rebuild after a repository templates update, ignore these templates when evaluating whether a node profile must be recompiled (this is how it is handled within CDB).


== Quattor Core Modules Update ==

Several ongoing developments at CERN, of general interest.

CDB2SQL:
 * Rewrite with multithreaded Python and a fast XML parsing library. Should have no CERN dependency, only an Oracle dependency (but it should not be difficult to add support for other RDBMSes)

autoconf:
 * In contact with ETICS to use their framework for configuration and automatic builds
 * It will remain possible to run the Quattor build tools outside of ETICS (mainly need to define --LOCALDIR)

CCM:
 * Add support for a failover profile, in case the URL in --profile is not available
 * Problems observed causing ncm-cdispd to crash

wassh2: improvement of wassh (parallel ssh), interfacing with CDB
 * CDB interface done using a plugin: CERN uses Oracle/cdb2sql
 * Will be part of Quattor 1.3
 * Non-CERN testers welcome

Notification system: the ability to trigger execution of an NCM component on one or several nodes without logging in to them (even with wassh)
 * Basically one command: notify_host myhost component
 * 'component' is a keyword translated on the target host via a configuration file
 * Current version has lots of CERN dependencies; plan to reengineer it and release it as part of Quattor


== AII ==

Since the last workshop, various bug fixes and some enhancements:
 * Support for rescue images
 * Support for alternative device naming schemes

Work in progress:
 * Separate site configuration from component configuration
 * Error handling
 * Complete partitioning scheme
 * Documentation
 * Schema change for block devices

On the todo list:
 * SPMA proxy start
 * Support for SINDES

Separating site and component configuration:
 * Idea is to have pro_software/declaration_component_aii really only related to the AII component. Will provide a function aiiconfig(name,disk) to return the actual configuration
 * pro_config_aii_OSNAME: OS / arch specific configuration
 * pro_aii_config_site: site-specific information

Expected incompatibilities in the new version:
 * Separation of configuration
 * Change in schema for generic block devices

Remark: could add a property to /software/components/osinstall/options to select whether the initial installation is done with DHCP or the final address, and improve the template accordingly
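
The remark above could translate into something like the following purely hypothetical property (the name and default are assumptions, not part of the current schema):

```pan
# Hypothetical: choose between DHCP and the final static address
# at installation time; property name is an illustration only.
"/software/components/osinstall/options/use_dhcp" ?= true;
```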


== New Schema for Block Devices ==

Need to add support for SW RAID, new kinds of block devices, filesystem mount options.
 * Also need to align naming for HW RAID and SW RAID

New schema proposal: /system/blockdevices/[disk|md|lvm|hwraid]
 * disk: 1 entry per disk, almost all information optional, mainly partitions
 * md/hwraid: basically the same, add information about RAID members, RAID level, stripe size...
 * lvm: allows more sophisticated LVM schemes than the current functions (an LVM VG split over several HW RAIDs...)
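
A rough Pan sketch of the proposed layout; the paths, property names and values are illustrative assumptions, not an agreed schema:

```pan
# Illustrative only: proposed /system/blockdevices layout.
"/system/blockdevices/disk/sda/partitions" = nlist(
    "sda1", nlist("size", 512),    # sizes in MB, for example
    "sda2", nlist("size", 20480)
);
"/system/blockdevices/md/md0" = nlist(
    "members", list("sda2", "sdb2"),
    "level", "RAID1",
    "stripe_size", 64
);
"/system/blockdevices/lvm/vg0" = nlist("members", list("md0"));
```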

Seems OK but need to check:
 * AII compatibility: in particular the KS template
 * Ability to represent multipathed devices
 * Is the md/hwraid distinction relevant: maybe just keep a property in a common schema

== Namespaces ==

See German's presentation. Basically everybody agrees. Just a few details:
 * What goes in pan/ and what goes in quattor/
 * Have all types defined in one place, as some will move to the compiler
 * Hardware: maybe cards/ is not needed; for ram/ use bank.tpl
 * Components: upgrade the Quattor build tools to automatically produce namespaced versions from non-namespaced sources, rename declaration.tpl to schema.tpl and define default values there in the future, add the ability to build RPM-less components (automatically insert the Perl script into the template).

Clusters:
 * CDB relies on clusters and subclusters
 * Could be added in SCDB

Sites:
 * Rename site/ to config/
 * Difficult to agree on the whole layout between CDB/SCDB as the concepts are different

OS templates:
 * Change rpmlist to rpms
 * Rename templates describing groups to groupname.tpl
 * Rename os/ namespace to config/

Standard variables: no real need to agree on variables, need to agree only on the schema

OS/arch naming: originally os_arch, QWG uses os-arch; no real need to agree as this is not in the schema


== Wish List, Roadmap... ==

Documentation:
 * Provide user guides in addition to specifications for all components
 * Move quattor.org to Twiki (except the home page?)
 * Have a short installation guide (not 80 pages...)
 * List of available components, with a very short explanation and the recommended production version (may come from CVS tags, updated manually if necessary)
 * Tutorials: differentiate between old and recent tutorials

Open issues (from Savannah):

== Conclusions ==

Next meeting: Trinity College, target date: mid-March


TBD:
 * Add an option to osinstall for using DHCP at installation time, and merge the LAL and standard KS templates