Changes between Initial Version and Version 1 of Meetings/Workshops/20070312


Timestamp:
Mar 13, 2007, 4:15:56 PM (19 years ago)
Author:
/C=FR/O=CNRS/OU=UMR8607/CN=Michel Jouvin/emailAddress=jouvin@…
Comment:

--

  • Meetings/Workshops/20070312

    v1 v1  
     1= Quattor Workshop - Dublin - 13/3/07 =
     2[[TracNav]]
     3
     4[[TOC(inline)]]
     5
     6
     7== Site Reports ==
     8
     9=== CERN ===
     10
     112 instances :
     12 * Main for compute clusters (89) : #5000 nodes
     13 * Linux for controls
     14
     15Some experiments are setting up their own instances
     16
     17Desktops partially quattorized :
     18 * Installation with YUM
     19 * LCM used as a local interface to NCM
     20
     21Main instance setup :
     22 * CDB : still based on flat pseudo namespace for custom templates
     23 * SPMA + SWrep with extended authentication capabilities (X509, Krb, NICE)
     24 * LCG through YAIM
     25
     26Quattor development activities :
     27 * 99% automated release process
     28 * Integration with ETICS to come soon
     29 * Thinking about better integration with Savannah (reports...)
     30 * Default templates set ready for namespaces
     31 * Core modules responsibility : CCM, CDB, CDB2SQL, NCM framework, SPMA, SWrep, wassh
     32
     33Secure profile transfer over SSL : put in production and working but had to remove it because of the load in case of bulk transfers
     34 * Waiting for new HW
     35
     36Xen based virtualization : started to look at Quattor integration with ncm-xen
     37
     38Miscellaneous :
     39 * Integration of Quattor services with SLS
     40 * SL5 : no real interest from Linux support, on hold. May be resumed for supporting new desktops
     41
     42
     43=== DESY ===
     44
     45Quattor used to manage systems for D-GRID initiative :
     46 * #280 boxes : 250 WNs.
     47 * Still running Quattor 1.1 with AII, CDB, SPMA, YAIM
     48 * Home grown components for running YAIM and updating certificates
     49 * Web interface (DESY specific) developed for administration of systems with Quattor (e.g. install a new WN)
     50
     51D-GRID reference installation requires SUSE Linux on servers !
     52 * Implemented a very rudimentary integration to produce a profile allowing installation through AII
     53 * No plan to quattorize SUSE !
     54
     55Issues :
     56 * Compile time : what if we go to virtual machines and double the number of machines
     57 * How to improve errata handling ?
     58
     59=== Dublin ===
     60
     61Quattor configuration used :
     62 * Still a CVS repository with QWG + SCDB tools
     63 * Dublin fully controls grid systems at every site (no site customization)
     64 * One single repository for all the sites managed by Dublin, replicated SW repositories at sites
     65 * #357 machines
     66 * 3 Quattor deployment servers : production, testing, e-learning
     67
     68Integration of external clusters required Condor support as a LRMS
     69 * Added ability to select LRMS on a per queue basis inside a CE
     70
     71More systems moved to Quattor including SL4 and XEN
     72 * Moved to new QWG templates, structure stabilized (no change since October)
     73 * Released ncm-xen
     74
     75Looking at LAL monitoring through Nagios to detect deployment status.
     76
     77Issues :
     78 * How to agree on errata deployment ?
     79 * Need to clean up local templates to be in line with QWG standards
     80
     81
     82=== BEgrid ===
     83
     84UGent stopped using Quattor because of severe issues with vserver (virtualization server)
     85 * Problem with SPMA not able to install all RPMs in one go, had to rerun it several times
     86 * Want to get back
     87
     88QWG templates are getting better and better
     89 * Still busy integrating local mods into default templates, when relevant
     90
     91Work in progress :
     92 * gLite integration (Shkelzen)
     93 * dCache integration
     94 * Monitoring template integration
     95 * Solaris support : quite some work needed on AII. Plan to use it for storage.
     96
     97
     98=== UAM ===
     99
     1002 grid clusters : UAM-LCG2 and GVMUAM-LCG2
     101 * SL 3.05 on WNs, SL 4.3 on other servers
     102 * Quattor release 1.2 with SCDB and QWG templates. Eclipse used for management
     103 * SE : dCache. Manually managed right now. Plan to use NCM components when ready.
     104
     1051 non grid cluster :
     106 * Managed with Quattor. Still using CDB.
     107 * Quattor server : 2 machines with one virtual IP. Currently no HA software, based on manual replication.
     108 * Many local customizations to template to match specific requirements
     109
     110New Quattor expert : Luis Munoz.
     111 * Currently working on AII developments and ncm-accesscontrol
     112
     113
     114=== CNAF ===
     115
     116Using Quattor 1.2 without QWG templates
     117 * Implemented their own solution for namespaces and template access control
     118
     119Future :
     120 * Upgrade to latest Quattor version : main problem is manpower
     121 * Migration to new namespace solution : difficult process for storage nodes
     122 * New blade farm to be installed soon : requires fixing handling of megaraid device in AII
     123 * Xen under study
     124
     125
     126=== GRIF ===
     127
     128See agenda
     129
     130=== Missing Sites ===
     131
     132NIKHEF : Ronald unable to attend this meeting. Still using Quattor with YAIM
     133
     134PCI : no news but seems to be still using Quattor.
     135
     136
     137== PAN Compiler - Cal ==
     138
     139Current status :
     140 * panc 6.0.3 : production version, completely frozen, mono-threaded
     141 * panc v7.1.x : beta (production@GRIF), feature frozen, multi-threaded
     142
     143New compiler nearly 100% backwards-compatible
     144 * 'final' keyword not enforced in structure templates
     145 * Additional operators and functions : bitwise operators, if_exists, is_number...
     146 * New function to force traceback at any point
     147 * Support for 'bind' statement to replace 'type path = type' (grammar simplification)
     148 * Ability to compress profiles and pre-compile individual pan template files (but seems to provide no perf gain in current implementation)
     149 * Deprecation option : display messages about deprecated features used in current version (for the next version or next 2 versions).
     150 * Properties no longer duplicated but shared between objects : reduces memory consumption. No change in behaviour at user level.
     151 * Ready for production (extensive testing done with test units and at GRIF)
     152
     153Performances :
     154 * On single-CPU machine, comparable to panc v6
     155 * Significant improvement on multi-CPU machines but scaling currently far from linear. Built-in multi-thread features with 3 threads per CPU allocated by default.
     156 * Need to search "tuning" space : many parameters that can be tuned with Java (memory, threads...)
     157 * Currently not scaling linearly with the number of cores : need more investigations to understand where the problems are (panc, OS, HW, Java VM...)
     158 * May look at building a compiler farm using Java integrated web services
     159
     160Checks required before moving to production :
     161 * Memory consumption acceptable for large builds (in particular the scalability with the number of RPMs in the repositories)
     162 * Command line scripts are compatible with CDB
     163
     164Issues :
     165 * UTF-8 processing by downstream Quattor components : need to discuss the best solution to limit the impact on components without losing the advantage of UTF-8 support.
     166
     167Language and compiler documentation and evolution :
     168 * Description moved to LCGQWG wiki
     169 * Minor versions : no language changes
     170 * Major versions involve language changes : planned changes target grammar simplification and compilation speed-up.
     171   * v8 : suppress 'define' and 'delete' keywords
     172   * v9 : switch to uppercase automatic variables
     173   * v10 : allow only 'include {dml};'
     174
     175Source and binary distribution :
     176 * Sources in a SVN repository (LCGQWG)
     177 * Binary RPMs built and distributed through ETICS
     178
     179Future compiler optimization :
     180 * "Machine code" reduction/optimization : nothing done in current version, learnt from previous versions that this can lead to significant speed-up.
     181 * Function optimization : built-in implementation of list/nlist functions
     182 * Automatic caching of invariant sub-trees : not very easy to do when using a lot of global variables as switches (like in QWG templates).
     183
     184
     185== CDB - ME Poleggi ==
     186
     187Mainly minor improvements and bug fixes since last meeting.
     188
     189Still on todo list :
     190 * Fine-grained CDB locking with fair queuing
     191 * Common authentication service
     192
     193Future ideas :
     194 * Make dependency information available to user interfaces for processing by GUI like pangraph
     195 * get rid of internal Config lib in favour of AppConfig::File (CCM affected too)
     196 * Look at lowering SOAP requirements for memory
     197
     198
     199== SCDB Status - Michel Jouvin ==
     200
     201See agenda.
     202
     203
     204== Core Components Status - ME Poleggi ==
     205
     206Done :
     207 * Support for namespaces in default templates, with ability to convert old templates. How and when to obsolete old ones ?
     208    * ncmtplconfvert does the conversion, called by 'make tplconvert'
     209 * SWRepSOAP with some enhancements
     210 * getRecHash() function for converting config tree to Perl data structure
     211 * cdispd : PID registered in a file, log rotation
     212 * ncm-ncd : ccm-fetch information in log files, suppress unnecessary locking
     213 * ncm-templates : new tags supported (LFOR)
     214
     215In progress :
     216 * CCM to accept non local profiles (as done with AII)
     217 * Release process integration with ETICS
     218 * SPMA/rpmt support for checking signatures : remains a problem with rpmt-py
     219 * New partition scheme
     220 * Release 1.3 : namespaces, cumulated fixes and core modules and components enhancements
     221 * Notification system : ability to notify a set of nodes about a set of tasks to run. First version in CVS, not functional yet. With contribution from BARC
     222
     223Still pending :
     224 * CDB : controlling template naming per namespace
     225 * CDB2SQL : direct back-end interfacing, like XML DB, to avoid reparsing XML profiles
     226 * panc : template type restricted to key/value pairs that can be used as input by other templates. CERN request for easier reading/writing by other apps.
     227 * New indirection level for device identification to cope with device name changes when installing a new OS release (e.g. 2.4 to 2.6)
     228
     229CERN manpower very limited : 1,4 FTE
     230 * Includes 1 FTE contributed by CNAF and 0,2 by BARC
     231
     232New component source structure : do it asap
     233 * Keep the current TPL/ directory as the place for .tpl.cin files (don't replicate installation structure)
     234 * Convert current .tpl.cin structure to namespace and new names
     235 * Increase the minor version for the component
     236
     237AII : move configuration information from /software/components/aii to /system/aii and related templates from components/ to quattor/ namespace
     238 * Can be done asap or later depending on manpower
     239
     240SPMA configuration : may move /software/packages and /software/repositories to /software/components/spma
     241 * When done, there would probably no longer be a need for /software and we could move /software/components to /components
     242 * Let's discuss more and take a decision at next meeting
     243
     244Delay Quattor 1.3 until end of March to integrate :
     245 * Component sources conversion to namespaces
     246 * Moving AII configuration information in the config tree
     247
     248wassh : parallel execution engine on top of ssh
     249 * Now part of Quattor, will be in 1.3
     250 * Support multiple clusters and sub-clusters : plugin to resolve clusters. Need a volunteer for writing a XMLDB back-end
     251 * Useful as manual notification tool
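
The idea of a parallel execution engine on top of ssh can be sketched as follows. This is a minimal illustration in Python, not wassh code: the function names, the `max_parallel` parameter and the ssh options are assumptions, and cluster resolution through plugins is left out.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def ssh_run(host, command):
    """Run a command on one host through ssh and return (exit code, output)."""
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True, text=True,
    )
    return proc.returncode, proc.stdout

def run_everywhere(hosts, command, runner=ssh_run, max_parallel=10):
    """Run a command on all hosts in parallel; return {host: (exit code, output)}."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Submit everything first so the hosts are contacted concurrently.
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: future.result() for host, future in futures.items()}
```

The `runner` argument is there so that the dispatch logic can be exercised without real ssh access, e.g. `run_everywhere(["wn1", "wn2"], "uptime", runner=fake_runner)`.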
     252
     253pangraph : graph representation of Pan templates. Produces a .png file.
     254 * Should handle any type of templates : use RE to match and parse the inclusion lines in the template
     255 * Doesn't handle templates accessed by variables
     256 * Integration with Lemon's XML parser
     257 * Could probably be improved if there was an ability to produce the include graph in panc (inclusion graph is internally built in v7)
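
The regular-expression approach described above can be sketched as follows. This is an illustration in Python, not pangraph code: the regex and the function name are assumptions, and only inclusion lines of the forms `include name;` and `include { 'name' };` are recognised. Inclusions computed from variables are skipped, which matches the limitation noted above.

```python
import re

# Matches "include foo/bar;" and "include { 'foo/bar' };" (literal names only).
INCLUDE_RE = re.compile(
    r"""^\s*include\s+(?:\{\s*['"](?P<quoted>[\w./+-]+)['"]\s*\}|(?P<bare>[\w./+-]+))\s*;""",
    re.MULTILINE,
)

def include_edges(template_name, source):
    """Return (template, included_template) edges found in one template source."""
    edges = []
    for match in INCLUDE_RE.finditer(source):
        target = match.group("quoted") or match.group("bare")
        edges.append((template_name, target))
    return edges

# Hypothetical template content, for illustration only.
SAMPLE = """
object template profile_node1;
include pro_declaration_types;
include { 'machine-types/wn' };
include { SITE_CONFIG };
"""

print(include_edges("profile_node1", SAMPLE))
```

The resulting edge list is the kind of input a graph drawing tool would need to produce the .png file.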
     258
     259
     260== QWG Templates - M. Jouvin ==
     261
     262See agenda.
     263
     264
     265=== gLite Configuration - Shkelzen Rugovac ===
     266
     267Status :
     268 * WMS : working, including VO configuration. WMS and WMSLB can be configured on separate machines but need to be tested.
     269 * gCE : still work in progress
     270
     271RPMs list produced by script.
     272
     273YAIM used as a source for configuration information.
     274
     275Needed to adapt ncm-glite (committed to CVS)
     276 * nostart/noconfigure/noconfigfile lists
     277 * Change order of configure/start sequence : do configure/start for each service instead of all configure, then all start.
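
The change in ordering can be illustrated with a short sketch. It is written in Python for illustration and is not ncm-glite code: the function names are hypothetical, and the assumption that the `noconfigure` and `nostart` lists simply skip the corresponding step is ours.

```python
def all_configure_then_all_start(services):
    """Previous ordering: configure every service, then start every service."""
    steps = [("configure", s) for s in services]
    steps += [("start", s) for s in services]
    return steps

def configure_and_start_per_service(services, noconfigure=(), nostart=()):
    """New ordering: configure then start each service in turn.

    Services listed in noconfigure/nostart skip the corresponding step
    (assumed semantics of these lists).
    """
    steps = []
    for s in services:
        if s not in noconfigure:
            steps.append(("configure", s))
        if s not in nostart:
            steps.append(("start", s))
    return steps
```

With the per-service ordering, a service that must be running before the next one is configured is already started when its successor is processed.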
     278
     279WMSLB :
     280 * Volist : redundant with the standard VO config. May need a function to convert.
     281
     282gCE :
     283 * Should be ready next week...
     284
     285
     286== AII - Luis Munoz ==
     287
     288Current status :
     289 * Several long-standing bugs and feature requests closed
     290 * Notification in case of success, not only errors
     291 * Namespace templates
     292 * Kickstart template refactoring
     293
     294Working on improving and updating documentation (man page) to have a full coverage of options.
     295
     296Pending problems :
     297 * AII templates are schema-less : easy to make mistakes that are detected only when deploying
     298 * Software raids
     299
     300Future plans :
     301 * Define all the possible options explicitly in AII schema.tpl
     302 * Remove the use of Kickstart templates and ncm-templates processor in favor of direct use of NCM::Component. Try to use ncm-ncd rather than tweaking it, adding ability to use remote profiles (with an explicit URL).
     303 * No firm date, hopefully before next workshop
     304
     305Not sure that a Kickstart template can be replaced by a generator that uses only the machine profile. Would probably be better to add to the template processor the ability to call an external function for complex things like producing partitions.
     306 * Already in use at CERN
     307
     308
     309== Release Management - ME Poleggi ==
     310
     311Goal is a fully automated process with a single configuration point and simple procedures accessed via a unique interface.
     312
     313Several actors :
     314 * Contributors : responsible for their own modules. Should keep official status information up to date.
     315 * Users : in the feedback loop about problem/fix status but should not decide a component status
     316 * Release manager : call for release phases, in charge of collecting, building, packaging.
     317
     318Tools :
     319 * TWiki : release procedure documentation, official current status page, minimum developer's convention page (coding standards, CVS handling, Savannah usage), templates for extra documentation (e.g. man pages).
     320
     321 * CVS :
     322   * need for a code maturity indicator ? how to implement it : branches ?
     323   * stricter commit pre-requisites based on checking config.mk contents
     324 * make : the user interface through several targets.
     325 * Savannah :
     326   * Stricter bug workflow (leads to extra load and emails...), fixed mapping between CVS directories and component field. 'Ready for test' should not change the person it is assigned to.
     327   * Probably better workflow documentation could help
     328   * Automatic extraction of change log information should be possible but there is neither the time nor the expertise to do it.
     329
     330Release procedure : 3-step process
     331 * Preparing : 10-15 days for code (feature) freeze
     332 * Beta : code frozen
     333 * Release : tag in CVS
     334
     335Status page :
     336 * Meant to be a unique official reference
     337 * Each developer is responsible for a small part
     338 * Primary source for the list of candidate packages composing the next release
     339 * Minimum dynamic usage : release status, components status/maintainer
     340 * Possible add-ons : links to release notes, links to Savannah items... but links clog TWiki code !!!
     341
     342Current release process status : 99% automated procedure
     343 * Mainly global ChangeLog needs to be produced manually
     344 * Parallel build on different platforms (via different checkouts in AFS)
     345 * Building time is currently 2 hours (not including the time to produce documentation...)
     346
     347Open issues :
     348 * Orphaned modules and components : who wants to adopt them ? What about getting rid of MAINTAINER file in favour of a tag in config.mk ?
     349 * ETICS integration : project area is ready. Still to decide how to enhance the current build framework.
     350 * Test framework : all core modules should have test units, all components should go through a template syntax check
     351 * Release updates could be managed as full releases : currently they are not going through the same formal process.
     352
     353Cal remarks :
     354 * CVS branching adds too much complication. Not a good idea.
     355 * TWiki is not good for generated information. Need to put it somewhere else (Quattor site ?) and have links to it.
     356
     357Status tag for component version : add a 'make stable' that will move stable tag to last version or a specific version.
     358 * Use this stable tag to produce the status page.
     359 * 'Note' column in status page should come from config.mk
     360 * Add an OBSOLETED_BY tag to config.mk
     361
     362Packaging :
     363 * Review the metapackages available
     364 * Do a nightly build of stable version RPMs and build an RPM repository. Should be provided by ETICS framework.
     365
     366
     367== XEN support - S. Childs ==
     368
     369Grid Ireland has deployed service nodes on Xen since Jan. 2005.
     370 * configuration of Xen VMs largely manual
     371 * Would like management through Quattor to help create VMs on demand
     372
     373Issues :
     374 * AII integration to allow Kickstart installation of Xen VMs
     375 * ncm-xen : Quattor component for generating guest VM configurations
     376 * ncm-grub : support for Xen multi-boot
     377
     378AII integration :
     379 * 2 VM types : PVM (para-virtualized, no net card and unable to PXE), HVM (hardware, fully virtualized, PXE capable)
     380 * Simplest option is to have the host do PXE on behalf of the guest VM using pypxeboot
     381 * Installation requires a patched Anaconda for Xen : available from CERN for SL(C?)3/4 but not integrated in standard SL(C) distribution
     382
     383pypxeboot : Python script invoked by Xen during VM boot acting as a Xen "bootloader"
     384 * Retrieves kernel and config for the VM
     385 * Uses a customized udhcpc to get IP address for guest's MAC address (MAC address added at VM creation time)
     386 * tftp client used to retrieve pxelinux config, kernel and initrd based on VM IP address
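
PXELINUX names its configuration files after the client IP address in uppercase hexadecimal, falling back to shorter prefixes and finally to `default`. The sketch below, in Python, lists the candidate files a tftp client could try for a given VM IP address. It assumes pypxeboot follows the standard PXELINUX lookup; the function name is hypothetical and nothing here is taken from the pypxeboot source.

```python
import ipaddress

def pxelinux_config_candidates(ip):
    """List pxelinux.cfg file names to try for an IPv4 address, most specific first."""
    # 8 uppercase hex digits, e.g. 192.168.1.10 becomes C0A8010A.
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    # Drop one trailing hex digit at a time, then fall back to "default".
    candidates = [hex_ip[:n] for n in range(8, 0, -1)]
    candidates.append("default")
    return ["pxelinux.cfg/" + name for name in candidates]

for path in pxelinux_config_candidates("192.168.1.10"):
    print(path)
```

The first file found on the tftp server would then give the kernel and initrd to retrieve for the guest.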
     387
     388ncm-xen : handle VM configuration and creation
     389 * Creates VM configuration file
     390 * Schema allows describing VM HW config (memory, disks, interfaces) and boot information
     391 * Can probably be retrieved from machine profile for the machine hosting VMs using value('//VMHostProfile/path')
     392 * Disks : support both file based disk and dedicated partitions. Create LVM volumes if needed. Would be easier if ncm-lvm component was available (one being written at CERN ?)
     393 * Launch installation of VM through AII (the machine hosting VMs may be installed with AII first)
     394 * First serious use for Grid-Ireland to start soon
     395
     396
     397== NCM Components - G. Cancio ==
     398
     399Virtual dependencies : allow expressing a dependency on a feature rather than on an implementation
     400 * Typical use case is package updater : allow to depend on package updater rather than spma if people want to use other package updaters
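
A minimal sketch of how such a feature-based dependency could be resolved, written in Python for illustration. The data layout, the function name and the feature name `package-updater` are assumptions, not an existing NCM interface.

```python
def resolve_dependencies(dependencies, provides, active):
    """Replace virtual (feature) names by the active component providing them.

    dependencies: names a component depends on (components or features)
    provides: maps a feature name to the components implementing it
    active: set of components configured on the node
    """
    resolved = []
    for dep in dependencies:
        if dep in active:
            # A real component name: nothing to resolve.
            resolved.append(dep)
            continue
        providers = [c for c in provides.get(dep, []) if c in active]
        if not providers:
            raise LookupError("no active component provides %r" % dep)
        resolved.append(providers[0])
    return resolved

# Hypothetical example: "otherupdater" stands for any alternative to spma.
provides = {"package-updater": ["spma", "otherupdater"]}
print(resolve_dependencies(["accounts", "package-updater"], provides,
                           active={"accounts", "otherupdater"}))
```

A component depending on `package-updater` then works unchanged whichever package updater a site has chosen.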
     401
     402Meta-components : not really useful. Equivalent is ncm-ncd --all or --component
     403
     404ncm-ncd --noaction : currently calls all components, but no-action mode is not enforced by components
     405 * Require a component to export 'NOACTION_SUPPORTED' to be called in no-action mode
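
The proposal can be sketched as follows, in Python for illustration. The classes and the `run_components` function are hypothetical stand-ins; only the `NOACTION_SUPPORTED` name comes from the discussion above, and skipping unsupported components (rather than failing) is an assumption.

```python
class Component:
    """Stand-in for an NCM component; NOACTION_SUPPORTED mirrors the proposed export."""
    NOACTION_SUPPORTED = False

    def __init__(self, name):
        self.name = name

    def configure(self, noaction=False):
        return "%s: %s" % (self.name, "dry run" if noaction else "configured")

class SafeComponent(Component):
    """A component that declares it honours no-action mode."""
    NOACTION_SUPPORTED = True

def run_components(components, noaction=False):
    """Run components; in no-action mode skip those that do not declare support."""
    results = []
    for comp in components:
        if noaction and not comp.NOACTION_SUPPORTED:
            results.append("%s: skipped (no-action not supported)" % comp.name)
            continue
        results.append(comp.configure(noaction=noaction))
    return results
```

This way a component that ignores the flag can never modify a node during a no-action run.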
     406
     407Add a property (/software/components/cdispd) to disable cdispd on a particular node without shutting it down.
     408
     409ncm-cdispd pre dependency ordering : have a global flag to select between current behaviour and a relaxed mode where dependencies are run only if their configuration changes.
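
The two modes can be contrasted with a short sketch, in Python for illustration. It assumes that the current behaviour runs every pre-dependency of a changed component, and that the dependency graph has no cycles; the function name and data layout are hypothetical.

```python
def components_to_run(changed, pre_deps, relaxed=False):
    """Return the ordered list of components to dispatch.

    changed: set of components whose configuration changed
    pre_deps: maps a component to the components that must run before it
    relaxed: if True, a dependency is run only when its own configuration changed
    """
    ordered = []

    def visit(name):
        if name in ordered:
            return
        for dep in pre_deps.get(name, []):
            if not relaxed or dep in changed:
                visit(dep)
        # Dependencies (if any) were appended first, so ordering is preserved.
        ordered.append(name)

    for name in sorted(changed):
        visit(name)
    return ordered
```

For example, with `pre_deps = {"grub": ["spma"]}` and only `grub` changed, the strict mode dispatches `spma` then `grub`, while the relaxed mode dispatches `grub` alone.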
     410
     411Component development guidelines :
     412 * Move current component writing guide (and related) to TWiki
     413 * Document configuration options handled by the component
     414 * Version number for stable release should be >= 1.0
     415
     416New NVA API to load configuration in a Perl hash :
     417 * Recommended as the default
     418 * Load boolean properties as Perl booleans, nlists as hashes and lists as arrays
     419 * Rename as getTree()
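
The mapping described above can be sketched as follows. The real API is Perl; this illustration uses Python, and the `(type, value)` representation of a configuration element is a hypothetical stand-in, not the NVA data model.

```python
def get_tree(element):
    """Convert a typed configuration element into plain Python data.

    element is a (type, value) pair: nlist -> dict, list -> list,
    boolean -> bool, long -> int, double -> float, anything else unchanged.
    """
    kind, value = element
    if kind == "nlist":
        return {key: get_tree(child) for key, child in value.items()}
    if kind == "list":
        return [get_tree(child) for child in value]
    if kind == "boolean":
        return value == "true"
    if kind == "long":
        return int(value)
    if kind == "double":
        return float(value)
    return value  # strings and anything else are passed through

# Hypothetical configuration subtree.
config = ("nlist", {
    "active": ("boolean", "true"),
    "servers": ("list", [("string", "a.example.org"), ("string", "b.example.org")]),
    "retries": ("long", "3"),
})
print(get_tree(config))
```

A component can then read its whole configuration with one call instead of walking the tree element by element.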
     420
     421
     422== ncm-access_control replacement - L. Munoz ==
     423
     424ncm-access_control :
     425 * Controls which users are authorized to log in, and with which credentials
     426 * sudo control
     427
     428Current ncm-access_control is ugly and full of CERN specific features
     429 * Does too many unrelated things
     430
     431Proposed splitting into several different components :
     432 * ncm-useraccess : control user access to a node (.klogin, SSH public keys), ACLs on services, implement the concept of role (same configuration shared by several users)
     433 * ncm-sudo : edits /etc/sudoers, taking full control of it (except non-scalar variables like lists)
     434
     435ncm-useraccess needs minor review :
     436 * roles defined in user resource must be added to role members defined with the role
     437 * SSHKey should be renamed SSHKeyURL and support for specifying the SSH key inside the profile should be added
     438 * Limit duplication of information (krb4/krb5) when possible
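
The first review item, merging roles declared in the user resource with the members declared in the role itself, can be sketched as follows. This is a Python illustration with a hypothetical data layout, not the ncm-useraccess schema.

```python
def merge_role_members(roles, users):
    """Return role -> sorted members, combining both places a membership can be declared.

    roles: maps a role name to the members listed in the role definition
    users: maps a user name to the roles listed in the user resource
    """
    merged = {role: set(members) for role, members in roles.items()}
    for user, user_roles in users.items():
        for role in user_roles:
            merged.setdefault(role, set()).add(user)
    return {role: sorted(members) for role, members in merged.items()}

# Hypothetical declarations.
roles = {"admin": ["alice"]}
users = {"bob": ["admin", "operator"]}
print(merge_role_members(roles, users))
```

Both declarations then lead to the same effective membership, whichever side it was written on.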
     439
     440
     441== Dissemination ==
     442
     443=== Usenix ===
     444
     445Usenix paper submission : must be done before May 14th
     446 * Final paper : 16 pages max, including figures...
     447 * Draft paper to be submitted in May. Needs to be almost complete.
     448
     449Need to concentrate on :
     450 * Administration paradigm description : declarative model. Credit LCFG for introducing ideas
     451 * Illustrate rather than describe the power of PAN
     452 * Short description of main components
     453 * Description of some use cases, with a focus on distributed sites management. Should include simple site and desktop management as use cases. Don't insist too much on grid.
     454
     455Must include a short review/comparison with other products we know.
     456
     457Insist we share a lot of components and configuration information even if the use cases are different.
     458 * QWG templates as a proof of concept, even if grid is not the main focus for the paper
     459
     460Mention ongoing work to integrate VM management.
     461
     462Try to position ourselves against new management "standards" like WSDM, MUSE.
     463
     464Try to get "external" opinion : e.g. person from Philips who attended the first workshop (contact : Ronald).
     465
     466Mention future work and directions :
     467 * Workflows (moving from a test config to production...)
     468 * Porting
     469
     470=== Others ===
     471
     472CHEP : submit a talk proposal about QWG templates and distributed management
     473
     474Web site : needs to be refurbished. Both content and presentation.
     475 * Need to find somebody (probably outside CERN) who could contribute. LAL ?
     476 * First need to decide the appropriate structure and tools
     477
     478
     479== Conclusions ==
     480
     481Next meeting in Madrid around mid-October
     482