Changes between Initial Version and Version 1 of Meetings/Workshops/20060322


Timestamp:
Jul 14, 2006, 7:12:09 PM (18 years ago)
Author:
/C=FR/O=CNRS/OU=UMR8607/CN=Michel Jouvin/emailAddress=jouvin@…
Comment:

--

  • Meetings/Workshops/20060322

    v1 v1  
     1= Quattor Workshop - LAL - 22-24/3/06 =
     2[[TracNav]]
     3
     4http://agenda.lal.in2p3.fr/fullAgenda.php?ida=a0627
     5
     6[[TOC(inline)]]
     7
     8
     9== Quattor Usage Survey - G. Cancio ==
     10
     11Huge increase of Quattor usage in one year. Now 39 sites with 18 CDB instances (3 testing)
     12 * More than 6500 nodes managed by Quattor
     13 * 2 sites outside Europe
     14
     15CDB instances size :
     16 * 2 sites > 1000 nodes
     17 * 7 sites ~100 nodes
     18 * 9 sites ~10 nodes
     19
     20Mainly running SL3/i386. Interest growing in SL4 and x86_64
     21
     22Subsystem usage :
     23 * AII : all instances but CERN
     24 * SPMA : all instances (but some using YUM too)
     25 * SWrep : 13
     26 * CDB : 11, SCDB : 5
     27
     28LCG configurations :
     29 * QWG only : 6
     30 * YAIM only : 3
     31 * Combination : 5
     32
     33Lemon : 5 in prod + 3 testing
     34
     35Common local extensions :
     36 * Web interfaces
     37 * Local integration scripts
     38 * Local NCM components (in particular at CERN)
     39
     40Most common problems / improvement requests :
     41 * Namespace for CDB
     42 * Panc speedup
     43 * LCG template complexity
     44 * Template/profile browsing tools
     45
     46
     47== Site Reports ==
     48
     49=== BEgrid - Stijn De Weirdt ===
     50
     51BEgrid : 5 sites managed independently, mainly with Quattor
     52 * Most people doing the same thing
     53
     54Proposed solution :
     55 * One central server based on SCDB + certs
     56 * SWrep for RPMs repositories
     57 * Use of http cache
     58
     59No plan for central administration. Administrators produce tagged versions; site administrators choose whether to deploy them.
     60
     61Problem of sensitive data : no standard encryption in pan, thus dummy values are used in the central config and updated on deployment at local sites.
     62
     63Currently one site running the configuration and another one joined last week.
     64
     65Success depends on people willing to contribute and the ability to keep template structure clean
     66 * Is it really easier ? Certainly for newcomers…
     67
     68 
     69=== CERN - G. Cancio ===
     70
     71Quattor used to manage > 3800 nodes (not including desktops)
     72 * Special justification needed for a system not to be Quattor-managed. Resistance decreasing
     73
     74CDB/CDBSOAP
     75 * Slightly different core template set (older than public release)
     76 * YAIM for LCG install
     77 * Desktops have a local, common XML profile (not in central CDB)
     78 * SPMA/SWrep except for desktop where YUM is used for greater user flexibility
     79 * CDB2SQL backend heavily used
     80
     81Problems encountered :
     82 * Need for higher level workflow apps able to interact with CDB (LEAF/HMS) and connection with ticket system, Web interfaces to display CDB summaries
     83 * CERN-CC relies on CDB as a central database for HW description. CDB2SQL doesn't fulfil all requirements (how to store information about a non machine element, like a rack).
     84 * Deployment of security updates is painful with explicit package lists. Developed a script to generate a package list from a YUM repository, but it has never been tested outside CERN and does not handle dependency management (must be done manually)
     85
     86Major developments :
     87 * ACL support for templates
     88 * Support for namespaces
     89 * SOAP layer improvement for better performance and better handling of concurrent usage (can reach 30)
     90 * CDB backend : rewrite in progress in multithreaded python
     91 * SOAP version of SWrep (using same auth as in CDB). Collaboration with BARC (India).
     92 * Rpmt-py in replacement of rpmt : more fault tolerant, better integration with Yum.
     93 * CERN manpower limited : 1.4 FTE (including 1 from INFN)
     94
     95Quattor release cycle : need to provide an easy to use tool to build a new release with all components.
     96 * 1.2 planned as soon as CDB namespaces are ready
     97
     98PAN compiler :
     99 * Better control of what can be redefined in a template
     100 * Easier programmatic IF by adding a new 'data' type template
     101 * Direct back end interfacing like XML DB or Oracle
     102
     103
     104=== Linux for Control Machines at CERN - M. Shroder ===
     105
     106Very specific issues :
     107 * Limited network connectivity
     108 * Don't want a 'deploy everything' approach
     109 * Fine control on who can modify what
     110 * Diskless system : spma and ncm cannot run on clients. Must be run on the boot server.
     111
     112Lack of dependency management by the compiler is an issue as you detect problems when you run SPMA.
     113
     114Need for a GUI to navigate templates.
     115
     116Need a tool to sync several CDB instances : separate instances required to avoid interference between different clusters.
     117
     118Need a tool to easily check what version is used for a given product.
     119
     120
     121=== CNAF - A. Chierici ===
     122
     123Quattor used for everything (LCG and non LCG system)
     124 * Use AII, SPMA, CDB but not SWrep
     125 * ~1000 nodes, only SLC3. Looking at SLC4
     126 * Still some concerns about using Quattor to manage specific servers (mail servers, castor servers…)
     127 * More documentation needed to help storage specialists (or those of other services) better understand how they can use Quattor with the flexibility required (installation of specific drivers..).
     128
     129Thinking about using Quattor to manage T2s in Italy.
     130
     131Used QWG in 2.6. Will probably move to YAIM for 2.7
     132 * Need to install ig-lcg, Italian customization of LCG
     133 * Need to have direct support and suggestions by CERN experts for solving tricky issues.
     134 * QWG template maintenance too dependent on Cal availability
     135
     136AII : implemented a -rescue option to run a rescue environment on failing nodes
     137 * Runs as a Linux-in-RAM, allows for memcheck and disk formatting
     138 * Use a specific PXE config file, rescue.cfg
     139
     140Requests for the future :
     141 * A well defined roadmap
     142 * Better integration with tools like Lemon
     143 * More components : filecopy is the most used with pros and cons… (e.g. ssh component not managing ssh_config)
     144
     145
     146=== DESY - U. Ensslin ===
     147
     148Quattor used differently on WNs and other nodes :
     149 * WNs fully installed with Quattor + QWG templates
     150 * Other nodes : Quattor for OS install only, YAIM for LCG services
     151 * Currently ~150 nodes managed by Quattor (Hamburg only)
     152
     1531 SWrep + several "Quattor instances" (CDB/AII) used for testing
     154
     155CDB CVS stored in AFS to ease coordination between developers.
     156 * Template naming scheme modified to allow for DESY specific version of templates.
     157 * Developed a tool to display pan template include hierarchy (lth). Doesn't handle templates created by create() function.
     158
     159Developed a Web based interface to install machines without having to know about templates, with support for bulk installation
     160
     161Some template/component modifications :
     162 * executecmd : allows executing a command on a node in a pre configured list
     163 * ncm-afsclt and ncm-sshd improved
     164 * Not yet committed to repository…
     165
     166Requests for the future :
     167 * Errata management
     168 * Panc compile time
     169 * QWG templates too complex. MW people looking for simpler solutions : YAIM is ok for them. Need some help on how to use ncm-yaim.
     170 * Partitioning issues
     171 * Quattor for Solaris
     172
     173Template management remains difficult :
     174 * Many sources of templates
     175 * Many releases for MW, OS… more and more template versions. How not to get lost ?
     176
     177
     178=== GridIreland - S. Childs ===
     179
     18018 sites spread around Ireland managed centrally.
     181 * ~190 nodes
     182
     183Quattor configuration
     184 * No CDB : CVS + ant
     185 * No SWrep : resync http servers, substitution of server names at local sites
     186 * Wide usage of VMs (Xen) : ~95% of service nodes
     187 * Hierarchical config : EGEE->GridIreland->Site
     188 * Implemented ability to compile one specific site
     189
     190Minor issues :
     191 * Compilation speed
     192 * Need components to manage VMs
     193 * Sometimes cumbersome command syntax
     194 * Improvement needed in the profile modification notification process
     195
     196
     197=== NIKHEF - R. Starink ===
     198
     199NIKHEF + SARA are T1s. Also support for national projects.
     200 * ~150 nodes in production, ~15 for testbed
     201 * Non standard environment : CentOS
     202
     203Quattor configuration :
     204 * No CDB : CVS + ant
     205 * AII, NCM, SPMA
     206
     207LCG2 :
     208 * Installation done with SPMA
     209 * Configuration : migrating from QWG templates (+ local modifications) to ncm-yaim (+ local modifications)
     210
     211Successes : SW version management + automated installation and configuration. Works well.
     212
     213Problems :
     214 * Upgrading usually full of surprises
     215 * LCG2 templates : complex structure, many local modifications to standard profiles, backward compatibility, insufficient ChangeLogs…
     216 * Tools : lack of dependency management, profile comparison
     217
     218
     219== Quattor for managing large mail systems - J. Gato ==
     220
     221A large mail system is characterized by a large number of machines running a lot of different services (SMTP, LDAP, Db, Web…)
     222
     223Andago involved in deploying and maintaining such systems for customers. Much duplication of effort, problems maintaining consistency. Started TOMAS (Quattor+Nagios+FailTolerance+OSCng..)
     224 * Quattor to be used for package deployment and configuration
     225 * Interested in Quattor supporting other distributions (and alternative packagers)
     226 * Open source (exact license not yet decided)
     227
     228Andago plans to develop components for several services.
     229 * Many have to be written from scratch (http, anti-spam, MySQL…)
     230
     2311st release of TOMAS planned 31st of May. http://tomas.andago.com (jgato@andago.com)
     232
     233
     234== CDB / SCDB Status and Directions ==
     235
     236=== CDB Evolutions - M.E. Poleggi ===
     237
     238CDB  : 3 tiers architecture
     239 * SOAP client : cdbop interactive/batch shell
     240 * SOAP middleware : Apache + cdbsoap CGI
     241 * CDB backend : CVS
     242
     243Template flat space : class distinction only possible through template names, complex administration with several clusters, no easy way to limit the scope of user actions.
     244 * Namespace should allow to solve this
     245
     246Authentication problem : weak security due to ciphered password
     247 * Pwd stored in CDB server filesystem
     248 * Non interactive clients need to store clear text pwd
     249 * Possible solutions : certs and Krb. Certs almost done (cdbop), beta testing to start : checks only the client cert (do we need server cert check ?). Currently no support for grid proxy (theoretically possible).
     250
     251Authorization issues : only 2 classes of templates (pro + others). Authorization currently based on class only
     252 * Solution : namespace + groups + ACLs
     253
     254Performance issue : one network connection for each operation. Plan to send, process and return a list of items over the same network connection.
     255 * Almost done, huge improvement for bulk modifications
     256
     257Namespaces : like a directory.
     258 * Allow homonymous templates
     259 * Easy to enforce access privileges per namespace
     260 * Implemented by pan loadpath
     261 * Mostly done, still in progress for cake
     262
     263Privilege enforcement through ACLs : semantic similar to Unix FS
     264 * <item>:<user/group>=<right>
     265 * Support for groups
     266 * Based on 2 plain text files on CDB server : editable with cdbop or a text editor
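
The `<item>:<user/group>=<right>` entry format noted above can be illustrated with a small parser. This is only a sketch in Python: the sample entry, the right letters and the names `AclEntry` and `parse_acl_line` are hypothetical, and the real CDB ACL files may use a different syntax for comments, groups or rights.

```python
from typing import NamedTuple


class AclEntry(NamedTuple):
    item: str       # template or namespace the entry applies to
    principal: str  # user or group name
    right: str      # right granted (the vocabulary is an assumption)


def parse_acl_line(line: str) -> AclEntry:
    """Parse one '<item>:<user/group>=<right>' entry."""
    # Split from the right so that the item itself may contain ':' or '='.
    left, sep, right = line.strip().rpartition("=")
    if not sep:
        raise ValueError(f"missing '=' in ACL entry: {line!r}")
    item, sep, principal = left.rpartition(":")
    if not sep or not item or not principal or not right:
        raise ValueError(f"malformed ACL entry: {line!r}")
    return AclEntry(item, principal, right)


# Hypothetical entry: group 'gridadmins' gets right 'rw' on namespace 'grid/lcg'.
print(parse_acl_line("grid/lcg:gridadmins=rw"))
```

Keeping one entry per line makes the files easy to edit by hand, which matches the note that they can be changed with a plain text editor.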
     267
     268Future directions :
     269 * Web service extensions : unify parsing/packing of I/O data, provide a WSDL description
     270 * Fine grained CDB locking with fair queuing
     271 * Parallel compilation of independent templates : how to extract dependency information in advance ?
     272 * Use mod_perl instead of CGI : could greatly improve response time, at the price of reduced portability
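
On the parallel compilation item: the open question in the notes is how to obtain the dependency information in advance. Assuming it were available as a mapping from each template to the templates it includes, independent templates could be grouped into batches as sketched below in Python. The template names and the function `compile_batches` are hypothetical, and this is not how CDB or panc schedule work.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def compile_batches(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group templates into batches; a batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular includes
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all of these could run in parallel
        batches.append(ready)
        sorter.done(*ready)
    return batches


# Hypothetical dependency information: template -> templates it includes.
deps = {
    "node_a": {"os_base", "site_config"},
    "node_b": {"os_base"},
    "site_config": {"os_base"},
    "os_base": set(),
}
for batch in compile_batches(deps):
    print(batch)
```

Templates within one batch have no dependencies on each other, so each batch could be handed to a pool of workers.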
     273
     274
     275=== SCDB - C. Loomis ===
     276
     277Design goals :
     278 * Multi-cluster mgmt with a single database
     279 * Hierarchical arrangement of templates
     280 * Treat configuration directly as code management : cdbop hides a lot of version management features
     281 * Usable offline
     282 * Must work in any environment and be secure
     283
     284Implementation :
     285 * Relies as much as possible on existing tools
     286 * Subversion : atomic commits, directory management
     287 * Ant : equivalent of make in Java, framework for including new tasks, method for executing simple workflows (task dependencies)
     288 * Eclipse (optional) : provides GUI interface, integrates with SVN and ant,  syntax coloring for pan
     289
     290Dependencies :
     291 * Apache 2 server
     292 * Subversion server/client
     293 * Java 1.5
     294 * Pan compiler
     295 * Ant
     296 * Optional : Eclipse with Subclipse, javasvn, colorer editor, SunShade plugins
     297
     298Status :
     299 * Latest release : https://svn.lal.in2p3.fr/LCG/QWG/SCDB
     300 * Components : build files (300 LOC), quattor.jar (5 ant tasks, 2k LOC), SVN hook script (200 LOC). Easy to maintain.
     301 * Ant task : compilation (PanSyntaxTask, PanCompileTask), deployment (SvnTagTask, NotifyClientTask), RPM repository management (RepositoryTask)
     302
     303Major changes planned  : none, works very well
     304 * Would like to remove need to log onto the server for AII tasks
     305
     306Minor changes :
     307 * Reorganization of templates
     308 * Integration of new pan compiler
     309 * New tasks : OS updates, patches, repository template upgrades
     310 * Integration of certs, VOMS
     311
     312
     313=== Discussion ===
     314
     315G. Cancio : CERN needs a SOAP interface because some of the tools used (Remedy…) can talk only SOAP.
     316
     317
     318== AII Overview and Future - R. Starink ==
     319
     320New maintainers since May 2005 : Cesar Lobo (UAM) and Ronald Starink (NIKHEF)
     321
     322Current stable release : 1.0.32 (August 2005)
     323 * Use new NCM template processor for setting partitions
     324 * Support for https
     325 * A few bug fixes
     326
     327Feb. 2006 : Jorge Izquierdo joined, continuing the work of C. Lobo on partitioning
     328
     329March 2006 : release 1.0.36 by M. Jouvin
     330 * LVM support
     331 * Improved package resolution
     332 * Replace rpmt by rpmt-py
     333
     3346 open bugs :
     335 * rpmt requires multiple invocation
     336 * aii-shellfe should not have log files
     337 * RPM upgrade should be optional : probably related to the times where RPM downgrade was needed. No longer the case.
     338 * Extending device naming schemas : more a pan template issue
     339 * Assumption on the profile name
     340
     341Tasks and enhancements :
     342 * ncm-components for managing PXE/KS files
     343 * ad-hoc scripts in KS pre/post install
     344 * LVM support : done, one bug in template in 1.0.36
     345 * --rescue option : done
     346 * Fetch certificates using sindes
     347
     348Future plans :
     349 * Fix bugs
     350 * Compliance with SL4 and x86_64
     351 * Improve documentation : complete description of all options
     352 * Review NCM template processor : convenient but not easy to maintain. Cal : XSLT could be an alternative option to translate XML to something else.
     353
     354
     355
     356== Lemon - G. Cancio ==
     357
     358Lemon :
     359 * Monitoring agent running on each node sending data to the server. Information can be requested by the server, with a heartbeat between server and sensors, or sensors can work in pure push mode.
     360 * Sensors to measure various metrics. Each sensor provides metric classes (parse log file, list process matching a criteria…)
     361 * Server : flat file or Oracle based
     362 * Display framework based on RRD/Web
     363
     364CERN Web pages at URL https://lemonweb.cern.ch/lemon-status (requires a NICE account).
     365
     366Can get perf data and other metrics like uptime, kernel version used, reboot time…
     367
     368Can link to CDB / XML profile of specific machine
     369
     370May define clusters and subclusters
     371 * Cluster definition can come from CDB
     372 * May aggregate resources by VOs (using group information)
     373
     374Can do metric comparison between nodes and/or clusters
     375
     376Many sensors available, including database sensors (Oracle).
     377
     378Raw monitoring data are never aged out automatically. Must be requested explicitly.
     379 * Automatic table compression on Oracle
     380
     381May launch recovery actions on certain conditions
     382 * Implemented through an alarm sensor using exception definitions (exceptions are basically a kind of metric)
     383 * Can set a limit on the number of retries for recovery actions
     384 * Correlation engine (CMDaemon) but not currently used at CERN
     385
     386Working on an associated alarm management tool (GUI). Will implement alarm reduction
     387 * Horizontal : if several nodes report an alarm, report it once as a 'cluster' problem
     388 * Vertical : mask some alarms in presence of others
     389 * Will use PySQL
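
The two kinds of alarm reduction described above can be sketched as follows. This is an illustrative Python sketch only, not the Lemon alarm tool: the function `reduce_alarms`, the alarm names, the masking table and the `min_nodes` threshold are all assumptions.

```python
from collections import defaultdict


def reduce_alarms(alarms, cluster_of, masks, min_nodes=2):
    """Reduce a list of (node, alarm_name) pairs.

    masks maps an alarm name to the set of alarm names that hide it
    when raised on the same node (vertical reduction).
    """
    # Vertical reduction: drop alarms masked by another alarm on the same node.
    by_node = defaultdict(set)
    for node, name in alarms:
        by_node[node].add(name)
    kept = [
        (node, name)
        for node, name in alarms
        if not (masks.get(name, set()) & by_node[node])
    ]

    # Horizontal reduction: several nodes of a cluster with the same alarm
    # are reported once, as a cluster-level problem.
    groups = defaultdict(list)
    for node, name in kept:
        groups[(cluster_of[node], name)].append(node)
    reports = []
    for (cluster, name), nodes in sorted(groups.items()):
        if len(nodes) >= min_nodes:
            reports.append(("cluster", cluster, name))
        else:
            reports.extend(("node", n, name) for n in nodes)
    return reports


alarms = [("n1", "disk_full"), ("n2", "disk_full"),
          ("n3", "no_contact"), ("n3", "daemon_dead")]
cluster_of = {"n1": "batch", "n2": "batch", "n3": "batch"}
masks = {"daemon_dead": {"no_contact"}}  # unreachable node hides daemon alarm
print(reduce_alarms(alarms, cluster_of, masks))
```

Applying the vertical step first keeps masked alarms from inflating the per-cluster counts used by the horizontal step.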
     390
     391Quattor-Lemon integration :
     392 * CDB can hold definition of all sensors, metric classes and instances, exception handling…
     393 * 1 ncm component to manage lemon client
     394 * 1 ncm component to manage Oracle based server (no component for flat file server)
     395 * All the Lemon information under /system/monitoring
     396 * Predefined templates available in Quattor CVS but out of date. Need to get those from tarball
     397
     398Future developments :
     399 * Secure communication between Lemon components. Based on SSL. Done, under stress test. Both TCP and UDP.
     400 * Alarm system
     401 * Service Level Status System to give a user oriented view of services with a view of how appropriate is the service level (fully available, affected, degraded, non operational…). Under development.
     402 * Planning a "proxy" sensor for machines that cannot be monitored directly (e.g. Windows). The proxy sensor will return metrics based on some tests with the target machine.
     403
     404Sensors are common with GridIce but the config files are not (yet) compatible. GridIce people need to upgrade to the new configuration scheme (directory based) : work in progress
     405 * There can be issues running both GridIce and Lemon on the same machine. Basically need to run 2 separate agents until the new GridIce version is available.
     406
     407
     408HW required for the server : don't need a very fast machine. No real scalability problem.
     409
     410Oracle backend can be used on Oracle Express version (free). But :
     411 * Oracle Express has no support for backup and partitioning (critical for performances)…
     412 * Oracle doesn't commit that Oracle Express will exist in the future…
     413
     414
     415== PAN Status and Developments - Cal ==
     416
     417Current release : 4.0.2
     418 * Full profile compression added (as an option) and thus removed element compression
     419 * Complete load path is combination of command line values and the 'loadpath' global variable in templates (specify relative directory).
     420 * Runs on Linux (rpm), Solaris (pkg), Mac OSX (dmg/pkg), Windows (no packaging, source only, must be built with Cygwin)
     421 * Pan specification up to date with current implementation
     422 * Still no user guide…
     423
     424Annoyances :
     425 * argv cannot be used as a function argument (not implemented as a list)
     426 * Iterators : not possible for global variable, cannot have multiple iterators on the same variable
     427 * Variable masking (silently) by first() and next()
     428 * Global/local variable collisions : cannot modify a global variable from a function and cannot use the same name for global and local variables. Main workaround is a naming convention for variables (e.g. global variable names in uppercase).
     429 * Path specification : DML needed when a path element requires encoding ('/alpha' = {r=self;r[escape('a/b')] = 2; return(r);};). Possible solutions through syntax extensions, e.g. loosening restrictions on path elements with a syntax like '/alpha[a/b]/gamma' = …
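
To make the path encoding issue concrete, the sketch below shows in Python what an escape step for path elements could look like. It assumes an encoding where every character that is not an ASCII letter or digit (and a leading digit) is replaced by an underscore followed by two hex digits; that rule is an assumption made for illustration and should be checked against the pan specification. `pan_escape` and `pan_unescape` are hypothetical helper names, not part of the compiler.

```python
import re


def pan_escape(text: str) -> str:
    """Encode a string so it can be used as a single path element (assumed rule)."""
    out = []
    for i, byte in enumerate(text.encode("utf-8")):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum() and not (i == 0 and ch.isdigit()):
            out.append(ch)
        else:
            out.append(f"_{byte:02x}")  # '_' itself becomes '_5f'
    return "".join(out)


def pan_unescape(text: str) -> str:
    """Reverse pan_escape."""
    raw = re.sub(
        rb"_([0-9a-fA-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text.encode("ascii"),
    )
    return raw.decode("utf-8")


key = pan_escape("a/b")
print(key)                 # a_2fb
print(pan_unescape(key))   # a/b
```

Under this assumption the key `a/b` no longer contains a path separator once encoded, which is why the DML block with `escape()` is needed today.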
     430
     431Performance issues :
     432 * Memory usage : no garbage collection at profile level
     433 * Reference counting for temporary objects to implement some basic garbage collection for temporary object. Not efficient at all
     434 * Hand tuned string/memory management : very Linux 32-bit specific. Noticeable perf problem on 64-bit
     435 * Single threaded : prevents multi CPU usage.
     436 * Frequent (and expensive) object copies : any reference to tree and global variable results in a copy. Keeping derivation information (where the property/resource has been set) makes this worse.
     437
     438Possible solutions :
     439 * Memory : use standard memory management, use garbage collection library
     440 * Multi-threaded implementation is really needed but requires a deep re-engineering of the code
     441 * Immutable properties : no need to copy them if we remove derivation information
     442
     443Proposal :
     444 * Rewrite in Java : portable between platforms, advanced garbage collection, easy integration of multi-threading, logging and regexp part of the language spec, serialization of object files and completed templates is trivial and should allow perf improvement (save intermediate objects for reuse).
     445 * Speed issues : probably slower for a small number of profiles (jvm startup overhead) but speed up for dual CPU (expect 2x) and speed up from streamlined implementation
     446 * Status : Complete JavaCC pan grammar exists
     447 * Skeleton implementation but no complete prototype available yet
     448
     449Proposed language changes :
     450 * 'define' keyword complicates grammar in many places. Optional currently, removed in a future release to simplify grammar and speed up compilation
     451 * 'delete' keyword : conflicts with delete(), propose replacing with 'null' value that could be assigned to a path ('/alpha' = null).
     452 *  Automatic variables upper-cased (SELF, OBJECT, LOADPATH, ARGC, ARGV) : more consistent with convention, suppress conflict with 'object' keyword, migration period with both supported
     453
     454Functions : requests for new features
     455 * Character handling functions
     456 * Substitute() to allow variable replacement in strings
     457 * Push/npush : non obvious behaviour (need to return self). One possibility would be to add a 1st argument telling the push destination, which could allow merging push and npush. Needs to be moved inside the compiler for efficiency
     458 * merge() : error thrown for duplicate nlist keys. Could allow merging duplicate keys if the values are identical, or ensure that there are no duplicate keys in the final list ? Is there a unique solution or do we need a configurable behaviour ? Could be moved to compiler for efficiency.
     459 * Bitwise operation support
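
The configurable behaviour discussed for merge() can be illustrated outside pan. The Python sketch below models nlists as dictionaries and is not the pan implementation; the function name `merge_nlists` and the `allow_identical` switch are hypothetical.

```python
def merge_nlists(*nlists, allow_identical=True):
    """Merge nlists (modelled as dicts), rejecting conflicting duplicate keys.

    With allow_identical=True a duplicate key is accepted when both values
    are equal; with allow_identical=False any duplicate key is an error,
    which corresponds to the current behaviour described in the notes.
    """
    merged = {}
    for nlist in nlists:
        for key, value in nlist.items():
            if key in merged:
                if not (allow_identical and merged[key] == value):
                    raise ValueError(f"duplicate nlist key: {key!r}")
                continue
            merged[key] = value
    return merged


print(merge_nlists({"a": 1}, {"a": 1, "b": 2}))  # {'a': 1, 'b': 2}
```

A flag like this would let sites keep the strict behaviour while allowing the relaxed one where templates legitimately repeat the same value.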
     460
     461Other requests for enhancements :
     462 * Data template : will only allow assignments to compile-time constants
     463 * Single inclusion : allow a template to be included at most one time (similar behaviour as for declaration templates but for other types).
     464 * Variable includes : implementing if then else would greatly complicate the pan grammar and break the non procedural nature of pan. Possible to implement 'include {dml}'.
     465 * Authorization : by template (limit what values can be set in a template), by user (limit what values a user can set)
     466 * Default values inside the record definition (declaration template)
     467
     468
     469Roadmap suggestion :
     470 * Freeze current C/C++ implementation
     471 * Work on Java re-implementation (6 months estimated)
     472 * Test both implementations against each other and choose the most sensible option
     473
     474=== Discussion ===
     475
     476Basic agreement on evaluation of Java re-implementation in a 6 months timescale.
     477 * Need to implement some of the most urgent RFE before freezing (include {dml}, single inclusion, default values in record definitions)
     478
     479G. Cancio : CERN has begun to turn several templates included in each node (like sw packages) into an object template included in the node template through value("//external/template/path"). This has a dramatic impact on performance (4x to 5x).
     480
     481R. Garcia : would like to get a profile schema definition into XML (XMLSchema ?).
     482
     483
     484== NCM Components - M. Jouvin ==
     485
     486Notes missing… See presentation.
     487
     488RPM less components :
     489 * Mixed feelings…
     490 * If implemented, both RPM and RPM less formats should be provided for each component (modification of quattor build tools)
     491
     492Component configuration outside /software/components
     493 * Apart from convenience, main reason should be information that can be shared between several components
     494 * No consensus about a network component configuration location : should it register directly for changes where the information is (generally /system/interfaces, e.g. ncm-network) or transparently collect the information from where it is into a configuration area for the component (ncm-ifconfig). Many pros and cons for both approaches. Using information directly from the /system area requires that everybody agrees on and implements a common schema : not currently true (e.g. CERN !).
     495
     496
     497== QWG Templates - Cal ==
     498
     499Current releases : LCG-2.7.0-4
     500
     501Overall design goals
     502 * Be as service oriented as possible
     503 * Be reasonably flexible to accommodate local site configuration
     504
     505QWG problems :
     506 * Complex structure : cost of flexibility, Grid services not "service oriented" (lots of dependencies between services), VO specific configuration
     507 * Backward compatibility : services themselves are often not backward compatible (e.g. complete rewrite of config information for Info System between 2.6 and 2.7), VO specific improvements, VO specific config
     508 * Poorly documented (undocumented) : difficult to start, example not always working/appropriate
     509
     510YAIM
     511 * Set of scripts maintained by the GD team at CERN
     512 * Quattor component exists, used by many (written/maintained by CERN) : basically does nothing except defining variables required by YAIM and kicking YAIM script.
     513
     514YAIM problems : not Quattor specific…
     515 * Lots of assumptions on site config
     516 * Not very flexible
     517 * No effort for reproducible down/upgrades
     518 * Configuration fixes difficult (generally requires redeployment of RPMs)
     519 * Poor documentation (essentially undocumented)
     520
     521gLite Configuration :
     522 * Standard file format with complete examples
     523 * Configuration scripts with a consistent interface
     524 * Quattor interface : script to generate pan templates, 1 component for all gLite services
     525 * Problem : scripts often interfere with other components (accounts mgt…), problem with reproducible down/upgrades (same as YAIM), configuration fixes difficult (same as YAIM)
     526
     527gLite 3.0 : should be released soon on PPS (+ 1 ½ month to production)
     528 * Unclear they will go on with current gLite configuration mechanism
     529 * Could move to a mix of YAIM and current gLite
     530
     531Common issues :
     532 * Manpower to maintain QWG templates
     533 * VO configuration
     534 * Documentation
     535
     536
     537=== Discussion ===
     538
     539VO configuration :
     540 * 1st implementation : for each VO under /system/vo, a nlist with the simple name of the VO being the key.  For each VO, describes the specific 'services' (myproxy, rb…). Very difficult to have this information entered in a flexible way.
     541 * Current situation : added a bunch of functions. Very fragile : doesn't work if there is not the right path on the left hand side.
     542 * New approach in QWG 2.7.0 : one global database VO_AUTH containing all the info for all the VOs and a few functions (mkgridmap…) processing the VO_AUTH information to produce the required information (gridmap…). More flexible as VO_AUTH has no schema information : information can be structured in the way most appropriate for its final use (component schema). If agreed, could completely remove /system/vo. VO_AUTH created from structure template (1/VO). This structure template could be generated by a script from a central database.
     543 * Future  : should move VO services location out of VO declaration template
     544
     545gLite :
     546 * Need to give developers feedback on what can make our life easier… or worse. Important to have a common position from all Quattor sites.
     547
     548LCG template maintenance :
     549 * Cal : could probably move from one person having the responsibility for all components to several people in charge of different services, as services are more or less independent.
     550 * S. Childs proposes to participate in beta testing of QWG templates
     551 * QWG templates mainly used by GRIF, Ireland, VULB, DESY (partly). Should concentrate on producing documentation and polishing templates so that a site can import the template tree for a MW version and have it running without any customization other than a few site dependent profiles.
     552
     553
     554== Namespaces : For what ? Which ones ? (discussion) ==
     555
     556Cal : some base namespaces could be : os, pan, lemon
     557 * Not really useful for LCG templates (can be just one namespace)
     558 * For OS, os/version-arch ?
     559 * Namespace should reflect responsibilities for component maintenance
     560
     561Components :
     562 * Proposal : component/<component>/defaults (equivalent to current pro_software_component_<component>).
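
The proposed renaming from the flat `pro_software_component_<component>` scheme to `component/<component>/defaults` is mechanical, as the short Python sketch below illustrates. The helper name `to_namespaced` and the sample component name are hypothetical, and no such migration script is mentioned in the notes.

```python
FLAT_PREFIX = "pro_software_component_"


def to_namespaced(flat_name: str) -> str:
    """Map 'pro_software_component_<component>' to 'component/<component>/defaults'."""
    if not flat_name.startswith(FLAT_PREFIX) or flat_name == FLAT_PREFIX:
        raise ValueError(f"not a component defaults template: {flat_name!r}")
    component = flat_name[len(FLAT_PREFIX):]
    return f"component/{component}/defaults"


# Hypothetical component name used for illustration.
print(to_namespaced("pro_software_component_sshd"))  # component/sshd/defaults
```

As the remark at the end of this section says, the resulting name is relative to the load path.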
     563
     564OS : os/pkg/<pkg-definition> (e.g. base.tpl)
     565
     566Pan standard declarations/functions : pan/*
     567
     568Lemon : one specific namespace, lemon/*
     569
     570Hardware : hardware/<cpu,ram,nic,machine…>/
     571
     572Middleware : grid/<lcg|glite>/<service|config|machine>
     573
     574Quattor : quattor conventions (schema…) : quattor/
     575
     576'''Remark : namespace relative to load path.'''
     577
     578== TOMAS3 - R. Garcia Leiva ==
     579
     580http://tomas.andago.com
     581
     582Andago : company targeted at providing solutions for Linux based on open source technologies.
     583
     584TOMAS3 : Toward an Open Management Architecture for Systems, Software and Services
     585 * An open architecture where you can plug your existing products
     586 * A production-quality product to manage your own resources
     587 * 2 pilot customers : Eroski (supermarkets), Univ Autonoma de Madrid
     588
     589TOMAS3 is made of several components :
     590 * Deployment and configuration management : Quattor
     591 * Service Management
     592 * HW management
     593 * Monitoring based on Nagios rather than Lemon (better alarm management currently). Lemon could be used too.
     594 * Fault tolerance
     595 * Integration for acceptance by enterprise (documentation…)
     596 * Communication between components through Web services, so that any component can be replaced
     597 * Compliance with management standards
     598
     599Roadmap : try to use short cycles to get early feedback (6-12 months)
     600 * 1st release : end of May
     601 * 2nd release : end of 2006. Cycle open to the community.
     602 * Will try to follow the schedule even at the price of delaying some functionalities as feedback is crucial.
     603
     604Using CIM model in Quattor : currently no clear model for information managed by Quattor.
     605 * CIM is a standard model, implementation neutral, object oriented description of configuration
     606 * Advantage : industry driven standard will help acceptance by enterprises, should help future integration with web services.
     607 * Problems : very complex model, not everything so well defined, object oriented approach difficult to implement with pan, migration will require a rewrite of all components
     608
     609
     610German :
     611 * Looked at the CIM schema in the early days of Quattor but it appeared unnecessarily complex. In the end Quattor would use only 10% of the schema.
     612 * Other open source solutions like ROCKS and LCFG also have their own schema, in fact much more basic than the Quattor schema.
     613
     614
     615Using WSDM (Web Service Distributed Management) in Quattor :
     616 * Open standard supported by industry
     617 * WSDM specifies how to manage distributed resources (even resources out of the scope of Quattor : printers…)
     618 * Both Management Using Web Services (MUWS) and Management of Web Services (MOWS)
     619 * Based on a producer/consumer model, support for discovery through web service endpoint
     620 * New devices to come with support of WSDM
     621 * Integration of devices without native WSDM can be done by a WSDL proxy
     622 * Problems : doesn't fit very well with Quattor architecture, MUSE (open source server) doesn't provide any client, no open source SOAP server with support for WS-Addressing.
     623
     624Cal :
     625 * Don't expect any grid machine to have a WSDM interface. WSDL proxy is the only viable solution
     626 * Configuration changes require support of a basic transaction module to be able to process a set of changes as one operation to avoid "intermediate states".
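
The "set of changes as one operation" requirement can be sketched as below: changes are applied to a private copy and the result is only returned if every change succeeds, so no intermediate state is ever visible. This is a Python illustration under stated assumptions, not an existing Quattor or WSDM interface; the function name and the path syntax are hypothetical.

```python
import copy


def apply_atomically(config, changes):
    """Apply all (path, value) changes to a copy; any failure leaves config untouched."""
    draft = copy.deepcopy(config)
    for path, value in changes:
        *parents, leaf = path.strip("/").split("/")
        node = draft
        for key in parents:
            node = node[key]  # a missing parent raises KeyError and aborts the whole set
        node[leaf] = value
    return draft


if __name__ == "__main__":
    current = {"system": {"network": {"hostname": "node1"}}}
    updated = apply_atomically(current, [("/system/network/hostname", "node2")])
    print(updated["system"]["network"]["hostname"])
```

The caller swaps in the returned structure only once the whole set has been processed; if any change fails, the original configuration is still the one in use.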
     627
     628
     629== Quattor Future Directions and Roadmap - Discussion ==
     630
     631Agreement to have another meeting in 6 months (early October ?).
     632 * 2 days over 3 days ?
     633 * Could imagine a session with parallel working groups of interest (e.g. SCDB, Lemon…)
     634 * Fee-free meeting is a good approach (and let everybody take care of the dinner…)
     635
     636What about a tutorial on Quattor during the T2 workshop ?
     637 * Need to agree on one Quattor implementation
     638 * Need to focus on T2 issues, in particular distributed T2
     639 * Maybe not a real tutorial, more demonstrations and use cases
     640
     641=== Quattor Releases ===
     642
     643A Quattor release should be only the core components, not the templates (even the core templates) : CDB, SCDB, AII, SWrep.
     644 * Templates must be released separately and namespace issue must be handled in template, not core components : have a separate set of templates for standard templates (quattor/pan), core, lcg.
     645 * Ncm-components : should provide some kind of package of production version for components
     646 * Need one wiki page documenting what the production version is for every subset
     647
     648=== CDB ===
     649 * CERN will complete current developments and release them to the community (namespaces, ACLs…). Timescale : ACLs in March, namespaces in April/May.
     650
     651=== QWG templates ===
     652
     653 * Should tighten the links between the 3 sites using them
     654 * Trac site created and will be advertised to the list
     655 * Try to use these templates as a testbed for implementing namespaces as discussed. Need to guarantee that build tools are able to generate both a version with and without namespace during the transition period. Start with pan, quattor and os templates
     656 * Try to implement German's suggestion of using object templates for package lists (at least on worker nodes) and maybe VO information.
     657 * Beta testing of service by GridIreland and VULB (in particular for glite).
     658
     659=== Lemon templates ===
     660 
     661 * Latest version to be provided next week
     662 * Provide some feedback on integration of the latest configuration scheme for sensors into GridIce
     663
     664=== AII ===
     665 * Quickly produce a new release with the main bug fixes (need to fix kickstart template) and addition of -rescue
     666
     667
     668=== PAN compiler ===
     669 * Implement variable includes, single includes and default values in declaration templates
     670 * Freeze and reimplement in Java for comparison and evaluation
     671
     672=== NCM components ===
     673 * Identify CERN components that should be made generic and move them to core (e.g. afsclt)
     674 * Build an index of existing components from CVS with a script and update information on CVS (README files) if needed.
     675 * How to build a release of components ? Who can be a release maintainer ?
     676
     677
     678=== Savannah ===
     679 * Our main bug tracker; want to disable the task list
     680 * Workflow should be clarified : try to do it in coordination with other groups. Ongoing discussion at CERN. Conclusion will be sent to the mailing list.
     681 * No consensus on entering something into Savannah for every change (at least tag) to CVS. Another option is to improve quattor build tools.
     682
     683 
     684
     685
     686