Changes between Initial Version and Version 1 of Meetings/Workshops/20090311


Timestamp:
Mar 14, 2009, 12:50:03 PM (17 years ago)
Author:
/C=FR/O=CNRS/OU=UMR8607/CN=Michel Jouvin/emailAddress=jouvin@…
Comment:

--

     1= Quattor Workshop - London - 11-13/3/09 =
     2[[TracNav]]
     3
     4[[TOC(inline)]]
     5
     6See [http://indico.cern.ch/conferenceTimeTable.py?confId=50010&showDate=all&showSession=all&detailLevel=contribution&viewMode=parallel Agenda].
     7
     8== Site Reports ==
     9
     10=== LAL - G. Philippon ===
     11
     12=== Grid Ireland - S. Childs ===
     13
     14Not many changes in the number of admins or resources
     15 * Virtualization of most nodes (Xen)
     16 
     17Developments in monitoring:
     18 * Nagios + Ganglia
     19 * MonAMI to feed Ganglia from DPM and Torque
     20 * LEMON : upgraded to lemon-web on SL5
     21
     22Many tools upgraded : checkdeps, quatview
     23 * ncm-accounts massive speedup
     24
     25SCDB : merge "hierarchical site" model into trunk
     26
     27Issues:
     28 * Get network and file systems fully under Quattor control
     29 * Consistent scheme for monitoring in Quattor
     30 * Dummy WN speedup trick integrated into the compiler
     31 
     32
     33=== NIKHEF - R. Starink ===
     34
     354 clusters: 300 machines
     36 * Currently 5 people involved with Quattor
     37
     38SCDB + local changes: deployment not done by SCDB but by local tools to allow deployment of a specific machine
     39 * Related to the historical way of managing systems at NIKHEF but has the disadvantage that a postponed change deployment may break something later on some other nodes...
     40
     41Xen: 7 hosts, 38 guests
     42 * Based on QWG but issues with host and guests in different clusters: workaround found
     43
     44Monitoring entirely based on Nagios with 1 master and 3 slaves
     45 * Based on QWG but some mods to handle hierarchy of servers that are willing to share
     46
     47panc v7 to v8 transition: no problem but no performance improvement observed
     48 * Very happy with the new logging features
     49 
     50NCM components:
     51 * ncm-openvpn to configure server and clients
     52 * ncm-yaim: complete refactoring/rewriting, new features, some backward incompatible changes
     53 
     54Issues:
     55 * WN compilation speed-up has some problems with compile-time dependencies
     56 * Strength of community with increased usage: what about the ability to support everybody?
     57 
     58=== LAPP - E. Fede ===
     59
     60Quattor server is running on a VMware virtual machine
     61 * 110 profiles
     62 * 4 people using it
     63
     64Running autobuild of RPMs for NCM components and other core components from SourceForge
     65 * http://lapp.in2p3.fr/Quattor
     66 * trunk and tags/latest
     67 * repodata available from YUM
     68
     69=== CERN - V. Lefébure ===
     70
     71Main instance: 7500 profiles in 139 clusters
     72 * +1200 increase
     73 * 1900 profiles corresponding to machines not managed by Quattor
     74 * Running the last version of CDB, panc v8 ready
     75 * v8: 20% improvement in compile time but not yet in production. All known issues solved
     76   * Problem of use of RECORD type by some components (ncm-httpd, ncm-tomcat....)
     77
     78Xen-based virtualization: support for SLC5 hypervisors ready
     79
     80Issues of number of users: 65 ACL groups
     81
     82Package list templates: working on automation
     83 * Use of comps.xml
     84 * Automatic detection of missing dependencies
     85
     86CDB2SQL: Python version fast but buggy, no manpower to fix it, reverted to previous version
     87
     88
     89=== Morgan & Stanley - N. Williams ===
     90
     91In production now: AQB, AQDB, LEMON
     92 * 7500 nodes, compile at 10 minutes (8-core machine) but aii-shellfe --notify at 1h (with patches not yet committed) !
     93 * 5 template admins
     94 * New building just commissioned, expected to double the number of machines in the next months; plan to keep one server (+1 for redundancy)
     95
     96Issues:
     97 * Format change of XML profiles painful: dropping LINK support forces "big-bang" changes
     98 * Configuration success feedback: thinking of implementing a DB of the last time a component was run, updated by ncm-ncd/ncm-cdispd, that could be compared with the timestamp of the last configuration
     99
     100Will submit code now via SourceForge
     101 * Waiting for approval to open source : AQDB, FUSE interface to configuration browsing, AII, CCM patches
     102
     103
     104=== UAM - Laura del Caño ===
     105
     106Luis left, Laura is his replacement.
     107
     108Proposal of tasks that UAM could handle:
     109 * Maintenance of monitoring tools
     110 * openvz support
     111 * AII
     112 
     1135 clusters
     114 * Use of ant local tasks for template management
     115 * Performance tests of new machines configured with Quattor
     116 
     117New component in progress:
     118 * ncm-amanda to configure Amanda backup SW
     119 * ncm-pnp4nagios
     120 
     121Some local developments:
     122 * Postgresql DB to store machine info and group them into categories + ant local task to generate the profile and some other templates (monitoring) for the machine
     123 * SinDes alternative used to manage secured access to profile (AII hook)
     124 
     125=== Greek Grid - D. Zilaskos ===
     126
     1271 Quattor server to manage 2 clusters representing 133 machines spanning 13 subnets
     128 * 4 Xen hosts, 19 guests
     129 * SVN server installed with Trac
     130 * 3 admins + 2 new people who recently joined
     131
     132Developments and issues:
     133 * Wiki guides for Quattor newbies
     134 * New components: still in progress... some services like Hydra evolving very quickly
     135 * Involvement in OAT benefits as work is implemented and tested locally with Quattor
     136 * Thinking of an administration model for Southern Europe based on GRIF/GRID Ireland experience
     137   * Lot of small sites, very limited effort available...
     138   
     139=== CNAF - A. Chierici ===
     140
     14190% of the templates adapted to the new schema
     142 * Inspired by QWG
     143 * Next step is migration of gLite nodes to SLC5
     144 
     145Xen used on 3 servers providing 16 cores each
     146 * LHCb T2 running on Xen
     147 * Quattor used to configure the profiles of guests but Dom0 managed by hand
     148 * Investigating KVM
     149 
     150Planning to install soon new ncm-yaim and ncm-accounts
     151
     152CDB vs. SCDB: still thinking about migration but need to investigate the impact for the users
     153 * Is CDB still supported after ME left ?
     154 * Who will take care of new core release ?
     155
     156Presenting a poster about Quattor at CHEP: anyone who could help to present it ?
     157
     158One new person to help with Quattor at CNAF (Elisabetha)
     159 * All new sysadmins taught to use Quattor
     160 
     161
     162== Aquilon Drive Through - W. Hertlein ==
     163
     164Aquilon is an architecture to address system management at M&S, in particular scalability, management delegation, application-centric management...
     165 * Goal is to install hundreds of machines without any manual intervention
     166
     167AQDB is the CDB replacement: no direct interaction between users and templates, everything goes through the Aquilon broker (AQB)
     168 * Aquilon configuration stored into AQDB
     169 
     170Workflow of provisioning machines:
     171 * Rack is the unit of work: racks delivered cabled
     172   * Limited number of vendors and models
     173 * When a rack is powered on, the top-of-rack switch (tor_switch) sends a DHCP request and receives a temporary address that allows it to be configured after discovering its type
     174   * Switch entered in AQDB
     175 * DNS integration : all servers configured with the same information
     176   * DNS DB built from a periodic dump of AQDB (every 3h)
     177 * DHCP integration: a DHCP server set to serve a specific set of machines
     178   * Configuration built from a periodic dump of AQDB
     179 * Discovering machines: done by scripts scanning the tor_switch and querying it with snmp get
     180  * Create a machine entry in AQDB for each discovered MAC entry
     181 * After a machine has been entered in AQDB, create a machine plenary (PAN) template describing the HW
     182  * Machine plenary template is an equivalent of SCDB hardware/machine + the service/personality
     183 * Creating hosts: logical entry for the host created and associated with a plenary template
     184  * IP is derived by choosing an available IP from the tor_switch subnet and put in the plenary template so that an IP address change does not trigger recompilation of another host
     185 * Wait for the propagation of previous information, in particular DNS: may take a couple of hours
     186 * Next day build out the hosts binding host to required services and use panc to compile the template
     187  * Invoke aii-shellfe to switch over the install pxe image (done through AQB)
     188  * DNS + Krb propagation delay: looking for some optimizations in the future
     189 * Finishing the build process: a script runs on the newly built host to update its information in AQDB
     190   * Record any management interfaces found and if successful update the status information
     191 * Grid hand-off: after successful build of a host, it is transferred to an application group
     192   * Deployment schedules are set up in advance
     193   * Use personality to set up a host with application defaults
     194   * Spread hosts for a group over several racks for better resilience
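
The host creation step above picks an available IP from the tor_switch subnet. A minimal sketch of that selection, assuming the set of addresses already in use can be obtained from AQDB; the function name `next_free_ip` and the example addresses are hypothetical, not part of Aquilon:

```python
import ipaddress


def next_free_ip(subnet: str, used: set) -> str:
    """Return the first usable address of `subnet` that is not in `used`.

    `subnet` is the tor_switch subnet in CIDR notation; `used` holds the
    addresses already assigned (assumed here to come from AQDB).
    """
    network = ipaddress.ip_network(subnet)
    # hosts() skips the network and broadcast addresses.
    for host in network.hosts():
        if str(host) not in used:
            return str(host)
    raise RuntimeError(f"no free address left in {subnet}")


# Example with documentation addresses (RFC 5737).
print(next_free_ip("192.0.2.0/29", {"192.0.2.1", "192.0.2.2"}))
```

Storing the chosen address in the plenary template, as described above, keeps the choice stable across later compilations.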
     195   
     196Managing services: a service may have several instances
     197 * 1 service instance is bound to a host
     198 
     199Monitoring hosts to meet SLA
     200 * Hosts are monitored by a daemon that does periodic snmp sweep of all the hosts
     201 * Hosts are grouped by personality with a threshold of hosts allowed to be offline
     202 * Host personality also defines a reboot schedule: potential previous threshold violation will prevent the reboot
     203   * Include some draining capabilities
     204 * LEMON is enabled as a service to provide visualization of aggregated metrics
     205  * LEMON configured from Aquilon to guarantee consistency
     206
     207Updating personalities involves working on a local copy of the personality and creating a new temporary one (AQ domain), compiling it with a fake profile and putting the change back into AQDB
     208 * Currently no check enforced by 'aq put' that the new version compiles. Rely on conventions...
     209 * A test host can be associated with the new personality and reconfigured to validate the changes
     210 * AQ then allows merging changes into a production personality and reconfiguring all nodes using it
     211 
     212 
     213== Monitoring ==
     214
     215=== Monitoring Templates in QWG - S. Kenny ===
     216
     217LEMON:
     218 * NCM components: fmonagent, oramonserver
     219 * Templates: standard/monitoring/lemon
     220 * Web front-end via filecopy
     221
     222Nagios:
     223 * NCM components: nagios, ncg
     224 * Templates: standard/monitoring/nagios
     225 * A few changes to support Nagios 3
     226 * Hosts created from HW DB
     227 * Services defined as separate templates and added to NAGIOS_SERVICE_TEMPLATES variable but not very scalable
     228 * ncm-ncg currently being developed to produce an input file for WLCG NCG which generates the service definition: looks promising
     229
     230Ganglia: configured with filecopy
     231 * Would be better to have a component generating the required config file on client and server from the site hierarchy description
     232
     233MonAMI: configured with filecopy
     234
     235Currently, every monitoring tool has its own configuration. Ideally the monitoring schema should be mostly tool-independent. Proposed model based on current LEMON config:
     236 * Host: coming from DB_MACHINE
     237 * Cluster: group of nodes sharing the same node types
     238 * Super-cluster:
     239 * Need to be part of the information in the node profile so that it can be used by several components
     240 * May be connected with some representation of M&S personalities into the schema
     241
     242
     243== Core Components ==
     244
     245=== PAN Compiler - C. Loomis ===
     246
     247Code hosted on SourceForge since 8.2.6
     248 * Bug tracking moved to SF too
     249 
     250Production version is v8
     251 * v7 deprecated
     252 * 8.2.3 : CDB compatibility, annotation
     253 * 8.2.4 : selective debugging
     254 * 8.2.7 introduces prepend/append functions to replace push/npush
     255 
     256Outstanding bugs:
     257 * Race condition in validation
     258 * Corner cases with unintuitive behaviour
     259 * Enforce final flag in structure templates: no real request...
     260 
     261Enhancement requests (in priority order):
     262 1. Restricted include (aka entitlements)
     263 1. Add perf tips to documentation
     264 1. XInclude directive to replace Embedded
     265 1. Add OBJECT to debug() and error() output
     266 1. Add prefix/define statements to shorten literal paths
     267 1. Internationalize error messages
     268 1. Enable/disable debugging from within pan
     269 1. Allow include to take a list (from M&S)
     270
     271 
     272Other ideas/wishes:
     273 * Better Eclipse integration: editor, debugger, dialog boxes to select options for ant/pan
     274 * Ability to include a file which is not a template inside a template as an alternative to `<<EOF`
     275   * File name should be relative to the loadpath as for other templates
     276 * Rework string escaping handling for automatic escaping by the compiler and a prefix in the string to know if it has been escaped or not without ambiguity
     277   
     278Restricted include statements: design questions
     279 * Allow the scope of configuration tree changes to be specified when including a template ?
     280   * - : require modification of templates to do entitlement
     281 * Specified allowed areas of tree ? or restricted (disallowed) areas ?
     282   * Need both, discuss if exclude before include or the opposite
     283 * Can '/a' be fixed but '/a/b/c' be changed ?
     284 * Enforce entitlement on loadpaths ?
     285 * What about variables ?
     286 * Allow wildcards and/or regular expressions on the template names ?
     287 
     288
     289=== NCM Components - N. Williams ===
     290
     291ncm-cron: more smearing
     292 * `frequency` may be replaced by `timing`, an nlist allowing a target to be specified (eg. interval into hours) + a smear interval
     293 * hours/days/month/... are strings that allow expressions and are validated
     294 * If present `timing` takes precedence over `frequency`
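
A rough sketch of the target-plus-smear idea and of the precedence of `timing` over `frequency`. The key names (`minute`, `hour`, `smear`) and the hash-based offset are assumptions made for illustration; the actual ncm-cron schema and smearing method are not described in these notes:

```python
import hashlib


def smeared_minute(hostname: str, target: int, smear: int) -> int:
    """Return a minute between target and target + smear (modulo 60).

    The offset is derived from a hash of the host name so that it stays
    stable for a given host (an assumption; a random offset would also
    spread the load).
    """
    digest = hashlib.sha256(hostname.encode()).digest()
    offset = int.from_bytes(digest[:4], "big") % (smear + 1)
    return (target + offset) % 60


def effective_schedule(entry: dict, hostname: str) -> str:
    """Build the cron time fields, preferring `timing` over `frequency`."""
    timing = entry.get("timing")
    if timing is not None:
        minute = smeared_minute(hostname, timing["minute"], timing.get("smear", 0))
        return f"{minute} {timing.get('hour', '*')} * * *"
    return entry["frequency"]


entry = {"frequency": "0 * * * *", "timing": {"minute": 10, "smear": 20}}
print(effective_schedule(entry, "node1.example.org"))
```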
     295
     296ncm-download: retrieve URL
     297 * functionality similar to filecopy but contents downloaded with curl from a URL
     298 * Support proxies
     299 * Can use `spnego` support into curl for authenticated access to URL, using Krb principal for example
     300 * Support global definition of http server to use and relative URLs
     301 * Subclassable: a method allows telling the component which configuration path to use instead of the default, so that the component can be called by another component for one specific file
     302   * An idea to reuse in other components like ncm-filecopy
     303   * A problem with ncm-ncd which might not match exception
     304
     305CCM: support for local CCM DB format
     306 * Specified with `dbformat`: can be `GDBM_File`, `DB_File`, `CDB_File` (compact DB, nothing to do with Quattor CDB!)
     307 * Format is stored in file `.fmt` close to `.db` file: only the writers need to take care of it, read is properly handled by CCM library
     308 * Download ''if-modified-since'' local disk version, ignoring the mtime in CDP notification message
     309 * ccm-fetch now uses the library coming from ccm-fetch.new (3 years old!)
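
The ''if-modified-since'' behaviour relies on a standard HTTP conditional request built from the local file's modification time. A minimal sketch using only the Python standard library (CCM itself is written in Perl; `conditional_request` and the example URL are hypothetical):

```python
import os
import urllib.request
from email.utils import formatdate


def conditional_request(url: str, local_path: str) -> urllib.request.Request:
    """Build a GET request that only transfers the profile if it is newer
    than the copy on local disk.

    The date comes from the local file's mtime, not from the timestamp
    carried by the notification message.
    """
    request = urllib.request.Request(url)
    if os.path.exists(local_path):
        mtime = os.path.getmtime(local_path)
        # HTTP dates must be expressed in GMT.
        request.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    return request


req = conditional_request("http://example.org/profiles/node1.xml", "/nonexistent/node1.xml")
print(req.header_items())
```

When such a request is sent with `urllib.request.urlopen`, an unchanged profile is reported as an `HTTPError` with code 304, which the caller can treat as "nothing to do".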
     310
     311ncm-cdispd/ncd: would like to add configuration status feedback
     312 * `/var/lib/ncm` contains an entry per component:
     313   * If the file is empty, component must run
     314   * If run successfully, delete the file
     315   * If failed, write the exception into the file
     316 * If a component is inactive, ncm-cdispd removes the file for the component in `/var/lib/ncm` unconditionally (without checking its contents)
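
The proposed per-component status files can be sketched as follows. This is a Python illustration of the convention listed above, not the ncm-ncd/ncm-cdispd code (which is Perl); the function names are hypothetical and a temporary directory stands in for `/var/lib/ncm`:

```python
import tempfile
from pathlib import Path


def mark_pending(state_dir: Path, component: str) -> None:
    """An empty file means the component must run."""
    (state_dir / component).write_text("")


def record_result(state_dir: Path, component: str, error: str = None) -> None:
    """Delete the file on success, store the exception text on failure."""
    marker = state_dir / component
    if error is None:
        marker.unlink(missing_ok=True)
    else:
        # Never write an empty file here, it would read as "must run".
        marker.write_text(error or "unknown error")


def status(state_dir: Path, component: str) -> str:
    """Return 'ok', 'pending' or 'failed' for a component."""
    marker = state_dir / component
    if not marker.exists():
        return "ok"
    return "failed" if marker.read_text() else "pending"


def drop_inactive(state_dir: Path, component: str) -> None:
    """Inactive component: remove the file without looking at its contents."""
    (state_dir / component).unlink(missing_ok=True)


state_dir = Path(tempfile.mkdtemp())
mark_pending(state_dir, "accounts")
record_result(state_dir, "accounts", "cannot write /etc/passwd")
print(status(state_dir, "accounts"))
```

A monitoring check then only needs to list the directory: any non-empty file is a failed component, any empty file a configuration still to be applied.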
     317
     318spma/aii-ks: randomization of the proxy chosen when multiple are available
     319 * If the AII server processing the kickstart file is in the list of proxies, use it
     320 * Support for a list of "installack" servers, requested in // to ensure consistency when several AII servers are used in //
     321 * Require ncm-nscd start before starting configuration to ensure name service can be restarted without impact on the configuration process
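
The proxy selection rule (prefer the AII server handling the kickstart file when it is one of the proxies, otherwise pick one at random) can be sketched as below; `choose_proxy` and the host names are hypothetical and only illustrate the rule:

```python
import random


def choose_proxy(proxies: list, aii_server: str = None, rng=random) -> str:
    """Pick the proxy a client should use.

    If the AII server processing the kickstart file is in the list, use it;
    otherwise choose randomly to spread the load over all proxies.
    """
    if not proxies:
        raise ValueError("no proxy configured")
    if aii_server in proxies:
        return aii_server
    return rng.choice(proxies)


proxies = ["proxy1.example.org", "proxy2.example.org", "aii.example.org"]
print(choose_proxy(proxies, aii_server="aii.example.org"))
```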
     322
     323AII: --notify, --firmware, use_fqdn
     324 * `--notify`: automatically configures anything that needs configuring, based on `profiles-info.xml`
     325 * `--firmware` sets alternate pxe boot target to boot relevant firmware installer based on HW information from the profile
     326   * Schema change required
     327   * Reset done by another external tool
     328 * Global lock no longer requested for aii-shellfe, only for aii-dhcp.
     329   * Only a lock on the specific client host
     330 * Fixed caching, created multi-level cache hierarchy where first level is the DNS domain name
     331
     332
     333
     334== QWG ==
     335
     336
     337== SourceForge Migration - S. Childs ==
     338
     339Main source repository now on SF as a snapshot (without history)
     340 * 22 developers registered: many active developers still missing
     341   * Send your SF account to Stephen
     342 * Build tools basically working but unmaintainable now that ME left
     343
     344quattor.org now is a wiki (MediaWiki)
     345 * An account is required to edit pages
     346 * Feel free to add new pages for experimental stuff without linking it to other pages
     347
     348Issue tracker used only by panc so far.
     349 * Would be better if we could use the Trac one as it seems to be available on SF
     350
     351Download menu: mainly pan currently, difficult to automate updates
     352
     353Documentation menu is unused as this is only plain web pages
     354
     355Mailing lists still hosted at CERN: next candidate for migration
     356 * Create new lists with current subscribers
     357 * Block submission on mailing lists at CERN
     358
     359QWG code and documentation still hosted at LAL
     360 * Trac now available on SF, test it as it would be much easier
     361 * Check if Trac DB can be reimported and how to migrate SVN (snapshot would be acceptable).
     362
     363User branches: no problem, create them in branches, currently unused.
     364
     365Announcing new tags on the mailing list: no consensus, not sure it will be manageable
     366 * Would be preferable to have a dashboard, let's discuss it as part of QBT future
     367
     368 
     369== Quattor Datawarehousing - N. Williams ==
     370
     371Quattor and databases:
     372 * Managing relations between objects and metadata having integrity constraints should involve a relational model and generally connected to enterprise
     373 * Managing configuration behaviours should be the task of an appropriate language like PAN
     374 * What is configured where ? requires a DB like CDB2SQL
     375   * Not only display of existing configuration with different views but also historical views
     376
     377QuatView: uploads selected information from profiles into MySQL, displays with a web browser
     378 * Big limitation : single timestamp
     379 * Currently only MySQL but should be easy to use other back-ends
     380 * Recent additions/enhancements to have the web client much more flexible and usable, eg. search on every column
     381
     382CDB2SQL: similar features to QuatView but without the web tool to browse the DB
     383 * Use DBI API
     384 * Use DB bulk insert
     385 * Implemented as a server module that can run anywhere
     386 * Detect changed profiles
     387 * `-ora` version is in fact not Oracle specific (MySQL is still the default) but has a lot of Oracle specific additions (mainly views)
     388 * `-dist` version more recent and faster but currently Oracle specific, although not hard to change
     389   * cdb2sql just schedules uploads of XML that are done by a multi-threaded process
     390
     391Server modules:
     392 * CDB and SCDB send CDP notification that profiles have changed
     393 * SCDB uses a CCM CDP notification sent to the host affected
     394 * CDB sends a message to a server in charge of taking appropriate actions for the affected clients
     395   * Involves calling cdb2sql-sync
     396 
     397'''Proposal''': use cdb2sql with Quatview web interface
     398 * Extend schema to provide versioned data : from underlying SCM ? from XML ?
     399   * Build into the profile by panc ?
     400 * Include revision tag in data
     401 * Publish datawarehouse to AII instead of notifying datawarehouse and AII in //
     402
     403
     404== Quattor and Inventory Management ==
     405
     406=== CERN - V. Lefébure ===
     407
     408CDB2SQL to retrieve information from Quattor CDB. All other applications using DB produced by CDB2SQL
     409
     410CERN specific schema extension to describe the HW.
     411 * Include the template name describing the HW
     412 * Track warranty contract ID
     413 * Track power consumption
     414 * Also used for machines not managed by Quattor to keep track of them in the inventory
     415
     416CC Tracker: displays a geographical view  of the machines managed by Quattor
     417
     418HMW: application to handle installation, move, rename, retire... of a machine
     419 * Form-based application with a lot of parameters pre-filled from the spreadsheet delivered with the machines
     420 * Interconnect with other DB/apps involved: Remedy, LAN DB
     421 
     422Request for new HW done by users using a Web form
     423
     424
     425=== Morgan & Stanley Asset Management - S. d'Aquila ===
     426
     427''Note: M&S interested to hear feedback on how this work may be useful for the community.''
     428
     429Aurora was relying heavily on AFS
     430 * Configuration lives in hand-crafted files, prone to human error, impossible to browse in a reasonable amount of time
     431 * Poor data quality of data about hosts
     432 * No fine-grained entitlement
     433
     434Aquilon design goals:
     435 * Model systems by what makes them similar rather than different
     436 * Manage resource at a higher level with no manual intervention and application-centric view
     437 * Avoid NIH syndrome
     438 * Vendor neutral with the option to open source
     439 * Build a system that is easy to interact with
     440 * Mix of declarative configuration (configure this machine as a mail server) and prescriptive configuration where the entire configuration is under the control of a configuration management tool.
     441 * Ensure consistency between server and clients configuration
     442
     443Why use a DB:
     444 * Fast efficient retrieval of data with referential integrity
     445 * Concurrency control, transaction, fine-grain locking for free
     446 * Be a source of information for legacy systems
     447
     448Technology choices:
     449 * Twisted Python: event driven network application framework
     450 * SqlAlchemy: object relational mapper and SQL toolkit
     451
     452Assets managed:
     453 * Hardware: machines and their components
     454 * Real estate they occupy
     455 * Namespaces: DNS domains, reserved tcp/udp, users, groups
     456 * Services
     457
     458Lot of effort put in architecture definition and taxonomy:
     459 * Personality: what is machine role ?
     460 * Archetype: philosophical basis behind the build process
     461   * Examples: Aquilon, Aurora, Aegis, Windows
     462   * Future: clusters of VMware, network gear, SAN/NAS devices...
     463 * Location: a flexible way of hierarchically organizing our stuff (regions, campus, building...)
     464   * Configuration assumption: it is usually better to use a local service instance rather than a remote one, or at least one as close as possible
     465
     466A system configuration is made of:
     467 * Hostname (and associated interfaces)
     468 * Archetype and its requirements
     469 * Personality
     470 * Services it uses structured as an ordered list of templates
     471   * For each service, define instances: a host providing the service, its location and the template associated
     472   * Service mapping responsible for automatic selection of instance based on requirements defined somewhere else like archetype, personality
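
The service mapping idea, combined with the location assumption that a local instance is preferable to a remote one, can be illustrated with a small sketch. It assumes a location is a tuple ordered from the broadest to the narrowest level (region, campus, building); `pick_instance` and the sample data are hypothetical, and the real Aquilon broker also takes archetype and personality requirements into account:

```python
def shared_levels(a: tuple, b: tuple) -> int:
    """Count how many leading location levels two locations have in common."""
    count = 0
    for level_a, level_b in zip(a, b):
        if level_a != level_b:
            break
        count += 1
    return count


def pick_instance(host_location: tuple, instances: dict) -> str:
    """Choose the service instance closest to the host.

    `instances` maps instance name -> location tuple. Ties are resolved by
    instance name so that the choice is deterministic.
    """
    if not instances:
        raise ValueError("service has no instance")
    return max(sorted(instances),
               key=lambda name: shared_levels(host_location, instances[name]))


dns_instances = {
    "dns-ny": ("amer", "newyork", "bldg1"),
    "dns-ldn": ("emea", "london", "bldg2"),
}
print(pick_instance(("emea", "london", "bldg1"), dns_instances))
```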
     473
     474Future directions:
     475 * Same kind of requirements for archetypes to personalities
     476 * Advanced entitlement and audit capabilities
     477
     478
     479== Quattor in Amazon Cloud - C. Loomis ==
     480
     481AWS vs. Xen:
     482 * Network configuration: all machines have private and public IP addresses but users cannot predict or allocate them before starting the machine.
     483   * Network interface uses the private address for configuration
     484   * DNS contains only public address, not the private one
     485   * IP address can be changed on the fly when using Elastic IP
     486   * hostname command doesn't return the DNS name associated with the public IP address
     487 * Installation: PXE not supported for installation of the machine, must start from an existing machine image
     488 * Must use limited list of supported kernels: only RHEL5 kernels
     489
     490Current config:
     491 * Quattor server in the cloud: quattor.stratuslab.org
     492   * Only packages and profiles, no AII. Only httpd
     493 * SVN at SixSq
     494 * Base VM image used already has the basic quattor client installed + a script run during first boot as part of init.d to do the first ccm-fetch and ncm-ncd
     495
     496Issues and questions:
     497 * Multiple machines can use the same profile: easy and clean way to define only one WN per site.
     498 * Machine names not known at compile time : how to link batch server and clients, nfs server and clients ? How to handle late binding ?
     499 * Change notifications fail : no link between profile name and machine name
     500   * Allow machines to register for changes ?
     501   * Move to "chat room" (Jabber?) messaging for changes ?
     502 * Workflow: how to manage image disks, IP addresses, machine lifecycle ?
     503 * Should Quattor manage only image instead of machines ?
     504
     505
     506== Virtualization Update - S. Childs ==
     507
     508Xen:
     509 * ncm-xen pretty stable but needs some work
     510   * Should delete managed configuration files when removed from profile
     511   * Filesystems code should be removed and replaced by ncm-filesystems
     512 * QWG helper code: "database" mapping guests to hosts, xen/configure_guests() function
     513   * configure_guests() populates `/software/components/xen/domains` from guest templates, automatically set `/hardware/location` to the DOM0 host name
     514 * guests and host in different cluster: a SCDB workaround implemented by adding the guest clusters in `cluster.pan.includes`
     515
     516Current virtualization deployment:
     517 * TCD and NIKHEF: Xen + QWG
     518   * NIKHEF has some local tricks for cross-cluster HW handling
     519 * CERN: Xen + enclosures
     520   * Enclosures are close to `XEN_DB` : a nlist of parents with their children as the value
     521 * M&S: VMware + XML injection from config into hypervisor
     522   * No way to manage VMware server with Quattor (black box)
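
Both `XEN_DB` and enclosures boil down to a mapping from a parent (hypervisor, blade chassis) to its children. A sketch of that structure and of the reverse lookup needed to find the host of a guest, using a Python dict in place of a pan nlist; the host names are made up:

```python
def parent_of(enclosures: dict) -> dict:
    """Invert a parent -> [children] mapping into child -> parent.

    A child listed under two parents is reported as an error, since a guest
    can only be hosted in one place.
    """
    child_to_parent = {}
    for parent, children in enclosures.items():
        for child in children:
            if child in child_to_parent:
                raise ValueError(f"{child} has more than one parent")
            child_to_parent[child] = parent
    return child_to_parent


enclosures = {
    "xenhost1.example.org": ["vm1.example.org", "vm2.example.org"],
    "xenhost2.example.org": ["vm3.example.org"],
}
print(parent_of(enclosures)["vm3.example.org"])
```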
     523
     524openvz: used by UAM, currently in testing, configured with Quattor
     525
     526VM migration: only proposal from Luis, on http://quattor.org
     527 * Is it within the scope of Quattor or should it be integrated in some new lifecycle workflow manager ?
     528 * Need to find concrete use case for VM migration ?
     529
     530Enclosures: generic model for host/guests dependencies
     531 * VM, blades...
     532 * Move to it ?
     533
     534
     535== European FP7 Quattor Project - C. Loomis ==
     536
     537Goal: get additional manpower to make some significant changes to Quattor toolkit and do the documentation
     538 * Currently a strong community with several developers
     539 * But no dedicated people: mainly fixing urgent problems
     540
     541FP7 Infrastructure calls in Sept. 09 related to EGI
     542 * One of them related to middleware, including tools and services for deployment
     543
     544Institutional requirements:
     545 * Need to identify a lead member, CNRS could act
     546 * Must involve typically, at least 3-5 different countries
     547 * Partners must be legal bodies, including JRU
     548 * Both academic and commercial partners
     549
     550Project requirements:
     551 * Scale of funding must fit in the call guidelines.
     552 * Usually significant matching effort required: european funding only covers 50-80% of the project cost
     553   * Matching resources may be people and manpower
     554 * Must have JRA, NA and SA components
     555 * Must include dissemination and sustainability plans: should fit easily with the SF move
     556
     557Project outline proposed:
     558 * NA: management, quality control, dissemination/training
     559   * Pre-packaged appliance to ease starting with Quattor
     560 * SA: build and test infrastructure, release management
     561   * Focus on Quattor use for gLite, QWG may be a good link with production infrastructure
     562 * JRA: improvement of current tools, control of cloud and virtualized resources, full lifecycle management, IPv6 support
     563   * EU has a strong push on IPv6
     564   * For current tools, focus on move to API allowing language independence
     565 * Must take into account that SA activities are better funded than others
     566
     567Concrete tasks for building the project:
     568 * Need to identify partners and the project activities they are interested in: must cover all major parts of the Quattor toolkit
     569 * Must define the roadmap for the new features we want
     570 * Determine plans for dissemination and sustainability
     571
     572Timeline for preparing the project:
     573 * June: identify the partners and lead partner
     574 * Detailed outline of project by Sept.
     575 * Finalize project description and financial aspects by Dec.
     576
     577
     578== Improving Quattor's Accessibility - C. Loomis ==
     579
     580Quattor has a steep and difficult learning curve
     581 * Both for users and developers
     582 * Some is linked to the comprehensive nature of the toolkit but not all
     583 * Some reasons include:
     584   * Inadequate documentation, despite the improvements
     585   * Treating Quattor as an all or nothing affair
     586   * Leftovers from old projects
     587
     588Inconsistent branding: `Quattor` name is very seldom seen in the individual tools
     589 * Very difficult for new users/admins to identify Quattor parts: daemons, config file names...
     590   * Move all config files to `/etc/quattor`
     591 * Review acronyms and change those which are not useful
     592 * Need to decide what we want to change and what is the roadmap for a non disruptive change
     593
     594Documentation:
     595 * Lack of a good overview document
     596 * Incomplete and often outdated
     597 * Difficult to relate to a particular tool release
     598 * Automate documentation as much as possible
     599   * For example, use annotation in templates to document ''public'' variables
     600
     601Tutorials:
     602 * Very important but material often outdated
     603 * Generally focused on managing a whole site with Quattor
     604 * Need some tutorials on how to incrementally add Quattor to an existing management infrastructure
     605 * Explore Amazon AWS as a way to provide testing resources to people who want to evaluate Quattor
     606
     607Multiple APIs/Tools
     608 * Better document the strengths and weaknesses of similar tools (e.g. SCDB/CDB)
     609 * Have a good justification for multiple tools
     610 * Need a way to mark deprecated components and APIs
     611
     612Coding practices: a set of coding practices exists (at least for Perl-based components) and is broadly agreed upon, but it is almost completely unenforced
     613
     614Missing functionalities
     615 * PAN: Eclipse-based editor and debugger
     616 * Components: lots of duplicated code for skeleton of component, cannot run component as standalone script
     617 * Hooks for monitoring Quattor status
     618 * Messaging infrastructure inside Quattor
     619
     620Methodology:
     621 * Agree on a list of problems
     622 * Evaluate the impact of fixing each of the problems
     623 * Implement the fixes along with other developments
     624 * Automate as much as possible: prevent regressions on problems that have been fixed
     625
     626
     627== Quattor SW Process - C. Loomis ==
     628
     629Current situation:
     630 * Quattor build tools mostly standardized, with some ad hoc checks for conventions
     631 * Nightly build performed from trunk
     632 * No dashboard of current state of code base
     633 * Few automated tests of Quattor tools
     634 * Little gathering of documentation: a script exists but needs to be integrated into SF
     635 * No quality checks of the Perl code
     636   * panc uses FindBugs (Java); similar tools exist for Perl
     637
     638Continuous build: suggest moving to a system like [https://hudson.dev.java.net Hudson]
     639 * Language agnostic
     640 * Build when there is a code change
     641 * Can define multiple builds and hierarchies of builds
     642 * Provides a dashboard of current and past results
     643 * Must be done outside SF
     644
     645Coding standards and development guidelines
     646 * Good documents from Luis but nothing to enforce them
     647 * Should start by running [http://perlcritic.com Perl::Critic]: highly configurable, large set of code checks, possibility to include project-specific requirements
     648
     649Unit tests:
     650 * Having a good set of unit tests makes refactoring and changing code more reliable
     651 * Many choices for framework for Perl: Perl's own may be easiest
     652 * Need to implement in a way that many tests come along for free
     653 * Not easy for NCM components, as the test should include the action performed: need to add the ability in the underlying libraries to use another root
     654   * May be used for other purposes like configuring images
     655
     656Documentation: something like Perl::Tidy may help to generate documentation in a format easily usable in a central place
     657 * Can tidy Perl code, extract POD documentation and produce HTML pages
     658
     659Quattor build tools: contain many things related to code checking that should be moved to more appropriate tools
     660 * Try to design a new generation of QBT, starting with the core features of the current ones (probably tagging releases) and adding checks later with appropriate tools
     661
     662
     663== Future Features ==
     664
     665Working groups of people collaborating on the development of new features, before a final, complete release
     666 * Monitoring, visualization/datawarehousing, virtualization, messaging infrastructure...
     667
     668Template viewer: evaluate how to move forward based on panc output
     669 * tplview still based on direct parsing of pan templates
     670
     671
     672== Conclusions ==
     673
     674Decisions:
     675 * Software contributions and fixes: code maintenance is a common responsibility, everybody is allowed to commit a change in any component
     676   * No need to get permission from the maintainer
     677   * The maintainer is a person with a global overview who is able to help other people with their contributions and who may review the changes
     678
     679Actions:
     680 * Merge monitoring nagios3/ templates into the main nagios/ templates: Guillaume/Michel
     681   * Short term: remove nagios3/ from branches
     682 * Migrate mailing lists to SF
     683 * Setup of Trac on SF: Michel
     684 * Migration of QWG/SCDB to SF: Stephen/Michel, medium priority
     685 * Roadmap and strategy for consistent naming of command names, config files...
     686   * Perl module namespace: N. William
     687   * Command names: Stephen, use `quattor-` as a common prefix
     688   * Config files: Michel, move to `/etc/quattor`
     689 * Proposal for a new generation QBT : Cal
     690 * Evaluate possibility of a chat room in SF for support and other real-time discussions
     691 * Software tools:
     692   * Autobuild: Eric to look at Hudson and other continuous build systems
     693   * Perl::Critic: Cal
     694   * Test frameworks: Nick
     695   * Perl::Tidy or similar tools: Stephen
     696 * Review "dummy WN" hack and try to reimplement in a more sensible way: Stephen
     697 * Prepare FP7 project: coordinated by Cal, expression of interest by others
     698 * Make schema more modular if possible: Cal
     699
     700Next workshop:
     701 * Proposal: Brussels for the next one, Greece for winter 2010
     702   * Fix the dates with a Doodle as soon as possible, target: end of October
     703 * Factor-out specific topics, like gLite templates (1/2 day before or after)
     704 * Parallel sessions for specific working groups or topics
     705   * Plan 1 for FP7 project
     706 * Consider a one- or two-day tutorial
     707   * Co-located to allow contacts between developers and users?
     708   * Tutorials at LISA or similar
     709 * Announce widely
     710 * Fill agenda early: name programme chairs, identify main topics
     711
     712
     713
     714