wiki:Meetings/Workshops/20091104

8th Quattor Workshop -Bruxelles - 4-5/11/09

Agenda.

Site Reports

GRIF - G. Philippon

6 subistes, 31 clusters: 750 machines

  • Mainly grid machines: 100 nongrid
  • 2 subsites managing non grid machines
  • ~20 persons involved in Quattor

Main changes:

  • Use of dummy WNs: /2 for full recompile (/5 on the largest cluster)
  • First attempts to use checkdeps for pre-deployment RPM checks: successful so far
  • Errata deployment with last features in QWG
  • No more recompile if only addition/removal of a RPM in a repository

Nagios:

  • Issues with QWG templates for Nagios and ncm-nagios
  • Added host and host group support on ncm-nagios: to be commited...

CNAF - A. Chierichi

Migration to SCDB done!

  • Smooth migration
  • Started with a pilot last summer
  • Some effort needed to convince users: mixed feedback... see dedicated presentation

Quatview now working: depends on SCDB...

New templates for partitioning HDs

Pb with Ethernet bonding but apparently not supported by standard templates

  • Remark: Probably a matter of documentation

Monitoring: still rely on Lemon for storage, also some attempts with Nagios

  • Lemon templates in QWG seems to be not backward compatible: reverted back to original templates
  • Same pb with QWG templates for Nagios but didn't succeed to reproduce the current configuration

SCDB: 1 instance, 19 clusters, 13 users

  • Grid services configured with YAIM
  • OS configured with CNAF-specific templates

Virtualization of WNs in production, based on KVM.

  • All VM managed by Quattor
  • VMs treated as standard machines

MS - N. Williams

Main changes is numbers: 2x as many nodes, now 15K managed by Quattor

  • Total compile time: 10mn
    • Now use SCDB ant interface to panc
    • Some tweaking of panc required, should come mainstream
  • 2 sets of 6 boot servers
  • AII performance (notify) is the pb: 3h for all nodes

Monitoring based on Lemon.

  • 2 servers

Pending for open-source release:

  • AII, CCM patches
  • FUSE interface to profile on clients: allow to browse the configuration, ncm-query replacement
    • Ready, looking for volunteers
    • Allow to browse a specific revision of a profile
  • Aquilon (already given to RAL): should be part of QUEST

AUTH - C. Triantafyllidis

New sysadmins involved with Quattor

Merge in progress from AUTH-specific cfg/local to SCDB sites

  • Almost done

Ganglia configuration based on SCDB clusters

  • Use ncm-filecopy: plan to improve

Virtualisation with Xen and QWG templates from 1 year ago.

New machine types:

  • Hydra committed to QWG
  • Working on core machine types

Quattor Asset Database: another Quattor db viewer

  • See presentation

Monitoring based on Nagios + OAT

CERN - V. Lefébure

Several instances based on CDB

  • Main instance: 7800 profiles

CDB 2.2.0-3

  • New CDBChangeTracker: statistics about CDB (number of commits, transaction log message, link with incidents...)
  • grep-like feature for cdbop
  • panc 8.2.10
  • Compilation in 6mn

Virtualization based on Hyper-V

  • Clients are Quattor-managed
  • Will go to production Nov. 2009

lxcloud: tests with OpenNebula and PlatformVMO

  • Currently Xen, will look at KVM
  • Quattor used to manage images

Quattor schema extended to allow more HW types to be described in CDB

  • Used for HW inventory: includes HW not managed by Quattor
  • Not yet committed:
    • Remark from Michel: should be done to avoid a fork of the schema, should not be a major pb as long as this is additions

Related activities

  • SMS: duration option added
  • Lemon: work with MS on federated Lemon instance support, improving the schema to address perf issues
  • SINDES proted to SLC5

CLUMAN: tool to display clusters and act on a selected subsets

  • Based on CDBSQL + Lemon
  • Contact: Marian.Barbik@…

TCD and Grid Ireland - D. O'Callaghan

2 TCD sites + 17 Irish sites

  • + 3 test sites
  • 500 machines
  • 18 Quattor servers
  • 55 virtual hosts + 190 VMs

SCDB

  • Added support for Bazaar client for supporting personnal branches
  • Switch back from svn:externals to a copy for QWG templates

Nagios: not much work since 1 year

Starting to rename Quattor client commands (ncm-ncd, ccm-fetch, ncm-query...) as subcommands of quattor command

New services Shibboleth, RT, Boinc

  • Using filecopy

NIKHEF - R. Starink

No major changes in numbers

  • +1 admin: now 6
  • +175 hosts: now 475
  • Using SCDB 2.3.1 + local changes
  • Virtualization based on Xen: 12 hosts, 48 guests

Monitoring based on Nagios

  • 1 master, 3 slaves
  • QWG-ish hierarchy, NCG generator for grid probes

Trying to move to a better compliance with QWG

Virtualization: thinking at OpenNebula, KVM

  • BiG Grid investigating WN virtualization

Philips - Serge Vrijaldenhoven

No major changes: 6 clusters, 200 machines

  • Want to virtualize service nodes: no plan for WNs
  • New people involved in Quattor, Serge acting as backup: still a steep learning curve

Philips would like to host a workshop in the future.

RAL - D. Ross

Started with Torque/MAUI server: needed replacement. Fairly smooth transition.

  • Some problems: RAL using a standalone Torque server, a few differences in Torque/MAUI config

WNs: push to deploy new CPUs and to deploy SL5

  • Moved 90% of our capacity to gLite 3.2/SL5 with Quattor
  • Problems: Castor client configuration missing, pbs_mom crashing on lots of node (version/arch inconsistency)
    • Also jobs dropped during WN reconfig: not clear why. Pb when executing ncm-chkconfig?

Current work:

  • UI: largely done
  • BDII
  • SinDes
  • Disk servers: move from Puppet being discussed
  • Castor service nodes: probably more tricky

9 persons involved with Quattor: after 6 weeks in production and 60% of T1 under Quattor control, benefits of Quattor beginning to be felt

  • Also some issues with VO configuration: what's done in QWG not necessarily matching what we were doing... nothing insurmountable

LAPP - E. Fede

1 quattor server with SCDB 2.3.2

  • 150 nodes: 4mn for a full build
  • Quattor server running in a VM

Running nightly autobuild for Quattor components.

  • Build everything that has a 'make rpm' in its directory
  • Report linked in http://quattor.org
  • Not yet alarming capabilities in case of errors
  • Future of this tool highly connected to QUEST

Core Components

Pan Compiler Update - C. Loomis

4 bug fix releases since last workshop for v8.2

  • v7 highly deprecated: nobody seems to use it anymore

Current work:

  • Entitlements/Authorization: on hold, lack of time
  • Internationalization: progressing, done when modifying files for other purposes
  • Performance enhancements from MS to be integrated
  • Eclipse editor: preliminary work being done at the LAL
  • Eclipse debugger: will be part of QUEST but some preparatory work being done

Planned changes:

  • Incompatible change to dependency file needed: full support for file_contents() and exists() dependencies only possible with the new format
    • Needs simultaneous release of ant task: are there other tools which could be affected
    • Will be done in 8.4.0 as it seems no other tools are affecte
  • Contemplating change from ant to maven2 for building panc
    • Ant build files are extremely complex and external handling not very nice, maven2 would be simpler and allow easier inclusion/upgrades of external dependencies

Other requests:

  • Ronald: get some information about the template tree processed in case of error

No version 9 planned right now: mainly removal of deprecated syntax

  • Concentrate on component cleanups to upgrade deprecated syntax

SCDB Update - M. Jouvin

See slides.

CNAF migration to SCDB - A. Chierici

Migration rather smooth, minor problems mainly due to the need to reorganize a few things

  • Also the problem with the buggy apr rpm distributed by GRIF during a couple of days

Quattor server migrated, not reinstalled

  • SCDB installed in with CDB: clusters migrated one after another

The existing Quattor front-end has been kept

  • In fact this is the main point of administration because of performances of Eclipse on desktops

Used profile cloning: huge speed-up

Cluster in SCDB much easier than in CDB

  • Easy to move a node from cluster to cluster

Feedback from users better than expected

  • svn already known to many users
  • A wiki was prepared in advance and most people seemed to have read it
  • Not too much problem with the compile/deploy workflow

Main complaints:

  • ant seems to demand too many resources (on laptops mainly, connected with Eclipse)
  • Users don't like to recompile changes made by others after svn update
    • NIKHEF has a modification to remove this requirement but at the price of deployment errors

QWG Templates Update - M. Jouvin

See slides.

Generic Machine Types - C. Triantafyllidis

Several machines types like web servers used at many place: may benefit from a shared effort.

Some specificities:

  • Do not depend on gLite: not reason to have them in these branches
  • Depend a lot on OS rpms

Based on machine-types/core: attempt to reduce the base RPM configuration to the minimum

Several types currently worked on:

  • web_server: Apache + main modules, filecopy used to configure them
  • db_server: currently a mysql server
  • nfs_server: move from gLite

Need to be able to combine them: not really possible with machine-types, even with gLite.

  • Misunderstanding about machine-type: there are the combination of a base OS configuration + 1 feature
  • Put features in standard/features

Need to discuss how to enforce annotations for high-level services (features) and produce some tools to easily access this information.

Other Developments

Quattor Asset Database - C. Triantafyllidis

AUTH needed a tool to ease everyday tasks for new sysadmins and allow to feed other tools with Quattor data

  • Eg. configure a monitoring system from Quattor

QAD main characteristics:

  • Written in Ruby
  • A few schema extensions: /monitiroing, network "parent" (switch it depends on), rack
    • Need to be merged with CERN similar extensions
    • /monitoring is an alternative to the LEMON /monitoring
  • Not yet decided: access to SVN? read or write?
    • Currently using a post-commit hook to synchronize QAD db with SCDB deploys: impact on performance
    • Would like to remove this in the future and rely on XML but some high level information is currently missing: need to extend the schema
  • Allow to change some node characteristics: OS, IP... and redeploy
  • From the config db, can list nodes attached to a specific switch: rely on configured information
  • Can list all the VMs attached to a host (using the enclosure definition)

Need to make progress on using CDB2SQL back-end for producing/maintaining db and use it in all tools that need a build a db from the XML

  • Rename XML2SQL
    • CERN will provide an update on the status and the best version to start from
  • May look at document-oriented dbs, liek CacheDB

Remote Configuration with Quattor - N. Williams

Disclaimer: doesn't currently work with SCDB (because of dependency over CDB notification).

Goal: apply Quattor benefits to boxes that cannot run Quattor client by using delegation

  • A Linux box will acts as a delegate where a configuration module will execute appropriate configuration commands
  • Combination of AII and and CCM/NCM/NCD

Current implementation:

  • quattor-remote-dispatcher (QRD): a tool running on Linux box and receiving CDB notification messages. It acts as a replacement for listend, cdispd, ccm-fetch, ncm-ncd
    • Configuration allows to define which part of the configuration are listened and what is the command to run
    • Can use different commands on different sets of nodes
    • Can define constraints on some part of the configuration to do different things based on some configuration state (for example state=build or production)
  • quattor-remote-configure: an AII equivalent allowing to produce a new configuration for the managed box and notify the remote dispatcher
    • Configuration of the managed device is under /software/components
    • The actual component to use is defined by /system/components/namespace: default is NCM::NCD:: but can be defined to something specific to a device, eg. ESX, Netapp...

Used to manage virtualization in Aquilon (for a WMware Hypervisor)

  • VMs are not associated with actual hosts (handled for example by VMware) but with clusters
    • A cluster is a group of hosts running an hypervisor. There is an object template for each cluster.
    • Each virtual host (machine running an hypervisor) has also an object profile that allows its configuration out of the box
  • VMs are managed as normal machines

Plan to release this as soon as it is polished but will not be able to release the specific components as they are very MS specifics and sometimes rely on non public APIs.

Virtualisation - D. O'Callaghan

Currently used Hypervisors configured with Quattor:

  • Xen: TCD, NIKHEF, CERN, AUTH
  • OpenVZ: UAM
  • VMware: MS

Also in use but not managed by Quattor:

  • KVM: CNAF
  • Hyper-V: CERN

VM cluster managers of interest: Platform/VMO, OpenNebula

Issues and requests:

  • Use libvirt, common configuration module for Linux-base hypervisors
    • Could look at what is done by Puppet or other config tools
  • Creation of images

QWG templates for basic virtualisation: mainly used at TCD (also in Senegal!)

  • Rely on ncm-xen, would be great if we had support KVM, probably using libvirt

Integration with Monitoring Systems - C. Triantafyllidis

Porblem: current Nagios configuration in QWG relies on everything being described (in particular probles) in the configuration. But EGEE and OAT developped NCG to do it more dynamically on a specific node. How to take advantage of this?

NCD: generic tool to define Nagios configuration based on context

  • 2 main basic modules/entry points: NCG::SiteSet and NCG::SiteInfo::
  • Also several internal modules to define probes to use, ... that would benefit to receive information from Quattor
    • Exemple: configure all probes for a CE if the node is configured as a CE

Integration between NCG and XML doesn't scale as it is far too long. Need to pre-process data and this is done with QAD.

  • Currently more a proof of concept than a ready-to-use tool

Discussion:

  • Need to figure out how to use NCG to define services without defining hosts and rely on Quattor for host definitions of hosts managed by Quattor
  • Create a small working group with a specific mailing list: quattor-monitoring
  • TCD will commit their change to the generic templates: no specific change
  • NIKHEF will document the change they had to make to generic templates to identify additional customizations needed
  • CNAF: currently the storage group has its own way of configuring Nagios with filecopy, difficult to change even if convinced
  • RAL interested but probably not in the short term
  • ULB/BEGrid started to look at it
  • Guillaume will fix the Nagios example in QWG repository
  • Potential scalability pb if nagios server profile need to depend on all the profiles for all the machines it monitors

QUEST Proposal Status - M. Jouvin

See slides.

Improving Quattor's Accessibility - M. Jouvin

See slides.

Proposed changes:

  • NCM Components renamed into Quattor Configuration Modules
    • Need to remove components: source of confusion
    • Could also keep NCM for Node Configuration Modules but will be confusing as NCM stands for Node Configure Manager
    • No change to component names themselves
  • Dummy WN"" renamed into profile cloning
    • Exact node also renamed into reference node
    • QWG variables will be updated at some point...
  • CDB: both one component in Quattor architecture and one of its implementation
    • Only the component in architecture can be changed
    • One proposal is CMDB for Configuration Management DB. Need more thinking
    • May be not really an issue anymore with CERN left as the only site using CDB...
  • Commands: wrapper for quattor commands on managed nodes and on AII server developped by David as an action after last workshop
    • https://sourceforge.net/apps/mediawiki/quattor/index.php?title=Command_Renaming
    • quattor command as a wrapper over ncm-ncd, ccm-fetch, ncm-query...
    • quattor-installer as a wrapper over aii-xxx
      • For AII, should not distinguish between local and remote (installfe and shellfe)
    • In both cases, symlinks created for each subcommands to allow to use shell completion (à la git)
    • Currently deployed by a template: David will try to build 2 separate RPMs (1 for managed nodes, 1 for AII server)

Miscellanous

SourceForge Status

QWG Trac migration on SF: 2 issues:

  • Export of DB from MySQL to SQLite
  • Use of Trac plugins that were not supported initially at SF

Michel agrees to have a new look at this in December.

When migrated, may think at migrated MediaWiki to Trac.

  • Nick agreeing to review MediaWiki contents after QUEST submission

Source structure

Current use of .cin leading to many problems as the files are not recognized by editors and other consequences.

MS experienced editing directly the source files and putting the metadata separatly in the RPM. Will try to document as a proof of concept.

Errata deployment at MS

MS thinking at tagging repositories based on their contents (critical and non critical) and have the ability to instruct SPMA to do only the non critical upgrades outside maintenance windows.

Actions

  • Produce RPMs for command quattor and quattor-installer wrappers (David)
  • Reevaluate feasibility of QWG Trac site migration to SF (Michel)
  • Review contents of SF MediaWiki for duplication with QWG wiki, check QWG wiki is advertised enough on SF (Nick)
  • Nagios templates:
    • Create a llist for discussion bit and bytes (Michel)
    • Commit changes in the last month by TCD (David/Stuart)
    • Document missing customization points (Ronald)
  • panc 8.4.0 with enhanced dependency management (for correct support of if_exists...) (Cal)

Wrap-Up

Next meeting in Thessaloniki (Greece), March 17-19, 2010

  • 2 days over 2 like this time
  • Plan 1/2 day after the meeting for an unformal hands-on session
  • Keep site reports much shorter: 10mn by default

Last modified 13 years ago Last modified on Mar 21, 2011, 6:40:26 PM