Changes between Version 2 and Version 3 of Meetings/Workshops/20120320


Timestamp:
Mar 21, 2012, 6:58:57 PM (14 years ago)
Author:
/O=GRID-FR/C=FR/O=CNRS/OU=LAL/CN=Michel Jouvin
Comment:

--

  • Meetings/Workshops/20120320

    v2 v3  
    77
    88
    9 
    109== Community News ==
     10
     11No new sites that we are aware of in the last six months.
    1112
    1213CERN looking at Puppet for their new infrastructure and possibly as a Quattor replacement in the future
     
    157158 * GPT used for partition tables, using `parted`
    158159  * Anaconda doesn't die with large disks, much faster
     160  * Anaconda doesn't allow a reinstall that changes some partitions while protecting others...
    159161 * Built with Maven tools
    160162 
     
    210212 
    211213 
    212  
     214=== IPv6 readiness - R. Starink / P. Bernabé ===
     215
     216Evaluation work done at NIKHEF recently: the goal was to identify what needed to be changed
     217 * Focus on mixed IPv4/v6 setup
     218 * Backward compatibility
     219 
     220Test cluster
     221 * Isolated from the world
     222 * OS: CentOS 5.5 64-bit
     223 * A dedicated Quattor server
     224 * Clients: basic OS to start, grid services later
     225 
     226Quattor server
     227 * PXE only supported for IPv4: mixed setup will be required to handle installation
     228 * As long as IPv4 is there, no change required on the server!
     229 
     230ncm-network
     231 * Schema: basic support for IPv6 but no support for mixed setup
     232   * Duplicate IP, gateway...
     233   * Some IPv6 enabling fields
     234   * How to specify IPv6 addresses: configure separately prefix and local part? What to bind prefixes to (interfaces? node?)?
     235 * Autoconfigure support?
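
The open question above about configuring the prefix and the local part separately can be illustrated with a small sketch. It is not part of ncm-network or of any proposed schema: the function name `combine_ipv6` and the sample addresses (from the documentation prefix 2001:db8::/32) are assumptions made for illustration only.

```python
import ipaddress


def combine_ipv6(prefix: str, local_part: str) -> ipaddress.IPv6Address:
    """Build a full address from a site prefix and a per-host local part.

    Hypothetical helper: 'prefix' is a network such as 2001:db8:1::/64 and
    'local_part' is written as an address whose prefix bits are all zero.
    """
    net = ipaddress.IPv6Network(prefix)
    local = int(ipaddress.IPv6Address(local_part))
    # Reject a local part that would spill into the bits owned by the prefix.
    if local >> (128 - net.prefixlen):
        raise ValueError("local part overlaps the prefix bits")
    return ipaddress.IPv6Address(int(net.network_address) | local)


if __name__ == "__main__":
    print(combine_ipv6("2001:db8:1::/64", "::10"))
```

With this split, renumbering a site would only change the prefix definition, wherever it ends up being bound (interface or node), while the local parts stay untouched.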
     236   
     237QWG changes
     238 * Allow enabling IPv6 globally or per interface
     239 * A new table for IPv6 addresses with a more flexible layout that can handle multiple addresses per node
     240 * IPv6 parameters: per network?
     241 
     242ncm-iptables
     243 * A few mods already made to enhance patterns for IPv6 and add protocol options
     244 
     245Other components: difficult to predict needed changes
     246 * Service specific probably?
     247
     248Conclusion (next steps)
     249 * Focus on schema for IPv6 description: ensure enough flexibility to cover different use cases without too much unnecessary complexity
     250   * Ask Paco if he can contribute
     251 * Ask IPv6 experts (in particular in HEP community) to assess the proposed schema
     252 
     253== Aquilon and other MS Tools ==
     254
     255Current status
     256 * AII, CCM mods released thanks to Luis
     257 * ncm-network changes contributed
     258 * ncm-sysctl not yet modified: waiting for escape/unescape removal
     259 * quattor-remote-configure available but quattor-remote-deployer not yet released as it is being rewritten
     260 
     2615 core developers, many support engineers
     262 * Current focus is to allow more operations without editing templates as Aquilon usage extended to less standard machines (non "grid")
     263 * Plan is to add more commands into Aquilon that will deal with more advanced operations
     264 
     265Quattor Remote Deployer (QRD) status: equivalent of ncm-ncd to manage non Linux devices
     266 * Python rewrite of the old Perl code
     267 * QRD relies on Quattor Remote Configure (QRC) to execute components: QRC already released
     268 * Python rewrite requires a Python CCM API
     269 
     270Aquilon changes and main developments
     271 * More changes made by Aquilon commands and less template editing, e.g. assign specific features to an existing personality
     272   * Make it easier for users to make trivial changes
     273   * Reduce the risk of a big mistake that comes from exposing templates and granting the right to modify them
     274 * HW features: generic description of HW features implemented using vendor tools and their specific options
     275   * 1 template to enable/disable a feature including a vendor/model specific template defining the required configuration
     276   * Under namespace `features/hardware`
     277 * Added description of clusters (VMware clusters, high availability clusters): members (e.g. assignment of VMs to an ESX cluster), reboot constraints, file system description allowing attaching remote file systems...
     278   * Will add possibility to assign a VM to a single host (e.g. running KVM) rather than an ESX cluster
     279 * Planning to add support for defining service address (moving with VMs or shared by several) in Aquilon
     280 * Request to model Windows cluster in Aquilon
     281   * But real use may require a GUI...
     282 * Working on modelling network switches and configuring them with QRD
     283   
     284Aquilon appliance
     285 * Still on SourceForge but not updated... lacking templates to be useful
     286   * Looking for "pop-up" installations via cloud
     287 * Appliance uses Debian
     288 
     289Mac OSX support: proof of concept of other OS support in Aquilon
     290 * ncm-directoryservices to configure OpenDirectory: ncm-mcx
     291 * ncm-ncd, ccm-fetch all work
     292   * CCM uses DB_File
     293 * A bit of fixing needed in quattor-build-tools
     294 * A single package created (alpha version) and put on SourceForge: "Managed Quattor Client for OS X"
     295 
     296 
     297=== Fostering Aquilon adoption ===
     298
     299James's experience
     300 * No major problem in using QWG templates, a few minor modifications needed
     301   * James agreeing to write a short report about these...
     302 * Main difficulty is the different workflow between SCDB and Aquilon
     303 * Aquilon upgrade: DB upgrade scripts only provided for Oracle
     304   * Aquilon supports whatever DB backend is supported by SQLAlchemy
     305   * Need to maintain backend specific upgrade scripts: PostgreSQL seems the most important to support
     306   
     307Appliance work required to allow early adopters to look at it
     308 * Update Aquilon version
     309 * Add QWG templates
     310 * Add ability to check out templates from appliance
     311   
     312RAL will have a summer student to work on Aquilon in July for 3 months
     313 * Try to get the new appliance ready by then.
     314 
     315=== Quattor Remote Deployer experience at MS ===
     316
     317Goal: manage non Linux devices that offer some sort of API for remote management
     318 * E.g. ESX clusters, switches, file appliances...
     319 * Handle initial configuration and reconfiguration
     320 
     3212 parts in the system
     322 * QRD itself: equivalent of ncm-cdispd
     323   * Receives profiles and decides what to do
     324   * Uses a plugin for either installation (aii, configuration of the boot server) or for configuration (QRC)
     325 * Quattor Remote Configure (QRC): equivalent of ncm-ncd
     326   * 1 framework and specialized components
     327   
     328Choice of distributed (redundant) QRD/QRC configuration
     329 * Main goal is redundancy: multiple sources of the configuration are easy to implement with http
     330   * Every QRD receives the notification and they use a lock to ensure only one actually executes the configuration actions
     331   * Shared data (NFS) between all instances for locks and cached profiles... not ideal
     332   * Modified CAF::Lock to implement NFS locks
     333 * If possible, achieve better scalability, but this is not easy because of potential locking issues that may degrade performance
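
The "only one instance executes" rule above can be sketched with an exclusive lock file on the shared directory. This is a minimal illustration, not the modified CAF::Lock (which is Perl): the function names and the lock path are assumptions, and whether exclusive creation is reliable over NFS depends on the NFS version and client, which is part of why the shared-NFS setup is described as not ideal.

```python
import os


def try_acquire(lock_path: str, owner: str) -> bool:
    """Return True if this instance created the lock file, False if it exists."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # caller can succeed in creating it.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        handle.write(owner)  # record who holds the lock, for troubleshooting
    return True


def release(lock_path: str) -> None:
    """Drop the lock so that another instance can take over."""
    os.remove(lock_path)


def deploy_once(lock_path: str, owner: str, action) -> bool:
    """Run 'action' only if this instance wins the lock; report whether it ran."""
    if not try_acquire(lock_path, owner):
        return False
    try:
        action()
    finally:
        release(lock_path)
    return True
```

Every instance that receives the notification would call `deploy_once` with the same lock path; the instances that lose simply skip the action.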
     334 
     335In QRC, implemented conditional execution of dependencies based on the fact they have a change in their configuration...
     336 * Check whether it makes sense to feed back into ncm-ncd
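
A rough sketch of what conditional execution of dependencies could look like, assuming each component declares the components it depends on and the set of components whose configuration changed is known. The function and the component names in the example are hypothetical and do not reflect the actual QRC code.

```python
def plan_run(targets, dependencies, changed):
    """Return the execution order for the requested components.

    A requested component always runs; one of its dependencies (direct or
    transitive) runs first, but only if its own configuration changed.
    """
    requested = set(targets)
    order = []
    seen = set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)  # marked before recursing, so cycles terminate
        for dep in dependencies.get(name, ()):
            visit(dep)
        if name in requested or name in changed:
            order.append(name)

    for target in targets:
        visit(target)
    return order


if __name__ == "__main__":
    deps = {"vswitch": ["network"], "network": ["dns"]}
    print(plan_run(["vswitch"], deps, changed={"dns"}))
```

In the example, `network` is skipped because its configuration did not change, while `dns` runs before `vswitch` because it did.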
     337 
     338QRD
     339 * Different modes and connectors implemented by plugins that have their own (simple) configuration
     340 * Analysis of the work to be done: multiple configuration change notifications are merged into one action
     341   * Does everything needed to move from the last config successfully deployed to the new one
     342   * Keep track of the plugin status to decide if a config was successfully applied
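
The merging of notifications described above can be sketched as follows. This is only an illustration under stated assumptions: profile revisions are modelled as increasing integers and the host names are invented; the real QRD data model is not described in these minutes.

```python
def coalesce(notifications, last_deployed):
    """Reduce queued (host, revision) notifications to one action per host.

    Each action is a (from_revision, to_revision) pair going from the last
    configuration successfully deployed to the newest one notified.
    """
    latest = {}
    for host, revision in notifications:
        if revision > latest.get(host, -1):
            latest[host] = revision

    actions = {}
    for host, target in latest.items():
        current = last_deployed.get(host)
        if current is None or target > current:
            actions[host] = (current, target)
    return actions


def record_result(last_deployed, host, target, plugin_ok):
    """Advance the deployed revision only when the plugin reported success."""
    if plugin_ok:
        last_deployed[host] = target
```

Because the deployed revision only advances on success, a failed run leaves the starting point unchanged and the next action still covers all pending changes.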
     343   
     344QRD improvements needed
     345 * Too much locking that may lead to a configuration deployment being postponed indefinitely
     346   * Locking + grouping is overcomplicated and 2 levels of locking are probably unnecessary
     347   * IO performance problem with NFS
     348 * Lack of visibility of action progress: difficult to troubleshoot why a request is not executed
     349 * Impossible to execute an action without a CDB notification (profile change): no dry run, no possibility to force the redeployment of a configuration without a profile change...
     350   * Requires direct use of QRC
     351 * Endless retry loop: useless...
     352   * After a failure, a component will be re-run regularly, without any config change (default: 5 min): difference with ncm-cdispd
     353 
     354Current status
     355 * In production at MAS for 1 1/2 years
     356 * Known problems are well identified
     357 * Rewrite in progress but not yet ready
     358 
     359 
     360== QWG Templates ==
     361
     362Input based on RAL work.
     363
     364=== YAIM Support ===
     365
     366Goal: allow reuse of the RPM lists for grid services as they exist in QWG, but use ncm-yaim rather than the standard QWG configuration to do the actual service configuration
     367 * BDII as a proof of concept
     368 
     369Proposal
     370 * Define a variable `USE_QWG_CONFIG` to select the configuration variant
     371 * Create 2 namespaces for each service: `qwg` and `yaim`
     372 * Rename current `service.tpl` into `qwg/config.tpl`
     373 * Create a new `service.tpl` that will include RPMs and act as a switch between both variants
     374 
     375Open questions
     376 * Account configuration: with YAIM or with `ncm-account`?
     377   * Try to support both and compare
     378   
     379=== RPM list management ===
     380
     381Would be desirable to manage separately the RPM list and the RPM version to use
     382 * RPM lists can be generated from an OS distribution
     383 * Would make the maintenance of templates in config/os easier: should not require a change with new versions as long as the RPMs are the same.
     384 
     385Need to improve the XSL stylesheet processing the distribution comps.xml to produce the information in this new format.
     386 * Template defining default version will be included at the beginning of every RPM template for a non disruptive migration
     387
     388
     389== SL6 issues ==
     390
     391ncm-network changes to manage udev made by MS but not yet contributed back.
     392
     393grub/spma inconsistency with new kernel names including architecture: fixed by last `spma` (-12).
     394
     395ldapauth configuration completely changed
     396 * New configuration supported through a new configuration subtree in the component
     397 
     398SPMA and SELinux: SPMA requires a new SELinux effort to work
     399 * Added by last version in SF repository
     400 
     401
     402== Monitoring ==
     403
     404=== Experience with Nagios and Icinga at UGent ===
     405
     406Main goal: dynamic reconfiguration of monitoring
     407 * Currently done by a Python framework parsing the XML profiles to produce the required monitoring configuration
     408   * At UGent, everything is described in Quattor
     409   * The framework produces a template that is used by the Icinga server: currently no automatic reconfiguration of the monitoring server
     410   * Current QWG approach relies on a static definition of groups in a specific template and is difficult to keep in sync with the actual config: at least the risk of discrepancies
     411 * Plan to handle this with Aquilon in the future
     412   * Unclear if it will really impact the overall workflow
     413 
     414Host groups a host belongs to are defined in the Quattor configuration
     415 * Processed by the Python framework to generate the appropriate monitoring configuration
     416
     417ncm-icinga: written from scratch
     418 * Schema different from ncm-nagios
     419 * Configuration information required (variables) is basically the same
     420
     421Status
     422 * Doing last tests
     423 * Production targeted end of April
     424 * Want to look at Aquilon this summer
     425 
     426QWG Todo: document the variables related to monitoring and common to both Nagios and Icinga
     427 * Already some information on the Trac wiki
     428 
     429
     430== Configuration Modules ==
     431
     432Configuration module status report
     433 * State file managed/generated by ncm-ncd and ncm-cdispd in last versions
     434   * Requires a 'state' directive defining the state directory in the ncm-ncd and the ncm-cdispd configuration file
     435   * Presence of the file denotes a component that should run and indicates the error if there was one in a previous run
     436 * TODO: add an option to ncm-ncd to display the configuration modules that are waiting to run and whether they experienced an error in a previous run
     437
     438Test mode: 'noaction' property should do the job if the configuration module uses CAF
     439 * Will display file to be opened, command to be executed rather than doing it
     440   
     441Would be great to keep not only the latest profile but also the previous ones.
     442 * ccm-purge could keep a given number of profiles or all the profiles more recent than a given age
     443 * ncm-query could be enhanced to list differences between profiles with several levels of detail: component impacts, detailed config change for part of the configuration tree...
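
The two retention policies suggested for ccm-purge (keep a given number of profiles, or keep everything newer than a given age) can be sketched as below. This does not describe existing ccm-purge behaviour: the function name, its parameters and the representation of a cached profile as an (id, timestamp) pair are assumptions.

```python
def profiles_to_purge(profiles, now, keep_count=None, max_age=None):
    """Return the ids of cached profiles that may be deleted, newest first.

    profiles:   iterable of (profile_id, timestamp_in_seconds)
    keep_count: keep at least this many of the newest profiles
    max_age:    keep every profile whose age in seconds is at most this value
    The newest profile is always kept, whatever the policy.
    """
    newest_first = sorted(profiles, key=lambda p: p[1], reverse=True)
    purge = []
    for rank, (profile_id, timestamp) in enumerate(newest_first):
        keep = rank == 0
        if keep_count is not None and rank < keep_count:
            keep = True
        if max_age is not None and now - timestamp <= max_age:
            keep = True
        if not keep:
            purge.append(profile_id)
    return purge
```

A profile is kept if either policy wants it, so combining both options never removes more than each option alone would.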
     444
     445Delayed execution: see [/wiki/Meetings/Workshops/20111011#ChangeScheduling Strasbourg's minutes]
     446 * No work since
     447
     448Support for other languages: desirable but not urgent
     449
     450
     451== Community Life and Development Process/Tools ==
     452
     453To be discussed on Thursday
     454 * Web landing page: move to Git
     455 * IRC
     456 * Twitter
     457 * Quattor releases
     458 * Vidyo for standup?
     459 * Fix link to Quattor home page in WallStreet Tech article about MS usage of Linux/Quattor
     460   * Or add a redirect in Mediawiki page
     461 * Cookbooks for all high level tools
     462   * Combine into one?
     463 * CERN presentation on Agile IT: what is worth as an input to Quattor?
     464 * Next workshop: UGent
     465