=== IPv6 readiness - R. Starink / P. Bernabé ===

Evaluation work recently done at NIKHEF: the goal was to identify what needs to be changed
 * Focus on a mixed IPv4/IPv6 setup
 * Backward compatibility

Test cluster
 * Isolated from the outside world
 * OS: CentOS 5.5 64-bit
 * A dedicated Quattor server
 * Clients: basic OS to start with, grid services later

Quattor server
 * PXE is only supported over IPv4: a mixed setup will be required to handle installation
 * As long as IPv4 is available, no change is required on the server!

ncm-network
 * Schema: basic support for IPv6 but no support for a mixed setup
   * Would require duplicating the IP, gateway... fields
   * Some fields to enable IPv6
 * How to specify IPv6 addresses: configure the prefix and the local part separately? What should prefixes be bound to (interfaces? node?)?
 * Autoconfiguration support?
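
One open question above is whether to store the prefix and the local part of an IPv6 address separately. The following minimal Python sketch shows how such a split could be combined into a full address using the standard `ipaddress` module. The helper name `combine_ipv6`, and the convention of writing the local part as an IPv6 suffix such as `::10`, are assumptions for illustration only. They are not part of the ncm-network schema.

```python
import ipaddress


def combine_ipv6(prefix: str, local_part: str) -> ipaddress.IPv6Interface:
    """Build a full IPv6 interface address from a prefix and a local part.

    Hypothetical helper: `prefix` would be defined per network or per
    interface (e.g. "2001:db8:1::/64"), `local_part` per node (e.g. "::10").
    """
    network = ipaddress.IPv6Network(prefix, strict=True)
    host = int(ipaddress.IPv6Address(local_part))
    host_bits = network.max_prefixlen - network.prefixlen
    # The local part must fit entirely in the bits left free by the prefix.
    if host >= 2 ** host_bits:
        raise ValueError(f"local part {local_part} does not fit in {network}")
    address = ipaddress.IPv6Address(int(network.network_address) | host)
    return ipaddress.IPv6Interface(f"{address}/{network.prefixlen}")
```

For example, `combine_ipv6("2001:db8:1::/64", "::10")` gives `2001:db8:1::10/64`. With this split, renumbering a network would only touch the prefix, while the per-node local parts stay unchanged.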

QWG changes
 * Allow enabling IPv6 globally or per interface
 * A new table for IPv6 addresses with a more flexible layout, able to handle multiple addresses per node
 * IPv6 parameters: per network?

ncm-iptables
 * A few modifications already made to improve patterns for IPv6 and add protocol options

Other components: difficult to predict the required changes
 * Probably service-specific?

Conclusion (next steps)
 * Focus on the schema for the IPv6 description: ensure enough flexibility to cover different use cases without unnecessary complexity
 * Ask Paco if he can contribute
 * Ask IPv6 experts (in particular in the HEP community) to assess the proposed schema

== Aquilon and other MS Tools ==

Current status
 * AII and CCM modifications released thanks to Luis
 * ncm-network changes contributed
 * ncm-sysctl not yet modified: waiting for the escape/unescape removal
 * quattor-remote-configure available, but quattor-remote-deployer not yet released as it is being rewritten

5 core developers, many support engineers
 * Current focus is to allow more operations without editing templates, as Aquilon usage extends to less standard (non-"grid") machines
 * Plan is to add more Aquilon commands dealing with more advanced operations

Quattor Remote Deployer (QRD) status: equivalent of ncm-cdispd/ncm-ncd for managing non-Linux devices
 * Python rewrite of the old Perl code
 * QRD relies on Quattor Remote Configure (QRC) to execute components: QRC already released
 * Python rewrite requires a Python CCM API

Aquilon changes and main developments
 * More changes made through Aquilon commands and less template editing, e.g. assigning specific features to an existing personality
   * Make it easier for users to make trivial changes
   * Reduce the risk of big mistakes caused by exposing templates and the right to modify them
 * HW features: generic description of HW features, implemented using vendor tools and their specific options
   * One template to enable/disable a feature, including a vendor/model-specific template defining the required configuration
   * Under namespace `features/hardware`
 * Added description of clusters (VMware clusters, high availability clusters): members (e.g. assignment of VMs to an ESX cluster), reboot constraints, file system description allowing remote file systems to be attached...
   * Will add the possibility to assign a VM to a single host (e.g. running KVM) rather than to an ESX cluster
 * Planning to add support in Aquilon for defining service addresses (moving with VMs or shared by several)
 * Request to model Windows clusters in Aquilon
   * But real use may require a GUI...
 * Working on modelling network switches and configuring them with QRD

Aquilon appliance
 * Still on SourceForge but not updated: lacks the templates needed to be useful
 * Looking into "pop-up" installations via the cloud
 * Appliance uses Debian

Mac OS X support: proof of concept of support for other OSes in Aquilon
 * ncm-directoryservices to configure OpenDirectory; ncm-mcx
 * ncm-ncd and ccm-fetch both work
   * CCM uses DB_File
 * Some fixes needed in quattor-build-tools
 * A single package created (alpha version) and put on SourceForge: "Managed Quattor Client for OS X"


=== Fostering Aquilon adoption ===

James's experience
 * No major problem using QWG templates; a few minor modifications needed
   * James agreed to write a short report about these...
 * Main difficulty is the different workflow between SCDB and Aquilon
 * Aquilon upgrade: DB upgrade scripts only provided for Oracle
   * Aquilon supports whatever DB backend is supported by SQLAlchemy
   * Need to maintain backend-specific upgrade scripts: PostgreSQL seems the most important to support

Appliance work required to allow early adopters to look at it
 * Update the Aquilon version
 * Add QWG templates
 * Add the ability to check out templates from the appliance

RAL will have a summer student working on Aquilon for 3 months starting in July
 * Try to get the new appliance ready by then.

=== Quattor Remote Deployer experience at MS ===

Goal: manage non-Linux devices that offer some sort of API for remote management
 * E.g. ESX clusters, switches, file appliances...
 * Handle initial configuration and reconfiguration

Two parts in the system
 * QRD itself: equivalent of ncm-cdispd
   * Receives profiles and decides what to do
   * Uses a plugin either for installation (AII, configuration of the boot server) or for configuration (QRC)
 * Quattor Remote Configure (QRC): equivalent of ncm-ncd
   * One framework plus specialized components

Choice of a distributed (redundant) QRD/QRC configuration
 * Main goal is redundancy: multiple configuration sources are easy to implement with HTTP
   * Every QRD receives the notification, and a lock ensures that only one of them actually executes the configuration actions
 * Shared data (NFS) between all instances for locks and cached profiles... not ideal
   * Modified CAF::Lock to implement NFS locks
 * If possible, achieve better scalability, but this is not easy because of potential locking issues that may hurt performance
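
The lock-based coordination described above (every instance receives the notification, only the lock holder acts) can be sketched as follows. This minimal Python sketch is not CAF::Lock, which is Perl. It relies on atomic exclusive file creation (`os.O_CREAT | os.O_EXCL`). `SHARED_LOCK_DIR` is a hypothetical stand-in for the NFS directory shared by all QRD instances. According to the Linux `open(2)` documentation, `O_EXCL` is only supported over NFS with NFSv3 or later on Linux 2.6 or later. The sketch also does not handle stale locks left by a crashed instance.

```python
import os
import socket
import tempfile

# Hypothetical stand-in for the NFS directory shared by all QRD instances.
SHARED_LOCK_DIR = tempfile.gettempdir()


def try_acquire(lock_path: str) -> bool:
    """Atomically create lock_path; return True only for the single winner."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False  # another instance already holds the lock
    with os.fdopen(fd, "w") as handle:
        # Record the owner to help troubleshooting (e.g. stale locks).
        handle.write(f"{socket.gethostname()}:{os.getpid()}\n")
    return True


def release(lock_path: str) -> None:
    os.remove(lock_path)


def handle_notification(lock_path, apply_config) -> bool:
    """Run by every instance on a notification; only the lock holder acts."""
    if not try_acquire(lock_path):
        return False
    try:
        apply_config()
    finally:
        release(lock_path)
    return True
```

Instances that lose the race simply return `False` and leave the work to the winner.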

QRC implements conditional execution of dependencies, based on whether their configuration has changed...
 * Check whether it makes sense to feed this back into ncm-ncd
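
Here is one plausible reading of this feature, sketched in Python. When a component runs, its dependencies are only pulled in if their own configuration changed. The selected components are then ordered so that dependencies run first. The function `plan_run`, its arguments and the component names in the example are hypothetical. Ordering uses `graphlib.TopologicalSorter` from the standard library (Python 3.9+).

```python
from graphlib import TopologicalSorter


def plan_run(requested, depends_on, changed):
    """Return the components to execute, dependencies first.

    requested:  components explicitly asked to run
    depends_on: mapping component -> set of components it depends on
    changed:    components whose configuration changed since the last run
    A dependency is only added when its own configuration has changed.
    """
    selected = set()
    stack = list(requested)
    while stack:
        component = stack.pop()
        if component in selected:
            continue
        selected.add(component)
        for dep in depends_on.get(component, ()):
            if dep in changed:
                stack.append(dep)
    # Keep only the dependency edges between selected components.
    graph = {c: {d for d in depends_on.get(c, ()) if d in selected} for c in selected}
    return list(TopologicalSorter(graph).static_order())
```

Take `depends_on = {"web": {"accounts", "filesystems"}, "accounts": {"ldap"}}` and `changed = {"filesystems"}`. Then `plan_run(["web"], depends_on, changed)` returns `["filesystems", "web"]`, and `accounts` is skipped because its configuration did not change.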

QRD
 * Different modes and connectors implemented by plugins that have their own (simple) configuration
 * Analysis of the work to be done: multiple configuration-change notifications merged into one action
 * Does everything needed to move from the last successfully deployed configuration to the new one
 * Keeps track of the plugin status to decide whether a configuration was successfully applied
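
The merging of notifications can be illustrated with a small Python sketch. Pending notifications are reduced to one action per host, going from the last successfully deployed profile to the newest one announced. The `Notification` class, its integer `profile_version` field (e.g. a profile timestamp) and the function name are assumptions. They are not QRD's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    host: str
    profile_version: int  # hypothetical: e.g. the profile timestamp


def coalesce(pending, last_deployed):
    """Merge pending notifications into at most one action per host.

    Returns {host: (from_version, to_version)}, where from_version is the last
    successfully deployed profile (None if nothing was deployed yet). Hosts
    whose newest announced profile is not newer than the deployed one are skipped.
    """
    newest = {}
    for note in pending:
        if note.host not in newest or note.profile_version > newest[note.host]:
            newest[note.host] = note.profile_version
    actions = {}
    for host, target in newest.items():
        current = last_deployed.get(host)
        if current is None or target > current:
            actions[host] = (current, target)
    return actions
```

Only the newest version per host is kept, so several rapid profile changes for the same device lead to a single deployment.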

QRD improvements needed
 * Too much locking, which may lead to a configuration deployment being postponed indefinitely
   * Locking + grouping is overcomplicated and 2 levels of locking are probably unnecessary
 * I/O performance problems with NFS
 * Lack of visibility of action progress: difficult to troubleshoot why a request is not executed
 * Impossible to execute an action without a CDB notification (profile change): no dry run, no possibility to force the redeployment of a configuration without a profile change...
   * Requires direct use of QRC
 * Endless retry loop: useless...
   * After a failure, a component is re-run regularly (default: every 5 min) without any configuration change, unlike ncm-cdispd

Current status
 * In production at MS for 1.5 years
 * Known problems are well identified
 * Rewrite in progress but not yet ready


== QWG Templates ==

Input based on RAL work.

=== YAIM Support ===

Goal: allow reusing the RPM lists for grid services as they exist in QWG, but use ncm-yaim rather than the standard QWG configuration to perform the actual service configuration
 * BDII as a proof of concept

Proposal
 * Define a variable `USE_QWG_CONFIG` to select the configuration variant
 * Create 2 namespaces for each service: `qwg` and `yaim`
 * Rename the current `service.tpl` to `qwg/config.tpl`
 * Create a new `service.tpl` that includes the RPMs and acts as a switch between both variants

Open questions
 * Account configuration: with YAIM or with `ncm-account`?
   * Try to support both and compare

=== RPM list management ===

It would be desirable to manage the RPM list separately from the RPM versions to use
 * RPM lists can be generated from an OS distribution
 * Would make maintenance of the templates in `config/os` easier: new versions should not require a change as long as the RPMs are the same.

The XSL stylesheet that processes the distribution's comps.xml needs to be improved to produce the information in this new format.
 * A template defining the default versions will be included at the beginning of every RPM template, for a non-disruptive migration
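
To illustrate the proposed split, here is a Python sketch. It is not the XSL stylesheet mentioned above. It extracts a version-free package list for one group from a comps.xml file, then combines that list with a separate version table. The function names, the package names and versions in the example, and the choice of which `packagereq` types to keep are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def packages_from_comps(comps_xml, group_id, types=("mandatory", "default")):
    """Return the version-free package names of one comps.xml group."""
    root = ET.fromstring(comps_xml)
    for group in root.iter("group"):
        if group.findtext("id") == group_id:
            return [
                req.text.strip()
                for req in group.iter("packagereq")
                if req.get("type", "mandatory") in types
            ]
    raise ValueError(f"group {group_id!r} not found")


def resolve_packages(package_names, default_versions, overrides=None):
    """Attach versions, kept in a separate table, to a version-free RPM list."""
    versions = dict(default_versions)
    versions.update(overrides or {})
    missing = [name for name in package_names if name not in versions]
    if missing:
        raise ValueError("no version defined for: " + ", ".join(missing))
    return [f"{name}-{versions[name]}" for name in package_names]
```

With this layout, an OS update would only change `default_versions` (or `overrides`), while the package list derived from comps.xml stays the same.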


== SL6 issues ==

ncm-network changes to manage udev made by MS but not yet contributed back.

grub/spma inconsistency with new kernel names including the architecture: fixed by the latest `spma` (-12).

ldapauth configuration completely changed
 * New configuration supported through a new configuration subtree in the component

SPMA and SELinux: SPMA needs new SELinux-related changes to work
 * Added in the latest version in the SourceForge repository


== Monitoring ==

=== Experience with Nagios and Icinga at UGent ===

Main goal: dynamic reconfiguration of monitoring
 * Currently done by a Python framework parsing the XML profiles to produce the required monitoring configuration
   * At UGent, everything is described in Quattor
   * The framework produces a template that is used by the Icinga server: currently no automatic reconfiguration of the monitoring server
 * Current QWG approach relies on a static definition of groups in a specific template, which is difficult to keep in sync with the actual configuration: at the very least, a risk of discrepancies
 * Plan to handle this with Aquilon in the future
   * Unclear if it will really impact the overall workflow

The host groups a host belongs to are defined in the Quattor configuration
 * Processed by the Python framework to generate the appropriate monitoring configuration
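
Here is a minimal Python sketch of this kind of processing. It reads each host's groups from its XML profile and renders Icinga 1.x / Nagios-style `define hostgroup` objects. The profile layout used here (a `hostname` string and a `hostgroups` list under nested `nlist` elements) is a simplified assumption, not the actual path used at UGent. Both functions are hypothetical and not part of the UGent framework.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def host_groups(profile_xml):
    """Return (hostname, groups) from one XML profile (assumed layout)."""
    root = ET.fromstring(profile_xml)
    hostname = root.findtext(".//string[@name='hostname']")
    groups = [elem.text for elem in root.findall(".//list[@name='hostgroups']/string")]
    return hostname, groups


def render_hostgroups(profiles):
    """Render one 'define hostgroup' object per group found in the profiles."""
    members = defaultdict(list)
    for profile_xml in profiles:
        hostname, groups = host_groups(profile_xml)
        for group in groups:
            members[group].append(hostname)
    blocks = []
    for group in sorted(members):
        blocks.append(
            "define hostgroup {\n"
            f"    hostgroup_name  {group}\n"
            f"    members         {','.join(sorted(members[group]))}\n"
            "}\n"
        )
    return "\n".join(blocks)
```

Writing this output to the Icinga configuration and reloading the server is the step that, per these minutes, is not yet automated.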

ncm-icinga: written from scratch
 * Schema different from ncm-nagios
 * Configuration information required (variables) is basically the same

Status
 * Doing last tests
 * Production targeted for the end of April
 * Want to look at Aquilon this summer

QWG Todo: document the monitoring-related variables common to both Nagios and Icinga
 * Already some information on the Trac wiki


== Configuration Modules ==

Configuration module status report
 * State file managed/generated by ncm-ncd and ncm-cdispd in the latest versions
   * Requires a 'state' directive defining the state directory in the ncm-ncd and ncm-cdispd configuration files
   * The presence of the file denotes a component that should run, and the file indicates the error, if any, from a previous run
 * TODO: add an option to ncm-ncd to display the configuration modules that are waiting to run and whether they experienced an error in a previous run
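
The TODO above could look roughly like the following Python sketch, which lists the components waiting to run and reports any previous error. It assumes one state file per component, named after the component and containing the previous error message (empty if none). The actual file layout written by ncm-ncd/ncm-cdispd may differ, and both function names are hypothetical. Reading the directory is kept separate from formatting, so the report logic can be tested without a real state directory.

```python
import os


def read_state_dir(state_dir):
    """Map each component name to the content of its state file."""
    entries = {}
    for name in os.listdir(state_dir):
        path = os.path.join(state_dir, name)
        if os.path.isfile(path):
            with open(path, encoding="utf-8") as handle:
                entries[name] = handle.read()
    return entries


def summarize(entries):
    """Return one status line per component still waiting to run."""
    lines = []
    for component in sorted(entries):
        error = entries[component].strip()
        status = f"error in previous run: {error}" if error else "waiting to run"
        lines.append(f"{component}: {status}")
    return lines
```

A command-line option would then print `summarize(read_state_dir(path))`, where `path` is whatever the 'state' directive points to.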

Test mode: the 'noaction' property should do the job if the configuration module uses CAF
 * Displays the files that would be opened and the commands that would be executed, rather than performing the actions
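
CAF itself is Perl. The Python sketch below only illustrates the general pattern behind such a test mode: all side effects go through one object that either performs them or just records what it would have done. The `Actions` class and its methods are hypothetical and are not the CAF API.

```python
import subprocess


class Actions:
    """Route side effects through one object so they can be reported instead."""

    def __init__(self, noaction=False):
        self.noaction = noaction
        self.log = []

    def write_file(self, path, content):
        if self.noaction:
            self.log.append(f"would write {len(content)} characters to {path}")
            return
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(content)
        self.log.append(f"wrote {path}")

    def run(self, command):
        if self.noaction:
            self.log.append("would run: " + " ".join(command))
            return None
        result = subprocess.run(command, check=True, capture_output=True, text=True)
        self.log.append("ran: " + " ".join(command))
        return result.stdout
```

A component written this way gets a dry run for free: constructing `Actions(noaction=True)` leaves the system untouched and `log` lists what would have happened.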

It would be useful to keep not only the latest profile but also previous ones.
 * ccm-purge could keep a given number of profiles, or all the profiles more recent than a certain time interval
 * ncm-query could be enhanced to list differences between profiles at several levels of detail: impacted components, detailed configuration changes for part of the configuration tree...
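
The two retention policies suggested for ccm-purge can be expressed as a small Python function. This sketch rests on a few assumptions:
 * Profiles are identified by hypothetical names mapped to their fetch time.
 * Either policy can be used alone.
 * When both are given, a profile is kept if either policy keeps it.
It is not the ccm-purge implementation.

```python
from datetime import datetime, timedelta


def profiles_to_purge(profiles, now, keep_count=None, max_age=None):
    """Return the profile ids that may be deleted, sorted.

    profiles:   mapping profile id -> fetch time (datetime)
    keep_count: keep this many of the most recent profiles
    max_age:    keep every profile fetched less than max_age (timedelta) ago
    """
    if keep_count is None and max_age is None:
        return []  # no retention policy configured: keep everything
    keep = set()
    if keep_count is not None:
        newest_first = sorted(profiles, key=profiles.get, reverse=True)
        keep.update(newest_first[:keep_count])
    if max_age is not None:
        keep.update(p for p, fetched in profiles.items() if now - fetched < max_age)
    return sorted(p for p in profiles if p not in keep)
```

Treating "no policy" as "keep everything" is a deliberately conservative default for a tool that deletes data.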

Delayed execution: see [/wiki/Meetings/Workshops/20111011#ChangeScheduling Strasbourg's minutes]
 * No work done since

Support for other languages: desirable but not urgent


== Community Life and Development Process/Tools ==

To be discussed on Thursday
 * Web landing page: move to Git
 * IRC
 * Twitter
 * Quattor releases
 * Vidyo for the standup?
 * Fix the link to the Quattor home page in the WallStreet Tech article about MS usage of Linux/Quattor
   * Or add a redirect in the MediaWiki page
 * Cookbooks for all high-level tools
   * Combine into one?
 * CERN presentation on Agile IT: what is worth taking as input for Quattor?
 * Next workshop: UGent