Lemon description
Lemon is a monitoring system developed at CERN. On monitored machines, agents execute sensors that measure the state of different services according to metrics. Exceptions can also be defined; an exception is raised when the set of conditions it defines is met. Metrics and exceptions are sent back to a central Lemon server and stored either in flat files or in an Oracle database. A web interface allows users to browse the status of machines and to perform simple analysis tasks, such as correlating different metrics or viewing the distribution of values across a set of machines.
Agent configuration
The machine-types/base template includes some basic Lemon sensors. Additional sensors and exceptions are added in the various machine types.
To enable Lemon monitoring on a node, set
variable LEMON_CONFIGURE_AGENT = true;
There are also a number of site-specific variables that need to be set:
## Name of the lemon server
variable LEMON_SERVER_HOSTNAME = undef;
## Set the email address for receiving the exception notifications
variable LEMON_ALARM_MAIL = undef;
Additional variables (which have defaults) can be customised if necessary:
## The port to contact the lemon server
variable LEMON_CLIENT_PORT ?= 12409;
## The transport protocol used (UDP or TCP)
variable LEMON_TRANSPORT_PROTOCOL ?= 'UDP';
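Pan applies a ?= assignment only if the variable does not already have a value, so a site can override these defaults by setting the variables before the Lemon templates are processed, whereas the two undef variables have no usable default at all. The Python sketch below models this resolution to make the rules explicit. It is an illustration, not QWG code, and the check on the protocol value is an assumption based on the comment above.

```python
"""Illustrative model of how the Lemon agent variables resolve (not QWG code)."""

# Variables declared with "= undef": the site must provide a value.
REQUIRED = ("LEMON_SERVER_HOSTNAME", "LEMON_ALARM_MAIL")

# Variables declared with "?=": used only if the site has not set them.
DEFAULTS = {"LEMON_CLIENT_PORT": 12409, "LEMON_TRANSPORT_PROTOCOL": "UDP"}


def resolve_agent_config(site_values):
    """Merge site values over the defaults and check the required ones."""
    config = dict(DEFAULTS)
    config.update(site_values)  # anything the site set earlier wins over ?=
    missing = [name for name in REQUIRED if config.get(name) is None]
    if missing:
        raise ValueError("site must set: " + ", ".join(missing))
    # Assumption: only the two protocols named in the template comment are valid.
    if config["LEMON_TRANSPORT_PROTOCOL"] not in ("UDP", "TCP"):
        raise ValueError("LEMON_TRANSPORT_PROTOCOL must be 'UDP' or 'TCP'")
    return config


if __name__ == "__main__":
    cfg = resolve_agent_config({
        "LEMON_SERVER_HOSTNAME": "lemon.example.org",  # hypothetical host
        "LEMON_ALARM_MAIL": "admin@example.org",       # hypothetical address
    })
    print(cfg["LEMON_CLIENT_PORT"], cfg["LEMON_TRANSPORT_PROTOCOL"])
```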
Advanced configuration
Adding extra metrics
Extra sensors, metrics, and exceptions may be added to a particular node or node type. There is a large set available in the QWG Lemon directory (sub-directories metric, sensor, exception). For example, to add monitoring for the service myservice you would do:
include monitoring/lemon/sensor/myservice;
include monitoring/lemon/metric/myservice;
include monitoring/lemon/exception/myservice;
NB: some of the metrics require additional RPMs to be installed on the monitored machines. For example, the fio sensor, which includes monitoring of Quattor services, requires the lemon-sensor-fio RPM, which in turn depends on perl-Time-HiRes. QWG does not currently add these RPMs and their dependencies automatically (this remains to be done), so they must be added to the node's package list explicitly.
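Because these packages must be added by hand for now, it can help to work out the full package list from the sensors a node uses, following dependencies transitively. The Python sketch below does this over two hand-maintained tables. The table names and layout are hypothetical; only the fio, lemon-sensor-fio and perl-Time-HiRes chain comes from this page.

```python
"""Sketch: collect the RPMs needed by a node's Lemon sensors (hypothetical tables)."""

# Hypothetical table: sensor name -> RPMs that provide it.
SENSOR_RPMS = {"fio": ["lemon-sensor-fio"]}

# Hypothetical table: RPM -> RPMs it depends on.
RPM_DEPS = {"lemon-sensor-fio": ["perl-Time-HiRes"]}


def rpms_for_sensors(sensors, sensor_rpms=SENSOR_RPMS, rpm_deps=RPM_DEPS):
    """Return the sorted list of RPMs needed, following dependencies transitively."""
    needed = set()
    stack = [rpm for sensor in sensors for rpm in sensor_rpms.get(sensor, [])]
    while stack:
        rpm = stack.pop()
        if rpm in needed:
            continue  # already handled; this also guards against dependency cycles
        needed.add(rpm)
        stack.extend(rpm_deps.get(rpm, []))
    return sorted(needed)


if __name__ == "__main__":
    print(rpms_for_sensors(["fio"]))
```

The resulting names would then be added to the node's package list in the usual QWG way.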
More than one lemon server
The default setup assumes a single lemon server (variable LEMON_SERVER_HOSTNAME). To send data to more than one server, add an entry for each additional server under /system/monitoring/transport (see monitoring/lemon/client/base/config for an example).
Server configuration
The lemon server consists of an information collector and the web interface (called LRF). Lemon supports two main types of information storage: flat files or an Oracle database. Currently only Oracle storage is supported in QWG. However, for those without access to an Oracle installation, we provide instructions below for setting up the free Oracle XE database.
First add the lemon server config:
include monitoring/lemon/server/service;
Backend configuration
Then set the storage backend:
## use OraMon or flatfile
variable LEMON_BACKEND ?= 'OraMon';
Oracle / OraMon
When using Oracle as a backend, some Oracle-specific parameters need to be set. Their default values are shown below; ORAMON_ORACLE_PASSWD has no default and must be set:
## name of database to use
variable ORAMON_ORACLE_DATABASE_NAME ?= 'XE';
## Oracle home directory
variable ORACLE_HOME ?= '/usr/lib/oracle/xe/app/oracle/product/10.2.0/server';
## Local installation using XE or not
## (If true, don't forget the manual post-install steps!)
variable ORACLE_XE_LOCAL_INSTALL ?= true;
## Oracle user (must be created in Oracle manually!)
## (this is not necessarily the same unix username that runs lemon services)
variable ORAMON_ORACLE_USER ?= 'lemon';
## Oracle password for this user
variable ORAMON_ORACLE_PASSWD ?= undef;
If you are accessing an existing Oracle server, set ORACLE_XE_LOCAL_INSTALL to false and configure the Oracle TNS names entry for that server (example for GRIF):
variable CONTENTS_ORACLE_TNS ?= <<EOF;
# tnsnames.ora Network Configuration File:
oracle_service_name.in2p3.fr =
  (DESCRIPTION =
    (ADDRESS=(PROTOCOL=TCP)(HOST=real_oracle_server_1.in2p3.fr)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=real_oracle_server_2.in2p3.fr)(PORT=1521))
    (LOAD_BALANCE=yes)
    (CONNECT_DATA=
      (SERVER=DEDICATED)(SERVICE_NAME=oracle_service_name.in2p3.fr)
      (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)
        (RETRIES=180)(DELAY=5))
    )
  )
EOF
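Before starting the lemon services against a remote database, it can be worth checking that every listener named in the TNS entry is reachable from the lemon server. The Python sketch below extracts the TCP ADDRESS clauses with a regular expression and offers a plain TCP connection test. It is a hypothetical helper, not part of Lemon or QWG, and the regular expression only covers the simple ADDRESS form used in the example above.

```python
"""Sketch: list the listener addresses in a tnsnames.ora entry and probe them."""
import re
import socket

# Matches (ADDRESS=(PROTOCOL=TCP)(HOST=...)(PORT=...)), tolerating spaces and case.
ADDRESS_RE = re.compile(
    r"\(\s*ADDRESS\s*=\s*\(\s*PROTOCOL\s*=\s*TCP\s*\)"
    r"\s*\(\s*HOST\s*=\s*([^)\s]+)\s*\)"
    r"\s*\(\s*PORT\s*=\s*(\d+)\s*\)\s*\)",
    re.IGNORECASE,
)


def tcp_addresses(tns_text):
    """Return (host, port) pairs for every TCP ADDRESS clause in the text."""
    return [(host, int(port)) for host, port in ADDRESS_RE.findall(tns_text)]


def listener_reachable(host, port, timeout=3.0):
    """True if a TCP connection to host:port can be opened within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    sample = """(ADDRESS=(PROTOCOL=TCP)(HOST=real_oracle_server_1.in2p3.fr)(PORT=1521))
    (ADDRESS=(PROTOCOL=TCP)(HOST=real_oracle_server_2.in2p3.fr)(PORT=1521))"""
    for host, port in tcp_addresses(sample):
        print(host, port)  # call listener_reachable(host, port) on the lemon server
```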
flatfile
Nothing yet: QWG does not currently provide a configuration for the flat-file backend.
Web interface configuration
The lemon web interface is written in PHP and needs access to the storage backend.
Cluster definition
It also needs to know which machines to expect and, based on their properties, how to group them. For now, this is done with an nlist called NODES_PROPS, keyed by escaped host name. A basic example is:
variable NODES_PROPS = nlist( escape("mon.example.com"),nlist('type','MON','monitoring','yes'), );
The name of the template that sets this variable is controlled through
variable LEMON_NODES_PROPERTIES_TEMPLATE ?= 'pro_nodes_properties';
The default value of the 'monitoring' property (i.e. the value used when a node does not define it) is controlled through
variable LEMON_NODES_PROPERTIES_DEFAULT_MONITORING ?= 'yes';
The actual configuration files for the web interface are generated in the template referenced by LEMON_SERVER_WEB_CONFIG. The default template is monitoring/lemon/server/web, which uses the NODES_PROPS nlist to build cluster definitions. By default, one cluster is defined for each unique node type found in NODES_PROPS.
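The default grouping can be modelled in a few lines of Python: fill in the 'monitoring' default where a node leaves it out, then collect the hosts of each node type into one cluster. This is a simplified model, not the code of monitoring/lemon/server/web. It assumes every entry has a 'type' and uses plain host names where Pan uses escaped keys; how the web template then uses the 'monitoring' flag is not modelled here.

```python
"""Sketch: derive web-interface clusters from NODES_PROPS (model, not QWG code)."""
from collections import defaultdict


def effective_props(nodes_props, default_monitoring="yes"):
    """Copy NODES_PROPS, filling in 'monitoring' where a node does not define it."""
    return {
        host: {**props, "monitoring": props.get("monitoring", default_monitoring)}
        for host, props in nodes_props.items()
    }


def clusters_by_type(nodes_props):
    """One cluster per unique node type, listing that type's hosts in sorted order."""
    clusters = defaultdict(list)
    for host, props in nodes_props.items():
        clusters[props["type"]].append(host)
    return {node_type: sorted(hosts) for node_type, hosts in clusters.items()}


if __name__ == "__main__":
    props = {  # hypothetical hosts; in Pan the keys are escape()d host names
        "mon.example.com": {"type": "MON", "monitoring": "yes"},
        "ce01.example.com": {"type": "CE"},
        "se01.example.com": {"type": "SE"},
    }
    print(effective_props(props)["ce01.example.com"]["monitoring"])
    print(clusters_by_type(props))
```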
Superclusters
It is often useful to define "superclusters" that aggregate existing node types. For example, a GATEWAY supercluster might aggregate all the CE and SE nodes at a site. The default web configuration template supports this as follows:
variable LEMON_SUPERCLUSTERS= nlist("GATEWAY",list("CE","SE"));
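Conceptually, a supercluster is the union of the hosts in its member clusters. The Python sketch below models that expansion for a LEMON_SUPERCLUSTERS-style mapping. It is an illustration only; the cluster and host names are hypothetical, and treating a member type with no cluster as contributing no hosts is an assumption about how such a case should behave.

```python
"""Sketch: expand superclusters into host lists (model, not QWG code)."""


def expand_superclusters(clusters, superclusters):
    """Map each supercluster name to the sorted hosts of its member clusters.

    Assumption: a member type with no cluster simply contributes no hosts.
    """
    expanded = {}
    for name, member_types in superclusters.items():
        hosts = set()
        for node_type in member_types:
            hosts.update(clusters.get(node_type, []))
        expanded[name] = sorted(hosts)
    return expanded


if __name__ == "__main__":
    clusters = {"CE": ["ce01.example.com"],
                "SE": ["se01.example.com", "se02.example.com"]}
    print(expand_superclusters(clusters, {"GATEWAY": ["CE", "SE"]}))
```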
NODES_PROPS example
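The following example, used at IIHE, generates the monitoring part of NODES_PROPS: machines listed explicitly are kept as they are, and every other machine in DB_MACHINE gets the type of the first entry in LEMON_PROPS_REGEXP_TYPE whose regular expression (from LEMON_PROPS_REGEXP_MAP) matches its host name.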
template site/lemon_nodes;

## in case of missing monitoring field
variable LEMON_NODES_PROPERTIES_DEFAULT_MONITORING = 'yes';

## use short host names (without the domain) as NODES_PROPS keys
variable LEMON_SHORTHOSTNAME ?= false;

## manual list, is respected when autocompleting
variable NODES_PROPS = nlist(
    escape("egon.iihe.ac.be"), nlist('type', 'MON'),
);

## types to try, in order (the first match wins)
variable LEMON_PROPS_REGEXP_TYPE = list('WN', 'SE_DISK', 'CE', 'NFS');
variable LEMON_PROPS_REGEXP_MAP = nlist(
    'MON', 'XXXXX',
    'WN', 'node',
    'SE_DISK', 'behar',
    'CE', 'gridce',
    'NFS', 'fileserv',
);

### autocomplete this list based on DB_MACHINE and regexp
variable NODES_PROPS = {
    tmp = NODES_PROPS;
    dbm = DB_MACHINE;
    ok = first(dbm, k, v);
    while (ok) {
        ## machines listed manually are left untouched
        if (!exists(NODES_PROPS[k])) {
            mach = unescape(k);
            mach_to_use = mach;
            if (LEMON_SHORTHOSTNAME) {
                m = matches(mach, '([^\\.]+)(\..*)?');
                mach_to_use = m[1];
            };
            regs_order = LEMON_PROPS_REGEXP_TYPE;
            ok2 = first(regs_order, k2, v2);
            while (ok2) {
                matched = false;
                if (exists(LEMON_PROPS_REGEXP_MAP[v2])) {
                    matched = match(mach, LEMON_PROPS_REGEXP_MAP[v2]);
                };
                if (matched) {
                    tmp = merge(tmp, nlist(escape(mach_to_use), nlist('type', v2)));
                    ok2 = false;
                } else {
                    ## always advance, even for a type without a regexp,
                    ## otherwise this loop never terminates
                    ok2 = next(regs_order, k2, v2);
                };
            };
        };
        ok = next(dbm, k, v);
    };
    return(tmp);
};
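For readers less familiar with Pan, the same first-match logic can be written in Python. This is a hypothetical port for illustration only: it takes the manual entries, the DB_MACHINE host names, LEMON_PROPS_REGEXP_TYPE and LEMON_PROPS_REGEXP_MAP as plain Python values, and it uses re.search on the assumption that Pan's match() succeeds when the pattern occurs anywhere in the name, as the short patterns such as 'node' suggest.

```python
"""Python port of the IIHE NODES_PROPS autocompletion logic (illustration only)."""
import re


def autocomplete_nodes_props(manual, machines, type_order, regexp_map,
                             short_hostname=False):
    """Assign a type to each machine not listed manually; first regexp match wins."""
    result = {host: dict(props) for host, props in manual.items()}
    for machine in machines:
        if machine in manual:
            continue  # manual entries are respected
        key = machine.split(".", 1)[0] if short_hostname else machine
        for node_type in type_order:
            pattern = regexp_map.get(node_type)
            # A type without a regexp is skipped rather than retried forever.
            if pattern is not None and re.search(pattern, machine):
                # Unlike Pan's merge(), a repeated key silently overwrites here.
                result[key] = {"type": node_type}
                break  # first match in type_order wins
    return result


if __name__ == "__main__":
    regexps = {"MON": "XXXXX", "WN": "node", "SE_DISK": "behar",
               "CE": "gridce", "NFS": "fileserv"}
    props = autocomplete_nodes_props(
        manual={"egon.iihe.ac.be": {"type": "MON"}},
        machines=["egon.iihe.ac.be", "node12.iihe.ac.be",  # hypothetical
                  "gridce.iihe.ac.be", "unknown.iihe.ac.be"],  # DB_MACHINE names
        type_order=["WN", "SE_DISK", "CE", "NFS"],
        regexp_map=regexps,
    )
    for host in sorted(props):
        print(host, props[host]["type"])
```

Machines that match none of the expressions are simply left out, as in the Pan version.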
Server post-install
LRF/php
- Edit /etc/php.ini and set:
register_globals = On
memory_limit = 32M
register_long_arrays = On
- Restart Apache
/etc/init.d/httpd restart
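To confirm that the three php.ini settings listed above are in place, a small check can read the file's key = value lines. The Python sketch below is a hypothetical helper, not part of Lemon or LRF. It treats the PHP spellings On, 1, True and Yes as "on", reads -1 as "no memory limit", and assumes (an assumption, not something stated above) that any memory_limit of at least 32M is acceptable.

```python
"""Sketch: check a php.ini text for the settings LRF needs (not part of Lemon)."""

TRUE_VALUES = {"on", "1", "true", "yes"}  # spellings PHP accepts for a true flag
MIN_MEMORY = 32 * 1024 * 1024  # memory_limit = 32M, assumed to be a minimum


def parse_php_ini(text):
    """Collect 'key = value' settings; ';' starts a comment, [sections] are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # simplification: no ';' inside values
        if not line or line.startswith("[") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def to_bytes(value):
    """Convert PHP shorthand such as '32M' to a number of bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    value = value.strip().upper()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)


def problems(text):
    """Return the LRF-related settings that are missing, off, or too small."""
    settings = parse_php_ini(text)
    found = []
    for flag in ("register_globals", "register_long_arrays"):
        if settings.get(flag, "").lower() not in TRUE_VALUES:
            found.append(flag)
    limit = settings.get("memory_limit")
    # -1 means "no limit" in PHP
    if limit is None or (to_bytes(limit) != -1 and to_bytes(limit) < MIN_MEMORY):
        found.append("memory_limit")
    return found


if __name__ == "__main__":
    sample = "[PHP]\nregister_globals = On\nmemory_limit = 16M ; too small\n"
    print(problems(sample))
```

On the lemon server, the same function can be applied to the contents of /etc/php.ini.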
Configure Oracle-XE
- Run the configure command, answering with the same values that you have already set in your QWG templates.
$ /etc/init.d/oracle-xe configure
Oracle Database 10g Express Edition Configuration
-------------------------------------------------
This will configure on-boot properties of Oracle Database 10g Express Edition. The following questions will determine whether the database should be starting upon system boot, the ports it will use, and the passwords that will be used for database accounts. Press <Enter> to accept the defaults. Ctrl-C will abort.
Specify the HTTP port that will be used for Oracle Application Express [8080]:
Specify a port that will be used for the database listener [1521]:
Specify a password to be used for database accounts. Note that the same password will be used for SYS and SYSTEM. Oracle recommends the use of different passwords for each database account. This can be done after initial configuration:
Confirm the password:
Do you want Oracle Database 10g Express Edition to be started on boot (y/n) [y]:
Starting Oracle Net Listener...Done
Configuring Database...Done
Starting Oracle Database 10g Express Edition Instance...Done
Installation Completed Successfully.
To access the Database Home Page go to "http://127.0.0.1:8080/apex"
From now on you can access the Database Home Page either by running a web browser directly on the lemon box or from another machine through an SSH tunnel.
Example (from your laptop):
- Run this ssh command:
ssh -L 18080:localhost:8080 root@lemon.box
- Then go to "http://localhost:18080/apex"
Create the Oracle lemon user
- On the Database Home Page, log in as the system user.
- Go to Home>Administration>Manage Database Users>Create Database User.
- Create the lemon user, using the user name and password set in ORAMON_ORACLE_USER and ORAMON_ORACLE_PASSWD in your QWG templates.
- Give it all the privileges.
Initialize Databases
- Set the Oracle environment:
source /etc/lemon/lemon-ora.admin_env.sh
- Create the tablespaces:
$ sqlplus system@XE
SQL*Plus: Release 10.2.0.1.0 - Production on Mon Nov 19 15:05:58 2007
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production
SQL> create tablespace LEMON_INDX logging datafile '/var/oracle/lemon_indx.dbf' size 500m autoextend on next 32m maxsize 2048m extent management local;
SQL> create tablespace LEMON_DATA logging datafile '/var/oracle/lemon_data.dbf' size 1000m autoextend on next 32m maxsize 2048m extent management local;
SQL> exit;
- You can now initialize the database schema:
lemon-ora.admin --file=/etc/oramon-server.conf --create-schema
lemon-ora.admin --file=/etc/oramon-server.conf --all
Make sure that the cx_Oracle package you install is compiled against the same version of Oracle that you have installed. For Oracle 10g you can use cx_Oracle-4.3-10g-py24-1.i386.rpm, available via rpmfind. If you install the wrong version of cx_Oracle, you may get errors like this:
unable to import Oracle API: libclntsh.so.9.0: cannot open shared object file: No such file or directory
- Add this line to /etc/init.d/lemonmrd, after the line export PYTHONPATH:
. /etc/sysconfig/httpd
- Start services:
/etc/init.d/OraMon start
/etc/init.d/lemonmrd start
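If a service fails with a cx_Oracle import error like the libclntsh.so.9.0 message quoted above, a short script run with the same Python interpreter as lemonmrd can report which client library the installed module expects. The sketch below is a hypothetical helper, not part of Lemon. It relies only on the import error text naming the missing library, and it avoids newer syntax so that it also runs on old Python 2 interpreters such as the py24 one mentioned above.

```python
"""Sketch: report which Oracle client library a failed cx_Oracle import wanted."""
import re
import sys


def wanted_client_lib(error_message):
    """Return the libclntsh version suffix named in an import error, or None."""
    found = re.search(r"libclntsh\.so\.([0-9.]+)", error_message)
    if found:
        return found.group(1)
    return None


def check_cx_oracle():
    """Try importing cx_Oracle and describe the outcome in one line."""
    try:
        import cx_Oracle
    except ImportError:
        exc = sys.exc_info()[1]  # works on old and new Python alike
        lib = wanted_client_lib(str(exc))
        if lib:
            return ("cx_Oracle was built against libclntsh.so.%s; "
                    "install a build matching your Oracle client" % lib)
        return "cx_Oracle cannot be imported: %s" % exc
    return "cx_Oracle imported successfully"


if __name__ == "__main__":
    print(check_cx_oracle())
```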
OracleXE + LRF only
- Remove the following functions from /var/www/html/lrf/oracle.inc: batchUpdate, batchUpdateNC and batchQuery.