Lemon description

Lemon is a monitoring system developed at CERN. On monitored machines, agents run sensors that measure the state of different services according to metrics. Exceptions can also be defined; each one is raised when the set of conditions it defines is met. Metrics and exceptions are sent back to a central Lemon server and stored either in flat files or in an Oracle database. A web interface allows users to browse the status of machines and to perform simple analysis tasks, such as correlating different metrics or viewing the distribution of values across a set of machines.

Agent configuration

The machine-types/base template includes some basic Lemon sensors. Additional sensors and exceptions are added in the various machine types.

To enable Lemon monitoring on a node, set

variable LEMON_CONFIGURE_AGENT = true;

There are also a number of site-specific variables that need to be set:

## Name of the lemon server
variable LEMON_SERVER_HOSTNAME = undef;

## Set the email address for receiving the exception notifications
variable LEMON_ALARM_MAIL = undef;

Additional variables (which have defaults) can be customised if necessary:

## The port to contact the lemon server
variable LEMON_CLIENT_PORT ?= 12409;

## The transport protocol used (UDP or TCP)
variable LEMON_TRANSPORT_PROTOCOL ?= 'UDP';
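
Put together, the agent settings for a site might look like the following sketch. The host name and mail address are placeholders (assumptions, not values shipped with QWG), and the switch to TCP is only there to show how a default is overridden.

```pan
## Hypothetical site values: replace the host name and address with your own
variable LEMON_CONFIGURE_AGENT = true;
variable LEMON_SERVER_HOSTNAME = 'lemon.example.org';
variable LEMON_ALARM_MAIL = 'lemon-alarms@example.org';

## Optional: override a default (the default shown above is 'UDP')
variable LEMON_TRANSPORT_PROTOCOL = 'TCP';
```

Because the defaults are assigned with ?=, which only sets a variable that is still undefined, overrides like this are best placed before the Lemon templates are included.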

Advanced configuration

Adding extra metrics

Extra sensors, metrics, and exceptions may be added to a particular node or node type. There is a large set available in the QWG Lemon directory (sub-directories metric, sensor, exception). For example, to add monitoring for the service myservice you would do:

include monitoring/lemon/sensor/myservice;
include monitoring/lemon/metric/myservice;
include monitoring/lemon/exception/myservice;

NB: some of the metrics require additional RPM(s) to be installed on the monitored machines. For example, the fio sensor, which includes monitoring of Quattor services, requires lemon-sensor-fio, which in turn depends on perl-Time-HiRes. These RPMs and their dependencies are not yet included automatically in QWG (this still needs to be done).

More than one lemon server

The default setup assumes a single lemon server (variable LEMON_SERVER_HOSTNAME). To report to more than one server, add an entry for each additional server to /system/monitoring/transport (see monitoring/lemon/client/base/config for an example).

Server configuration

The lemon server consists of an information collector and the web interface (called LRF). Lemon supports two main types of information storage: flat files or an Oracle database. Currently only Oracle storage is supported in QWG. For those without access to an Oracle installation, we provide instructions for setting up the free Oracle XE database.

First add the lemon server config:

include monitoring/lemon/server/service;

Backend configuration

Then set the storage backend:

## use OraMon or flatfile
variable LEMON_BACKEND ?= 'OraMon';

Oracle / OraMon

When using Oracle as a backend, some Oracle-specific parameters need to be set (these have default values as shown):

## name of database to use
variable ORAMON_ORACLE_DATABASE_NAME ?= 'XE';
## Oracle installation directory (ORACLE_HOME)
variable ORACLE_HOME ?= '/usr/lib/oracle/xe/app/oracle/product/10.2.0/server';

## Local installation using XE or not
## (If true, don't forget the manual post-install steps!)
variable ORACLE_XE_LOCAL_INSTALL ?= true;

## Oracle user (must be created in oracle manually!)
## (this is not necessarily the same unix username that runs lemon services)
variable ORAMON_ORACLE_USER ?= 'lemon';
## Oracle password for this user
variable ORAMON_ORACLE_PASSWD ?= undef;
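
For a local Oracle XE installation, the defaults above leave only the password to be set. A minimal sketch, in which the password is a placeholder (an assumption for illustration only):

```pan
variable LEMON_BACKEND = 'OraMon';
## Local Oracle XE install: the other ORAMON_* and ORACLE_* defaults are kept
variable ORACLE_XE_LOCAL_INSTALL = true;
## Placeholder: must match the password given to the Oracle 'lemon' user
variable ORAMON_ORACLE_PASSWD = 'change-me';
```

The same password is needed again when the Oracle lemon user is created by hand during the server post-install steps.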

If you are accessing an existing Oracle server, set ORACLE_XE_LOCAL_INSTALL to false and configure the Oracle TNS (example for GRIF):

variable CONTENTS_ORACLE_TNS ?= <<EOF;
# tnsnames.ora Network Configuration File:
oracle_service_name.in2p3.fr =
    (DESCRIPTION =
      (ADDRESS=(PROTOCOL=TCP)(HOST=real_oracle_server_1.in2p3.fr)(PORT=1521))
      (ADDRESS=(PROTOCOL=TCP)(HOST=real_oracle_server_2.in2p3.fr)(PORT=1521))
      (LOAD_BALANCE=yes)
      (CONNECT_DATA=
           (SERVER=DEDICATED)(SERVICE_NAME=oracle_service_name.in2p3.fr)
           (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)
           (RETRIES=180)(DELAY=5))
      )
    )
EOF

flatfile

Nothing yet.

Web interface configuration

The lemon web interface uses PHP and needs access to the storage backend.

Cluster definition

The web interface also needs to know which machines to expect and how to group them based on their properties. For now, this is done with an nlist called NODES_PROPS. A basic example is:

variable NODES_PROPS  = nlist(
	escape("mon.example.com"),nlist('type','MON','monitoring','yes'),
);

The name of the template that sets this variable is controlled through

variable LEMON_NODES_PROPERTIES_TEMPLATE ?= 'pro_nodes_properties';

The default value of the 'monitoring' property (i.e. the behaviour when it is not defined for a node) is controlled through

variable LEMON_NODES_PROPERTIES_DEFAULT_MONITORING ?= 'yes';

The actual configuration files for the web interface are generated in the template referenced by LEMON_SERVER_WEB_CONFIG. The default template is monitoring/lemon/server/web, which uses the NODES_PROPS list to build cluster definitions. The default is to define one cluster for each unique node type found in NODES_PROPS.

Superclusters

It is often useful to define "superclusters" that aggregate together existing node types. For example, a GATEWAY supercluster might aggregate all the CE and SE nodes at a site. The default web configuration template supports this as follows:

variable LEMON_SUPERCLUSTERS= nlist("GATEWAY",list("CE","SE"));
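
As an illustration of how node types and superclusters fit together, the sketch below declares three nodes and groups the CE and SE types as in the example above. The host names are placeholders (assumptions), not real machines.

```pan
## Hypothetical nodes: one CE, one SE and the monitoring server
variable NODES_PROPS = nlist(
	escape("ce01.example.com"), nlist('type', 'CE', 'monitoring', 'yes'),
	escape("se01.example.com"), nlist('type', 'SE', 'monitoring', 'yes'),
	escape("mon.example.com"), nlist('type', 'MON', 'monitoring', 'yes'),
);

## GATEWAY aggregates the CE and SE node types
variable LEMON_SUPERCLUSTERS = nlist("GATEWAY", list("CE", "SE"));
```

With the default web configuration template, this would be expected to give one cluster for each of CE, SE and MON, plus a GATEWAY supercluster covering the CE and SE nodes.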

NODES_PROPS example

An example used at IIHE to generate the monitoring part of NODES_PROPS

template site/lemon_nodes;

## in case of missing monitoring field 
variable LEMON_NODES_PROPERTIES_DEFAULT_MONITORING = 'yes';

## manual list; these entries are kept when autocompleting
variable NODES_PROPS  = nlist(
	escape("egon.iihe.ac.be"),nlist('type','MON'),
);	

## order in which the types are tried (the first match wins)
variable LEMON_PROPS_REGEXP_TYPE = list('WN','SE_DISK','CE','NFS');
variable LEMON_PROPS_REGEXP_MAP = nlist(
	'MON','XXXXX',
	'WN','node',
	'SE_DISK','behar',
	'CE','gridce',
	'NFS','fileserv',
);


### autocomplete this list based on DB_MACHINE and regexp
variable NODES_PROPS = {
	tmp = NODES_PROPS;
	dbm = DB_MACHINE;
	
	ok = first(dbm, k, v);
	while (ok) {
		if (exists(NODES_PROPS[k])) {
			ok = next(dbm, k, v);
		} else {
			mach = unescape(k);
			mach_to_use = mach;
			if (LEMON_SHORTHOSTNAME) {
				m = matches(mach,'([^\\.]+)(\..*)?');
				mach_to_use = m[1];
			};
			regs_order = LEMON_PROPS_REGEXP_TYPE;
			ok2 = first(regs_order, k2,v2);
			while (ok2) {
				if (exists(LEMON_PROPS_REGEXP_MAP[v2])) {
					reg = LEMON_PROPS_REGEXP_MAP[v2];
					if (match(mach,reg)) {
						tmp = merge(tmp,nlist(escape(mach_to_use),nlist('type',v2)));
						ok2 = false;
					} else {
						ok2 = next(regs_order, k2,v2);
					};
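				} else {
					## no regexp defined for this type: move on, or the loop never ends
					ok2 = next(regs_order, k2,v2);
				};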
			};
		
			ok = next(dbm, k, v);
		};
	};
	
	return(tmp);
};

Server post-install

LRF/php

  • Edit /etc/php.ini
    register_globals = On
    memory_limit = 32M
    register_long_arrays = on
    
  • Restart Apache
    /etc/init.d/httpd restart
    

Configure Oracle-XE

  • Run the configure command, entering the same values that you have already set in your QWG templates.
    $ /etc/init.d/oracle-xe configure
    
    Oracle Database 10g Express Edition Configuration
    -------------------------------------------------
    This will configure on-boot properties of Oracle Database 10g Express
    Edition.  The following questions will determine whether the database should
    be starting upon system boot, the ports it will use, and the passwords that
    will be used for database accounts.  Press <Enter> to accept the defaults.
    Ctrl-C will abort.
    
    Specify the HTTP port that will be used for Oracle Application Express [8080]:
    
    Specify a port that will be used for the database listener [1521]:
    
    Specify a password to be used for database accounts.  Note that the same
    password will be used for SYS and SYSTEM.  Oracle recommends the use of
    different passwords for each database account.  This can be done after
    initial configuration:
    Confirm the password:
    
    Do you want Oracle Database 10g Express Edition to be started on boot (y/n) [y]:
    
    Starting Oracle Net Listener...Done
    Configuring Database...Done
    Starting Oracle Database 10g Express Edition Instance...Done
    Installation Completed Successfully.
    To access the Database Home Page go to "http://127.0.0.1:8080/apex"
    
    From now on, you can access the Database Home Page either by running a web browser directly on the lemon box or from another machine using tunneling.

    Example (from your laptop):

Create the Oracle lemon user

  • On the Database Home Page, log in as the system user.
  • Go to Home>Administration>Manage Database Users>Create Database User.
  • Create a lemon user (use the same password as the one you have set in your QWG template).
  • Give it all the privileges.

Initialize databases

  • Set the Oracle environment:
    source /etc/lemon/lemon-ora.admin_env.sh
    
  • Create the tablespaces:
    $ sqlplus system@XE
    
    SQL*Plus: Release 10.2.0.1.0 - Production on Mon Nov 19 15:05:58 2007
    
    Copyright (c) 1982, 2005, Oracle.  All rights reserved.
    
    Enter password:
    
    Connected to:
    Oracle Database 10g Express Edition Release 10.2.0.1.0 - Production
    
    SQL>create tablespace LEMON_INDX  logging datafile '/var/oracle/lemon_indx.dbf' size 500m autoextend on next 32m maxsize 2048m extent management local;
    SQL>create tablespace LEMON_DATA logging datafile '/var/oracle/lemon_data.dbf' size 1000m autoextend on next 32m maxsize 2048m extent management local;
    SQL>exit;
    
  • You can now initialize the databases:
    lemon-ora.admin --file=/etc/oramon-server.conf --create-schema
    lemon-ora.admin --file=/etc/oramon-server.conf --all
    

Make sure that the cx_Oracle package you install is compiled against the same version of Oracle that you have installed. For Oracle 10g you can use cx_Oracle-4.3-10g-py24-1.i386.rpm, available via rpmfind. If you install the wrong version of cx_Oracle, you may get errors like this:

unable to import Oracle API:  libclntsh.so.9.0: cannot open shared object file: No such file or directory
  • Add this line in /etc/init.d/lemonmrd:
    . /etc/sysconfig/httpd
    
    after
    export PYTHONPATH
    
  • Start services:
    /etc/init.d/OraMon start
    /etc/init.d/lemonmrd start
    

OracleXE + LRF only

  • Remove the following functions from /var/www/html/lrf/oracle.inc:
    batchUpdate
    batchUpdateNC
    batchQuery
    
Last modified on Jun 16, 2008, 3:03:14 PM