Changes between Initial Version and Version 1 of Doc/Monitoring/NagiosProbes


Timestamp: Dec 6, 2010, 10:17:45 AM
Author: /C=GR/O=HellasGrid/OU=auth.gr/CN=Christos Triantafyllidis

  • Doc/Monitoring/NagiosProbes

= Nagios Probes related to Quattor activity =
[[TracNav]]

This page lists the Nagios probes that have been developed to monitor Quattor activity.

== Probe list ==

||= Name =||= Description =||= Created by =||
|| check_kernel_version || Extracts the desired kernel version from ncm-query and compares it to the running kernel. It returns WARNING if they differ. We occasionally found nodes still running an old (vulnerable) kernel after deploying a kernel upgrade. This check helps to identify nodes that still need to be rebooted. || NIKHEF ||
|| check_ncd || Parses the ncd log files (/var/log/ncm/ncd.log*) and tries to find the latest run of NCD. The numbers of errors (CRITICAL) and warnings (WARNING) in that run determine the result of the check. || NIKHEF ||
|| check_ncd || Does exactly the same as the NIKHEF check_ncd, but with different code :) || AUTH ||
|| check_service || A more generic script that checks whether a particular service is actually running. It wraps around init.d scripts. In the context of Quattor we run it against ncm-cdispd, to catch nodes that would otherwise not respond to configuration changes. || NIKHEF ||
|| check_spma || Parses the spma log files (/var/log/spma.log*) and identifies the result of the latest run of SPMA. The SPMA result is returned as the result of the check. || AUTH ||
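
The status logic described for check_kernel_version and check_ncd can be sketched roughly as below. This is not the code of the NIKHEF or AUTH probes: the function names are hypothetical, the `[ERROR]` and `[WARN]` markers are an assumption about the log format, and the desired kernel version is passed in as a plain string rather than obtained from ncm-query. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def kernel_status(desired, running):
    """Return WARNING when the running kernel differs from the desired one."""
    return OK if desired == running else WARNING


def count_markers(log_lines, error_marker="[ERROR]", warn_marker="[WARN]"):
    """Count error and warning lines of one run.

    The marker strings are assumptions, not the documented log format.
    """
    errors = sum(1 for line in log_lines if error_marker in line)
    warnings = sum(1 for line in log_lines if warn_marker in line)
    return errors, warnings


def ncd_status(errors, warnings):
    """Any error gives CRITICAL, otherwise any warning gives WARNING."""
    if errors > 0:
        return CRITICAL
    if warnings > 0:
        return WARNING
    return OK


def format_line(name, status, detail):
    """Build the single status line a Nagios plugin prints before exiting."""
    return "%s %s - %s" % (name, LABELS[status], detail)
```

A real probe would print the line from `format_line` and then call `sys.exit(status)` so that Nagios picks up the state from the exit code. For the kernel check, the running version could be taken from `platform.release()`.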