
Version 1 (modified by /C=GR/O=HellasGrid/OU=auth.gr/CN=Christos Triantafyllidis, 13 years ago)


Nagios Probes related to Quattor activity

This page contains a list of all Nagios probes that have been developed in order to monitor Quattor activity.

Probe list

Name | Description | Created by
check_kernel_version | Extracts the desired kernel version from ncm-query and compares it to the running kernel. It returns a WARNING if they differ. We occasionally found nodes still running an old (vulnerable) kernel after deploying a kernel upgrade. This check helps to identify nodes that still need to be rebooted. | NIKHEF
check_ncd | Parses the ncd log files (/var/log/ncm/ncd.log*) and finds the latest run of NCD. The number of errors and warnings in that run determines the result of the check: errors yield CRITICAL, warnings yield WARNING. | NIKHEF
check_ncd | Does exactly the same as the NIKHEF check_ncd, but with a different implementation. | AUTH
check_service | A more generic script that checks whether a particular service is actually running. It wraps around init.d scripts. In the context of Quattor we run it against ncm-cdispd, to catch nodes that would otherwise not respond to configuration changes. | NIKHEF
check_spma | Parses the spma log files (/var/log/spma.log*) and identifies the result of the latest run of SPMA. That result is returned as the result of the check. | AUTH
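
The decision logic described for check_ncd and check_kernel_version can be sketched in a few lines of Python. This is only an illustration, not the code of the attached probes: the function names are hypothetical, and the summary-line format matched by SUMMARY_RE ("N errors, M warnings") is an assumption about what the log contains. The exit codes 0 to 3 follow the usual Nagios plugin convention.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# ASSUMPTION: each run ends with a summary such as "3 errors, 1 warnings".
# The real log format may differ; adjust the pattern accordingly.
SUMMARY_RE = re.compile(r"(\d+) errors?, (\d+) warnings?")


def status_from_counts(errors, warnings):
    """Map error/warning counts to a Nagios status; errors take precedence."""
    if errors > 0:
        return CRITICAL
    if warnings > 0:
        return WARNING
    return OK


def check_log_text(text):
    """Return (status, message) based on the last run summary found in text."""
    matches = SUMMARY_RE.findall(text)
    if not matches:
        return UNKNOWN, "UNKNOWN: no run summary found"
    # Only the latest run matters, so use the last summary in the log.
    errors, warnings = (int(n) for n in matches[-1])
    status = status_from_counts(errors, warnings)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[status]
    return status, f"{label}: {errors} errors, {warnings} warnings in latest run"


def check_kernel(desired, running):
    """Return WARNING when the running kernel differs from the desired one."""
    if desired == running:
        return OK, f"OK: running kernel {running}"
    return WARNING, f"WARNING: running {running}, expected {desired}"
```

A complete probe would read the log file, print the message on one line, and exit with the returned status (for example via sys.exit). For the kernel check, the running version could be taken from platform.release(); how the desired version is obtained from ncm-query is left out here.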

Attachments (5)
