Nagios Probes related to Quattor activity

This page lists the Nagios probes that have been developed to monitor Quattor activity.

Probe list

Name | Description | Created by
check_kernel_version | Extracts the desired kernel version from ncm-query and compares it to the running kernel. It returns WARNING if they differ. We occasionally found nodes still running an old (vulnerable) kernel after a kernel upgrade had been deployed; this check helps to identify nodes that still need to be rebooted. | NIKHEF
check_ncd | Parses the ncd log files (/var/log/ncm/ncd.log*) and looks for the latest run of NCD. The number of errors (CRITICAL) and warnings (WARNING) determines the result of the check. | NIKHEF
check_ncd | Does exactly the same as the NIKHEF check_ncd, but with different code :) | AUTH
check_service | A more generic script that checks whether a particular service is actually running. It wraps around init.d scripts. In the context of Quattor we run it against ncm-cdispd, to detect nodes that would not respond to configuration changes. | NIKHEF
check_spma | Parses the spma log files (/var/log/spma.log*) and identifies the result of the latest run of SPMA. The SPMA result is returned as the result of the check. | AUTH
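
The decision logic behind check_kernel_version and check_ncd can be sketched roughly as follows. This is not the code of the actual probes, only an illustration under a few assumptions: the function names kernel_status and ncd_status are made up for this example, the desired kernel version is assumed to have been extracted from ncm-query already and to be directly comparable to the output of uname -r, and the error and warning counts are assumed to have been read from the latest NCD run in the log. The exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN) are the standard Nagios plugin return values.

```python
import platform

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def kernel_status(desired, running=None):
    """Compare the desired kernel version with the running one.

    Assumption: `desired` was already extracted from ncm-query and uses
    the same format as `uname -r`; the real probe may normalise it.
    """
    if running is None:
        running = platform.release()  # same value as `uname -r`
    if not desired:
        return UNKNOWN, "UNKNOWN: no desired kernel version available"
    if desired == running:
        return OK, "OK: running kernel %s" % running
    # A mismatch usually means the node still needs a reboot.
    return WARNING, "WARNING: running %s, configured %s" % (running, desired)


def ncd_status(errors, warnings):
    """Map the error and warning counts of the latest NCD run to a status.

    Assumption: the counts were parsed from the log beforehand; the log
    format itself is not modelled here.
    """
    if errors > 0:
        return CRITICAL, "CRITICAL: %d errors, %d warnings" % (errors, warnings)
    if warnings > 0:
        return WARNING, "WARNING: %d warnings" % warnings
    return OK, "OK: last NCD run had no errors or warnings"
```

A wrapper script would print the returned message and exit with the returned code, for example by calling sys.exit(code) after print(message), so that Nagios picks up both the status and the text.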
Last modified on Dec 6, 2010, 10:17:45 AM
