
Nagios Probes related to Quattor activity

This page lists all Nagios probes that have been developed to monitor Quattor activity.

Probe list

|| Name || Description || Created by ||
|| check_kernel_version || Extracts the desired kernel version from ncm-query and compares it to the running kernel. It returns a WARNING if they differ. We occasionally found nodes still running an old (vulnerable) kernel after deploying a kernel upgrade. This check helps to identify nodes that still need to be rebooted. || NIKHEF ||
|| check_ncd || Parses the ncd log files (/var/log/ncm/ncd.log*) and tries to find the latest run of NCD. The number of errors (CRITICAL) and warnings (WARNING) in that run determines the result of the check. || NIKHEF ||
|| check_ncd || Does exactly the same as the NIKHEF check_ncd, but with different code. || AUTH ||
|| check_service || A more generic script that checks whether a particular service is actually running. It wraps around init.d scripts. In the context of Quattor we run it against ncm-cdispd, to detect nodes that would not respond to configuration changes. || NIKHEF ||
|| check_spma || Parses the spma log files (/var/log/spma.log*) and identifies the result of the latest run of SPMA. The SPMA result is returned as the result of the check. || AUTH ||
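
The result mapping described for check_kernel_version and check_ncd can be sketched roughly as follows. This is only an illustration in Python and not the code of the attached probes: the function names (kernel_status, counts_status, format_line) are hypothetical, and the desired kernel version is assumed to have been obtained already (for example from ncm-query) and passed in as a plain string. The exit codes are the standard Nagios plugin values.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def kernel_status(desired, running):
    """Compare the desired kernel version with the running one.

    'desired' is assumed to come from the node profile; how it is
    extracted is outside the scope of this sketch.
    """
    if not desired:
        return UNKNOWN, "desired kernel version not available"
    if desired == running:
        return OK, "running kernel %s matches the profile" % running
    return WARNING, "running kernel %s, profile wants %s" % (running, desired)


def counts_status(errors, warnings):
    """Map error and warning counts of the latest run to a status.

    Any error gives CRITICAL, otherwise any warning gives WARNING.
    """
    message = "%d errors, %d warnings" % (errors, warnings)
    if errors > 0:
        return CRITICAL, message
    if warnings > 0:
        return WARNING, message
    return OK, message


def format_line(probe, code, message):
    """Build the single status line a Nagios plugin prints on stdout."""
    return "%s %s - %s" % (probe, LABELS[code], message)
```

A wrapper script would print the line from format_line and exit with the returned code, for instance by passing platform.release() as the running kernel version.
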
Last modified on Dec 6, 2010, 10:17:45 AM
