Monitoring Messaging Systems via Nagios Markus ... - netways

Monitoring Messaging Systems via Nagios 

Markus Thiel 

Consultant 

m.thiel@itnovum.de 

it-novum GmbH 

Edelzeller Strasse 44 

36043 Fulda 

www.itnovum.de 

Contents 

Brief introduction 

MS Exchange 

Nagios tools 

• Active checks 
• Passive checks 
• End2End approach 

Monitoring 

• MS Exchange 
• Lotus Notes Domino 
• Exim 
• Postfix 

Munin-Nagios interface 

• nsca / send_nsca 

Questions / suggestions 

it-novum GmbH business areas 

Systems Management 
Infrastructure Optimization 
Enterprise Content Management 
ERP & Business Intelligence 

• Open source 
• ITCOCKPIT / Nagios 
• Storage management 
• Security management 
• Document management 
• Archiving 
• Enterprise resource planning 
• SAP 
• Server virtualization 
• Client virtualization 
• Reporting, analysis & dashboards 
• Outsourcing 

Big picture 

[Diagram: the business view (business services, business service dashboard, SLA monitoring, BP monitoring, event management and correlation) sits on top of the technical view (alerting, thresholds, status, performance data, E2E), which covers servers, networks, databases, middleware, applications, and integration. Alongside are the IT processes: service level, incident, problem, change, release, capacity, and configuration management, plus the CMDB.] 

Nagios – Tools 

[Diagram: active checks and passive checks between Nagios and the CIs] 

Further tools 

• Commercial tools 
• Vendor-specific tools 
• Open source tools, e.g. … 

Nagios - Active checks 

Checks that reach the CI over TCP/IP: 

• check_nrpe 
• check_nt 
• check_by_ssh 
• check_snmp 
• check_ldap 
• check_smtp 
• check_tcp 
• … 

check_tcp -H $HOSTADDRESS$ -p 25 -s "HELO ich" -e "250 OK" -q "quit" -w $ARG1$ -c $ARG2$ 

Custom plugins 

End2End Monitoring 1 

End2End Monitoring 2 

Location-dependent 

• check_ldap 
• check_tcp 
• check_pop3 
• … 

Exchange Monitoring - Methods 

Exchange Server 

• check_nt: processes, Perfmon 
• check_tcp: ports 25, 110, 389, … 
• check_… 

Exchange Monitoring – Preparing the CI 

• Install NSClient++ 
• Read the parameters from the performance counters (perfmon) 

Exchange Monitoring – Nagios configuration 

• Install check_nt (nagios-plugins) 
• Adapt the Nagios configuration 

Command definition 

check_nt -H <host> -s <password> \ 
    -p <port> -v COUNTER \ 
    -l "\\SMTP Server(_Total)\\Gesamtzahl übermittelter Nachrichten" 

The performance object, instance, and counter are passed together as the -l argument. The counter name shown here comes from a German-language Windows installation. 


• Option A 

Configuration without thresholds 


• Option B 

Configuration with thresholds 

Build a wrapper around check_nt; the thresholds are passed to it as arguments. 

#!/bin/bash 
… 
# HOST, PASSWORD, PORT, WARN, CRIT and PLUGINDIR are assumed to be set in the elided part above 
RETVAL=0 
TEMPFILE="/tmp/${HOST}_unzustellbarkeitsberichte" 
TIME=$(date +%s) 
# Current counter value: total number of NDRs generated so far 
RES=$("$PLUGINDIR/check_nt" -H "$HOST" -s "$PASSWORD" -p "$PORT" -v COUNTER -l \ 
    "\\SMTP Server(_Total)\\Erzeugte NDRs (Unzustellbarkeitsberichte)") 
# Load value and timestamp of the previous run, if there was one 
if [ -e "$TEMPFILE" ]; then 
    ALTRES=$(grep "Wert =" "$TEMPFILE" | cut -d '=' -f2) 
    ALTTIME=$(grep "Time =" "$TEMPFILE" | cut -d '=' -f2) 
else 
    ALTRES=$RES 
    ALTTIME=$TIME 
fi 
echo "Wert =${RES}" > "$TEMPFILE" 
echo "Time =${TIME}" >> "$TEMPFILE" 
# NDRs per minute since the previous run; 0 on the first run, which avoids a division by zero 
ELAPSED=$((TIME - ALTTIME)) 
if [ "$ELAPSED" -gt 0 ]; then 
    RES=$(( (RES - ALTRES) * 60 / ELAPSED )) 
else 
    RES=0 
fi 
if [ "$RES" -ge "$WARN" ]; then 
    RETVAL=1 
fi 
if [ "$RES" -ge "$CRIT" ]; then 
    RETVAL=2 
fi 
RETSTR="NDRs generated per min: ${RES}|NDRs=${RES}NDRs_per_min;$WARN;$CRIT" 
echo "$RETSTR" 
exit $RETVAL 
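The core of the wrapper is turning two samples of an ever-growing counter into a per-minute rate and mapping that rate to a Nagios state. The following is a minimal sketch of just that step; the function names rate_per_minute and nagios_state are hypothetical, and integer counter values are assumed.

```shell
#!/bin/bash
# Sketch only: the function names below are hypothetical, not part of check_nt.

# Per-minute rate between two samples of a monotonically increasing counter.
# Arguments: previous value, current value, previous epoch, current epoch.
rate_per_minute() {
    local prev=$1 curr=$2 prev_time=$3 curr_time=$4
    local elapsed=$((curr_time - prev_time))
    # First run or identical timestamps: there is no interval to divide by.
    if [ "$elapsed" -le 0 ]; then
        echo 0
        return
    fi
    # Integer arithmetic keeps the result usable with "[ ... -ge ... ]".
    echo $(( (curr - prev) * 60 / elapsed ))
}

# Map a value to a Nagios exit code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
nagios_state() {
    local value=$1 warn=$2 crit=$3
    if [ "$value" -ge "$crit" ]; then
        echo 2
    elif [ "$value" -ge "$warn" ]; then
        echo 1
    else
        echo 0
    fi
}

# Example: the counter rose from 100 to 160 within 120 seconds.
rate=$(rate_per_minute 100 160 1000 1120)
echo "rate=${rate} state=$(nagios_state "$rate" 20 50)"
```

With these numbers the rate is 30 per minute, which lies between the assumed thresholds of 20 and 50 and therefore maps to WARNING. A counter reset on the monitored host would produce a negative rate; a production wrapper would probably want to treat that case separately.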

• Option B 

Configuration with thresholds 

Plugin output: (screenshot) 

Exchange Monitoring – further parameters 

• NDRs generated 
• Undeliverable messages 
• Logged-on users 
• Queues 
• … 

Screenshot: MS Exchange 2007 

Exchange Monitoring – Process checks 

• Read the process status 

check_nt -H <host> -s <password> \ 
    -p <port> -v PROCSTATE \ 
    -l STORE.EXE 

• Excerpt of relevant processes 

Service                           Process 
MSExchange Information Store      store.exe 
MSExchange System Attendant       mad.exe 
MSExchange Management             exmgmt.exe 
MSExchange Routing Engine         inetinfo.exe 
MSExchange MTA Stacks             emsmta.exe 
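Rather than defining one check per process, the listed processes could be queried in a single call. The sketch below assumes that PROCSTATE accepts a comma-separated list in -l, as described in the check_nt help text; join_by_comma is a hypothetical helper, and the command is only printed, not executed.

```shell
#!/bin/bash
# Sketch only: join_by_comma is a hypothetical helper.

# Exchange processes listed on the slide.
PROCESSES=(store.exe mad.exe exmgmt.exe inetinfo.exe emsmta.exe)

# Join all arguments with commas.
join_by_comma() {
    local IFS=,
    echo "$*"
}

PROC_LIST=$(join_by_comma "${PROCESSES[@]}")

# Print the resulting command instead of running it; <host>, <password>
# and <port> stay placeholders.
echo "check_nt -H <host> -s <password> -p <port> -v PROCSTATE -l ${PROC_LIST}"
```

A combined check is simpler to configure, while separate checks per process give clearer alerts; which one fits depends on how the service is mapped in Nagios.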

Exchange Monitoring – NagVis 

Lotus Notes Domino - Methods 

Components on the Domino server: 

• QuerySet Handler: queries the server's statistics and passes them to LNSNMP, which hands this information to the platform-specific SNMP agent 
• Event Interceptor: instructs LNSNMP to send, for example, an SNMP trap 
• LNSNMP 

LNSNMP 

• Supported platforms 

z/OS (OS/390) 

• Unsupported platforms 

zSeries 

Lotus Notes Domino – Preparing the CI 

• Install / configure SNMP 
• Install LNSNMP 

LND Monitoring – Method 1 (passive) 

[Diagram: LNSNMP on the Domino server sends an snmptrap in real time] 

LND Monitoring – Method 2 (active) 

Domino server with LNSNMP 

• check_snmp: LNSNMP 
• check_tcp: ports 1152, 25, 389, … 
• check_… 

LND Monitoring – passive vs. active checks 

Requirements compared for passive and active checks: 

• Snapshot view 
• Configuration effort 
• Cross-system event correlation 
• Clear service assignment 
• SLA-capable 
• BPM-capable 
• Differentiation of events when several components fail 
• In-depth application monitoring 
• Performance data / long-term analysis 

LND Monitoring – Standard check_snmp 

• Install check_snmp (nagios-plugins) 
• Adapt the Nagios configuration 

Command definition 

check_snmp \ 
    -H <host> \ 
    -C <community> \ 
    -o 1.3.6.1.4.1.334.72.1.1.4.3.0 \ 
    -l LN_TOTAL_MAIL_FAILURES \ 
    -w <warn> \ 
    -c <crit> \ 
    -u Mails 

Plugin output 

LN_TOTAL_MAIL_FAILURES OK - 1 Mails | iso.3.6.1.4.1.334.72.1.1.4.3.0=1 

LND Monitoring – OIDs from MIB 

Service                    OID                             Description from MIB 
dead-mail                  enterprises.334.72.1.1.4.1.0    Number of dead (undeliverable) mail messages 
routing-failures           enterprises.334.72.1.1.4.3.0    Total number of routing failures since the server started 
pending-routing            enterprises.334.72.1.1.4.6.0    Number of mail messages waiting to be routed 
pending-local              enterprises.334.72.1.1.4.7.0    Number of pending mail messages awaiting local delivery 
max-mail-delivery-time     enterprises.334.72.1.1.4.12.0   Maximum time for mail delivery in seconds 
router-unable-to-transfer  enterprises.334.72.1.1.4.19.0   Number of mail messages the router was unable to transfer 
mail-held-in-queue         enterprises.334.72.1.1.4.21.0   Number of mail messages in message queue on hold 
mails-pending              enterprises.334.72.1.1.4.31.0   Number of mail messages pending 
replicator-status          enterprises.334.72.1.1.6.1.3.0  Status of the Replicator task 
router-status              enterprises.334.72.1.1.6.1.4.0  Status of the Router task 
databases-in-cache         enterprises.334.72.1.1.10.15.0  The number of databases currently in the cache. Administrators should 
                                                           monitor this number to see whether it approaches the 
                                                           NSF_DBCACHE_MAXENTRIES setting. If it does, this indicates the cache is 
                                                           under pressure. If this situation occurs frequently, the administrator 
                                                           should increase the setting for NSF_DBCACHE_MAXENTRIES. 

LND Monitoring – OIDs from MIB 2 

Service                                OID                            Description from MIB 
messages-send                          enterprises.334.72.1.1.4.2.0   Number of messages received by router 
messages-routed                        enterprises.334.72.1.1.4.4.0   Total number of mail messages routed since the server started 
router-messages-attempted-to-transfer  enterprises.334.72.1.1.4.5.0   Number of messages router attempted to transfer 
delivered-mail-size-avg                enterprises.334.72.1.1.4.11.0  Average size of mail messages delivered in bytes 
delivered-mail-size-max                enterprises.334.72.1.1.4.14.0  Maximum size of mail delivered in bytes 
total-mail-transferred                 enterprises.334.72.1.1.4.18.0  Total mail transferred in kilobytes 
transferred-per-min-peak               enterprises.334.72.1.1.4.27.0  Peak number of messages transferred 
… 
MemAllocProcess                        enterprises.334.72.1.1.9.2     Total process-private memory allocated by all currently-running 
                                                                      processes. 
DriveFree                              enterprises.334.72.1.1.8.3.1.4 The amount of free space left on this drive in kilobytes. 
                                                                      A value of zero may indicate the statistic's value is 
                                                                      too large to be passed via SNMP. 
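The tables give OIDs in the shortened enterprises.… form, while the check_snmp command uses the fully numeric form. Since enterprises is the standard node 1.3.6.1.4.1, the conversion is a simple prefix replacement. A minimal sketch, with expand_oid as a hypothetical helper name:

```shell
#!/bin/bash
# Sketch only: expand_oid is a hypothetical helper.

# Replace a leading "enterprises." with its numeric prefix 1.3.6.1.4.1.
# Any other input is returned unchanged.
expand_oid() {
    local oid=$1
    case "$oid" in
        enterprises.*) echo "1.3.6.1.4.1.${oid#enterprises.}" ;;
        *)             echo "$oid" ;;
    esac
}

# routing-failures from the table; the result is the OID used in the
# check_snmp command definition.
expand_oid enterprises.334.72.1.1.4.3.0
```

This prints 1.3.6.1.4.1.334.72.1.1.4.3.0, the value passed to -o in the LN_TOTAL_MAIL_FAILURES check.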

Lotus Notes Domino – Checking services 

• Install the check_lotus_notes_services plugin * 
• Read the started services on the LND server 

nagios-server:~ # snmpwalk -c <community> -v 1 <host> .1.3.6.1.4.1.334.72.1.1.6.1.2.1.4 \ 
    | awk -F"STRING: " '{ print $2 }' | sort | uniq 
… 
"Statistic Collector" 
"Event Interceptor" 
"QuerySet Handler" 
"Cluster Replicator" 
… 

• Pass the results as an argument in the command 

nagios-server: # ./check_lotus_notes_services.sh -H <host> \ 
    -S "Event Interceptor" \ 
    -C <community> 
OK - "Idle: [07/10/2008 13:34:08 CEDT]" | Counter=1Services 
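The service names have to be handed to the plugin without the surrounding quotes that snmpwalk prints. The sketch below shows one way to extract clean, de-duplicated names; extract_services is a hypothetical helper, and the sample lines only imitate typical snmpwalk output rather than reproducing a captured session.

```shell
#!/bin/bash
# Sketch only: extract_services is a hypothetical helper, and the sample
# lines imitate typical snmpwalk output rather than a captured session.

# Keep the text after "STRING: ", drop the quotes, and de-duplicate.
extract_services() {
    awk -F'STRING: ' 'NF > 1 { gsub(/"/, "", $2); print $2 }' | sort -u
}

sample='SNMPv2-SMI::enterprises.334.72.1.1.6.1.2.1.4.0 = STRING: "Event Interceptor"
SNMPv2-SMI::enterprises.334.72.1.1.6.1.2.1.4.1 = STRING: "QuerySet Handler"
SNMPv2-SMI::enterprises.334.72.1.1.6.1.2.1.4.2 = STRING: "Event Interceptor"'

printf '%s\n' "$sample" | extract_services
```

Each resulting name can then be used as the -S argument of check_lotus_notes_services.sh, one service definition per name.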

Lotus Notes Domino – Transfer Peak Time 

• Nagios plugin: check_lotus_notes_transfer_per_minute_peak_time * 

#!/bin/bash 
… 
# COMMUNITY and HOST are assumed to be set in the elided part above 
UNIXTIME=$(snmpwalk -c "$COMMUNITY" -v 1 "$HOST" 1.3.6.1.4.1.334.72.1.1.6.3.4.0 \ 
    | awk -F"INTEGER: " '{ print $2 }') 
HUMANTIME=$(echo "$UNIXTIME" | logtime) 
… 

• logtime *: installed in the $PATH of the nagios user; converts a UNIX timestamp to the format YYYY-MM-DD hh:mm:ss 

• Output in the web front end 
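For readers without access to logtime, the timestamp conversion it performs can be approximated with standard tools. This is only a sketch of a comparable filter, not the logtime tool itself: to_human_time and to_human_time_filter are hypothetical names, GNU date is assumed, and UTC is used so that the result does not depend on the local time zone.

```shell
#!/bin/bash
# Sketch only: a hypothetical stand-in, not the logtime tool itself.
# Assumes GNU date (for "-d @<epoch>"); UTC keeps the output reproducible.

# Convert one UNIX timestamp to YYYY-MM-DD hh:mm:ss.
to_human_time() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Filter variant: convert one UNIX timestamp per input line.
to_human_time_filter() {
    local ts
    while read -r ts; do
        to_human_time "$ts"
    done
}

echo 86400 | to_human_time_filter
```

The example prints 1970-01-02 00:00:00, exactly one day after the epoch. Dropping -u would give local time instead.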

Lotus Notes Domino – Cluster Index * 

check_lotus_notes_cluster_index.sh \ 
    -H <host> \ 
    -C <community> \ 
    -w <warn> \ 
    -c <crit> 

Domino cluster: node 1, 2, 3, … 

LND Cluster Monitoring – OIDs from MIB 

Name                        OID                               Description from MIB 
ClusterTransRunningAvgTime  1.3.6.1.4.1.334.72.1.1.6.4.10.6   Average total running time of cluster transactions. 
ClusterTransRunningAvgTime  1.3.6.1.4.1.334.72.1.1.6.4.10.7   Average total running time of cluster transactions. 
ClusterTransRunningCount    1.3.6.1.4.1.334.72.1.1.6.4.10.8   Number of cluster transactions. 
ClusterTransRunningTime     1.3.6.1.4.1.334.72.1.1.6.4.10.9   Total running time of cluster transactions. 
ClusterProbeError           1.3.6.1.4.1.334.72.1.1.6.4.11     The number of times a server received an error while 
                                                              probing another server. 
… 

Exim / Postfix - Methods 

Mail server with nrpe, ssh, and nagios-plugins 

• check_nrpe: nrpe 
• check_by_ssh: ssh 
• check_tcp: ports 25, 110, … 
• check_… 

Exim / Postfix Plugins 

• check_exim_mailq_adv * 

check_exim_mailq_adv -f -w -c 

• check_exim_input ** 

• check_postfix ** 

• check_postfix_queue ** 

Munin - How it works 

The Munin server collects performance data from computers distributed across the network, stores it, and presents it graphically through a web interface. The measurements are stored using Tobi Oetiker's RRDtool. *** 

[Diagram: the Munin server and the CI running munin-node with its munin-plugins] 

1. On the server side, the CI must be entered in munin.conf 
2. On the client side, the Munin server must be entered in munin-node.conf 
3. Test the Munin configuration 

munin-server:/var# telnet 192.168.0.105 4949 
Trying 192.168.0.105... 
Connected to 192.168.0.105. 
Escape character is '^]'. 
# munin node at mfe01.itnovum.de 
bla 
# Unknown command. Try list, nodes, config, fetch, version or quit 
list 
memory df cpu exim_mailstats swap exim_mailqueue load 
fetch load 
load.value 1.39 
. 
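The reply to a fetch command in the telnet session is plain text: one field.value line per data series, terminated by a single dot. The sketch below picks one value out of such a reply; munin_value is a hypothetical helper, and the sample reply is the one from the session.

```shell
#!/bin/bash
# Sketch only: munin_value is a hypothetical helper.

# Print the value of <field>.value from a munin-node "fetch" reply on stdin.
munin_value() {
    awk -v key="$1.value" '$1 == key { print $2 }'
}

# Reply copied from the telnet session ("." ends the reply).
response='load.value 1.39
.'

printf '%s\n' "$response" | munin_value load
```

Against a live node, the same helper could presumably be fed from a non-interactive client, for example printf 'fetch load\nquit\n' | nc 192.168.0.105 4949 | munin_value load, assuming nc is installed on the Munin server.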

Munin plugins 

• Standard plugins in the file system 

munin-node:/etc/munin/plugins# ls -al 
total 2 
… 
lrwxrwxrwx 1 root root 28 2006-03-06 20:03 cpu -> /usr/share/munin/plugins/cpu 
lrwxrwxrwx 1 root root 27 2006-03-06 20:03 df -> /usr/share/munin/plugins/df 
… 
lrwxrwxrwx 1 root root 39 2006-03-06 20:03 exim_mailqueue -> /usr/share/munin/plugins/exim_mailqueue 
lrwxrwxrwx 1 root root 39 2006-03-06 20:03 exim_mailstats -> /usr/share/munin/plugins/exim_mailstats 
… 
lrwxrwxrwx 1 root root 43 2006-03-06 20:03 postfix_mailvolume -> /usr/share/munin/plugins/postfix_mailvolume 
… 

• On the web 

http://muninexchange.projects.linpro.no/ 

Munin – Thresholds 

Thresholds are defined in the corresponding munin plugin 

munin-node:/etc/munin/plugins# grep -E 'QUEUE.*=.*0' exim_mailqueue 
QUEUEWARN=100 
QUEUECRIT=200 

Display in the web front end 

States, Nagios-like: OK || Warning || Critical 

Munin – Nagios interface 

[Diagram: the mail server (CI) runs munin-node with its munin-plugins; nsca is the link between the servers] 

Munin – Nagios interface configuration 1 

• nagios.cfg (Nagios server) 

… 
check_external_commands=1 
… 

• Installation 
    • send_nsca (Munin server) 
    • nsca (Nagios server) 

• send_nsca.cfg (Munin server) 

… 
password=secret 
encryption_method=1 
… 

• nsca.cfg (Nagios server) 

… 
password=secret 
encryption_method=1 
… 
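Both files carry the same password and encryption_method, and a mismatch between sender and receiver is an easy mistake to make. The following sketch compares individual settings across two config files; cfg_value and same_setting are hypothetical helpers, and temporary files stand in for the real send_nsca.cfg and nsca.cfg.

```shell
#!/bin/bash
# Sketch only: cfg_value and same_setting are hypothetical helpers.

# Print the value of a key=value line from a config file (first match).
cfg_value() {
    local file=$1 key=$2
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Succeed if both files carry the same value for the given key.
same_setting() {
    [ "$(cfg_value "$1" "$3")" = "$(cfg_value "$2" "$3")" ]
}

# Demo with temporary files standing in for send_nsca.cfg and nsca.cfg.
tmpdir=$(mktemp -d)
printf 'password=secret\nencryption_method=1\n' > "$tmpdir/send_nsca.cfg"
printf 'password=secret\nencryption_method=1\n' > "$tmpdir/nsca.cfg"

for key in password encryption_method; do
    if same_setting "$tmpdir/send_nsca.cfg" "$tmpdir/nsca.cfg" "$key"; then
        echo "$key: match"
    else
        echo "$key: MISMATCH"
    fi
done

rm -rf "$tmpdir"
```

The comparison deliberately reports only match or mismatch, so the password itself never appears in the output.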

• Adapt munin.conf (Munin server) 

# For those with Nagios, the following might come in handy. In addition, 
# the services must be defined in the Nagios server as well. 
contact.nagios.command /usr/sbin/send_nsca -H nagios-server -c /etc/send_nsca.cfg 

• Read the graph title from the munin plugin (Munin server) 

#!/bin/bash 
… 
GRAPHTITLE='Exim Mailqueue' 
echo "graph_title $GRAPHTITLE" 
… 

• Define the service as a passive service (Nagios server), using the graph title as the service_description 

define service{ 
    use                  passive-service 
    host_name            mgmt05.itnovum.de 
    service_description  Exim Mailqueue 
} 


• Validating the configuration 

Munin server 

munin-server:~# printf "%s\t%s\t%s\t%s\n" "mgmt05.itnovum.de" "Exim Mailqueue" "0" "ALLES OK" \ 
    | /usr/sbin/send_nsca -H nagios-server -c /etc/send_nsca.cfg 
1 data packet(s) sent to host successfully. 

Nagios server 

nagios-server:# tail -f nagios.log | logtime 
[2008-09-06 18:47:34] PASSIVE SERVICE CHECK: mgmt05.itnovum.de;Exim Mailqueue;0;ALLES OK 

• Nagios front end 
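The validation step relies on the format send_nsca expects on standard input: host, service description, return code, and plugin output, separated by tabs and ended by a newline. A small sketch that builds such a line follows; passive_result is a hypothetical helper, and the return code and message in the example are made up for illustration.

```shell
#!/bin/bash
# Sketch only: passive_result is a hypothetical helper.

# Build one passive service check result in tab-separated form:
# host, service description, return code, plugin output.
passive_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Print the line instead of sending it; in practice it would be piped
# to send_nsca. Return code and message are illustrative values.
passive_result "mgmt05.itnovum.de" "Exim Mailqueue" 2 "Queue too long"
```

Host name and service description must match the passive service defined on the Nagios server, otherwise the result cannot be assigned to a service.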

Sources 

* www.itnovum.de 
** www.nagiosexchange.org 
*** www.de.wikipedia.org [as of 01.09.2008] 

Questions 

Thank you for your attention 
