26.01.2015 Views

Monitoring von Messaging-Systemen via Nagios Markus ... - netways


Monitoring of Messaging Systems via Nagios
Markus Thiel
Consultant
m.thiel@itnovum.de
it-novum GmbH
Edelzeller Strasse 44
36043 Fulda
www.itnovum.de


Contents
Monitoring of Messaging Systems via Nagios
Brief introduction
MS Exchange
Nagios tools
• Active checks
• Passive checks
• End2End approach
Monitoring
• MS Exchange
• Lotus Notes Domino
• Exim
• Postfix
Munin-Nagios interface
• nsca / send_nsca
Questions / suggestions


it-novum GmbH business areas
Systems management
Infrastructure optimization
Enterprise content management
ERP & business intelligence
• Open source
• ITCOCKPIT / Nagios
• Storage management
• Security management
• Document management
• Archiving
• Enterprise resource planning
• SAP
• Server virtualization
• Client virtualization
• Reporting, analysis & dashboards
• Outsourcing


Big picture
[Diagram: monitoring in the context of IT processes. Business view: business service monitoring, business service dashboard, SLA monitoring, event management and correlation, BP monitoring. Technical view: alerting, thresholds, status, performance data, and E2E monitoring across servers, networks, databases, middleware, applications, and integration. Surrounding IT processes: service level, incident, problem, change, release, capacity, and configuration management, plus the CMDB.]


Nagios – Tools
[Diagram: Nagios reaches the CIs through active checks and passive checks. Further tools: commercial tools, vendor-specific tools, open source tools, e.g. …]


Nagios – Active Checks
[Diagram: Nagios queries the CI over TCP/IP with check_nrpe, check_nt, check_by_ssh, check_snmp, check_ldap, check_smtp, check_tcp, … or custom plugins]
check_tcp -H $HOSTADDRESS$ -p 25 -s "HELO ich" -e "250 OK" -q "quit" -w $ARG1$ -c $ARG2$


End2End Monitoring 1


End2End Monitoring 2
Location-dependent
• check_ldap
• check_tcp
• check_pop3
• …


Exchange Monitoring – Methods
[Diagram: Nagios monitors the Exchange server with check_nt (processes, Perfmon counters) and with check_tcp / check … against ports 25, 110, 389, …]


Exchange Monitoring – Preparing the CI
• Install NSClient++
• Read the parameters from the Windows performance counters (perfmon)


Exchange Monitoring – Nagios Configuration
• Install check_nt (nagios-plugins)
• Adapt the Nagios configuration
Command definition
check_nt -H -s \
-p -v COUNTER \
-l "\\SMTP Server(_Total)\\Gesamtzahl übermittelter Nachrichten"
The performance object, the instance, and the performance counter are passed together as the argument to -l.
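The -l argument in the command above combines performance object, instance, and counter into one path. A minimal sketch of assembling that path in shell follows; the helper name perf_counter_path is hypothetical and not part of check_nt or NSClient++.

```shell
#!/bin/bash
# Hypothetical helper (not part of check_nt): builds a Windows performance
# counter path of the form \Object(Instance)\Counter.
# $1 = performance object, $2 = instance (may be empty), $3 = counter
perf_counter_path() {
    if [ -n "$2" ]; then
        printf '\\%s(%s)\\%s\n' "$1" "$2" "$3"
    else
        printf '\\%s\\%s\n' "$1" "$3"
    fi
}
```

The result can be handed to check_nt as -l "$(perf_counter_path 'SMTP Server' '_Total' 'Gesamtzahl übermittelter Nachrichten')". The counter names are localized, so they must match the language of the Windows installation.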


Exchange Monitoring – Nagios Configuration
• Option A
Configuration without thresholds


Exchange Monitoring – Nagios Configuration
• Option B
Configuration with thresholds
Build a wrapper around check_nt; the thresholds are passed to the wrapper as arguments.
#!/bin/bash
…
RETVAL=0
TEMPFILE="/tmp/${HOST}_unzustellbarkeitsberichte"
TIME=$(date +%s)
# Current counter value (NDRs generated so far); assumed to be an integer
RES=$("$PLUGINDIR"/check_nt -H -s -p -v COUNTER -l \
"\\SMTP Server(_Total)\\Erzeugte NDRs (Unzustellbarkeitsberichte)")
# Load the previous sample, if there is one
if [ -e "$TEMPFILE" ]; then
ALTRES=$(grep "Wert =" "$TEMPFILE" | cut -d '=' -f2)
ALTTIME=$(grep "Time =" "$TEMPFILE" | cut -d '=' -f2)
else
ALTRES=$RES
ALTTIME=$TIME
fi
# Store the current sample for the next run
echo "Wert =${RES}" > "$TEMPFILE"
echo "Time =${TIME}" >> "$TEMPFILE"
DIFF=$((RES - ALTRES))
ELAPSED=$((TIME - ALTTIME))
# Integer NDRs per minute; 0 on the first run, when no time has elapsed
if [ "$ELAPSED" -gt 0 ]; then
RES=$((DIFF * 60 / ELAPSED))
else
RES=0
fi
if [ "$RES" -ge "$WARN" ]; then
RETVAL=1
fi
if [ "$RES" -ge "$CRIT" ]; then
RETVAL=2
fi
RETSTR="NDRs generated per min : ${RES}|NDRs=${RES}NDRs_per_min;$WARN;$CRIT"
echo "$RETSTR"
exit $RETVAL
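The rate calculation at the heart of the wrapper can be isolated into a small function, which makes its edge cases easier to reason about. The sketch below is an illustration, not the author's plugin: the name rate_per_min is hypothetical, and treating a shrinking counter (for example after a service restart) as a rate of 0 is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: events per minute between two counter samples.
# $1 = previous counter value, $2 = current counter value,
# $3 = previous timestamp,     $4 = current timestamp (seconds since epoch)
rate_per_min() {
    elapsed=$(( $4 - $3 ))
    delta=$(( $2 - $1 ))
    # First run (no time elapsed) or counter reset (assumption: report 0
    # rather than divide by zero or print a negative rate).
    if [ "$elapsed" -le 0 ] || [ "$delta" -lt 0 ]; then
        echo 0
        return 0
    fi
    # Integer arithmetic, so the result can be compared with [ -ge ].
    echo $(( delta * 60 / elapsed ))
}
```

Keeping the result an integer avoids the problem that the shell's -ge test cannot compare the fractional values that bc -l prints.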


Exchange Monitoring – Nagios Configuration
• Option B
Configuration with thresholds
Plugin output:


Exchange Monitoring – Further Parameters
NDRs generated
Undeliverable messages
Logged-on users
Queues
…
Screenshot: MS Exchange 2007


Exchange Monitoring – Process Query
• Read the process state
check_nt -H -s \
-p -v PROCSTATE \
-l STORE.EXE
• Selection of relevant processes
MSExchange Information Store: store.exe
MSExchange System Attendant: mad.exe
MSExchange Management: exmgmt.exe
MSExchange Routing Engine: inetinfo.exe
MSExchange MTA Stacks: emsmta.exe
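According to the check_nt help text, PROCSTATE uses the same syntax as SERVICESTATE, that is, a comma-separated list after -l, so several of the processes listed above can be checked in one call. A minimal sketch of building that list, with a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: joins its arguments with commas, producing the list
# format that check_nt takes after -l for -v PROCSTATE.
join_by_comma() {
    joined=""
    for item in "$@"; do
        # Prefix a comma only when something has already been collected.
        joined="${joined:+$joined,}$item"
    done
    printf '%s\n' "$joined"
}
```

For example, -l "$(join_by_comma store.exe mad.exe emsmta.exe)" would query three processes at once.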


Exchange Monitoring – NagVis


Lotus Notes Domino – Methods
QuerySet Handler
Queries the server's statistics and passes them to LNSNMP, which hands them on to the platform-specific SNMP agent.
Event Interceptor
Instructs LNSNMP to send, for example, an SNMP trap.
[Diagram: Domino server with LNSNMP, QuerySet Handler, and Event Interceptor]


LNSNMP
• Supported platforms
z/OS (OS 390)
• Unsupported platforms
zSeries


Lotus Notes Domino – Preparing the CI
• Install and configure SNMP
• Install LNSNMP


LND Monitoring – Method 1 (passive)
[Diagram: Domino server → LNSNMP / Event Interceptor → SNMP trap in real time]


LND Monitoring – Method 2 (active)
[Diagram: Nagios queries the Domino server with check_snmp via LNSNMP / QuerySet Handler, and with check_tcp / check … against ports 1152, 25, 389, …]


LND Monitoring – Passive vs. Active Checks
Requirement | Passive check | Active check
Snapshot view
Configuration effort
Cross-system event correlation
Clear service assignment
Suitable for SLAs
Suitable for BPM
Differentiation of events when several components fail
In-depth application monitoring
Performance data / long-term analysis


LND Monitoring – Standard check_snmp
• Install check_snmp (nagios-plugins)
• Adapt the Nagios configuration
Command definition
check_snmp \
-H \
-C \
-o 1.3.6.1.4.1.334.72.1.1.4.3.0 \
-l LN_TOTAL_MAIL_FAILURES \
-w \
-c \
-u Mails
Plugin output
LN_TOTAL_MAIL_FAILURES OK - 1 Mails | iso.3.6.1.4.1.334.72.1.1.4.3.0=1
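The plugin output above follows the usual Nagios layout: status text, a pipe, then performance data as label=value. A minimal sketch of pulling the numeric value back out in shell follows; the helper name perfdata_value is hypothetical, and the sketch assumes an unquoted label without spaces.

```shell
#!/bin/bash
# Hypothetical helper: prints the value of the first performance-data item
# in one line of plugin output ("text | label=value[UOM];warn;crit").
# Assumes the label contains no spaces.
perfdata_value() {
    printf '%s\n' "$1" | awk -F'|' 'NF > 1 {
        split($2, items, " ")              # items[1] = first perfdata item
        sub(/^[^=]*=/, "", items[1])       # drop the label and "="
        sub(/;.*$/, "", items[1])          # drop thresholds
        sub(/[^0-9.+-].*$/, "", items[1])  # drop the unit of measurement
        print items[1]
    }'
}
```

Lines without a pipe produce no output, so a caller can treat an empty result as "no performance data".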


LND Monitoring – OIDs from MIB
Service                    OID                             Description from MIB
dead-mail                  enterprises.334.72.1.1.4.1.0    Number of dead (undeliverable) mail messages
routing-failures           enterprises.334.72.1.1.4.3.0    Total number of routing failures since the server started
pending-routing            enterprises.334.72.1.1.4.6.0    Number of mail messages waiting to be routed
pending-local              enterprises.334.72.1.1.4.7.0    Number of pending mail messages awaiting local delivery
max-mail-delivery-time     enterprises.334.72.1.1.4.12.0   Maximum time for mail delivery in seconds
router-unable-to-transfer  enterprises.334.72.1.1.4.19.0   Number of mail messages the router was unable to transfer
mail-held-in-queue         enterprises.334.72.1.1.4.21.0   Number of mail messages in message queue on hold
mails-pending              enterprises.334.72.1.1.4.31.0   Number of mail messages pending
replicator-status          enterprises.334.72.1.1.6.1.3.0  Status of the Replicator task
router-status              enterprises.334.72.1.1.6.1.4.0  Status of the Router task
databases-in-cache         enterprises.334.72.1.1.10.15.0  The number of databases currently in the cache. Administrators should monitor this number to see whether it approaches the NSF_DBCACHE_MAXENTRIES setting. If it does, this indicates the cache is under pressure. If this situation occurs frequently, the administrator should increase the setting for NSF_DBCACHE_MAXENTRIES.
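The table lists OIDs with the symbolic prefix enterprises, while the check_snmp command definition uses the numeric form; enterprises corresponds to 1.3.6.1.4.1. A minimal sketch of the conversion, with a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: rewrites the symbolic prefix "enterprises." to the
# numeric prefix 1.3.6.1.4.1. so the OID can be passed to check_snmp -o.
# Any other input is printed unchanged.
numeric_oid() {
    case $1 in
        enterprises.*) printf '1.3.6.1.4.1.%s\n' "${1#enterprises.}" ;;
        *)             printf '%s\n' "$1" ;;
    esac
}
```

For the routing-failures row this yields 1.3.6.1.4.1.334.72.1.1.4.3.0, the OID used in the check_snmp example.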


LND Monitoring – OIDs from MIB 2
Service                                OID                             Description from MIB
messages-send                          enterprises.334.72.1.1.4.2.0    Number of messages received by router
messages-routed                        enterprises.334.72.1.1.4.4.0    Total number of mail messages routed since the server started
router-messages-attempted-to-transfer  enterprises.334.72.1.1.4.5.0    Number of messages router attempted to transfer
delivered-mail-size-avg                enterprises.334.72.1.1.4.11.0   Average size of mail messages delivered in bytes
delivered-mail-size-max                enterprises.334.72.1.1.4.14.0   Maximum size of mail delivered in bytes
total-mail-transferred                 enterprises.334.72.1.1.4.18.0   Total mail transferred in kilobytes
transferred-per-min-peak               enterprises.334.72.1.1.4.27.0   Peak number of messages transferred
…
MemAllocProcess                        enterprises.334.72.1.1.9.2      Total process-private memory allocated by all currently-running processes.
DriveFree                              enterprises.334.72.1.1.8.3.1.4  The amount of free space left on this drive in kilobytes. A value of zero may indicate the statistic's value is too large to be passed via SNMP.


Lotus Notes Domino – Checking Services
• Install the check_lotus_notes_services plugin *
• List the services started on the LND server
nagios-server:~ # snmpwalk -c -v 1 .1.3.6.1.4.1.334.72.1.1.6.1.2.1.4 \
| awk -F"STRING: " '{ print $2 }' | sort | uniq
…
"Statistic Collector"
"Event Interceptor"
"QuerySet Handler"
"Cluster Replicator"
…
• Pass the result as an argument in the command
nagios-server: # ./check_lotus_notes_services.sh -H \
-S "Event Interceptor" \
-C
OK - "Idle: [07/10/2008 13:34:08 CEDT]" | Counter=1Services
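The snmpwalk pipeline above can be wrapped in a function that reads the walk output on stdin, which keeps the parsing separate from the SNMP query. This is a sketch with a hypothetical function name; unlike the pipeline on the slide, it also strips the surrounding double quotes so the names can be passed straight to -S.

```shell
#!/bin/bash
# Hypothetical helper: reads snmpwalk output on stdin and prints the
# distinct task names found after "STRING: ", without the double quotes.
list_domino_tasks() {
    awk -F'STRING: ' 'NF > 1 { gsub(/"/, "", $2); print $2 }' | sort -u
}
```

It would be fed by the same snmpwalk call as on the slide, piped into list_domino_tasks.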


Lotus Notes Domino – Transfer Peak Time
• Nagios plugin: check_lotus_notes_transfer_per_minute_peak_time *
#!/bin/bash
…
UNIXTIME=$(snmpwalk -c -v 1 1.3.6.1.4.1.334.72.1.1.6.3.4.0 \
| awk -F"INTEGER: " '{ print $2 }')
HUMANTIME=$(echo "$UNIXTIME" | logtime)
…
• logtime *:
Installed in the $PATH of the nagios user
Converts a UNIX timestamp to the format YYYY-MM-DD hh:mm:ss
• Output in the web front end
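The logtime tool itself is not shown on the slide. As a rough stand-in for the conversion it performs, the sketch below turns UNIX timestamps read on stdin into YYYY-MM-DD hh:mm:ss. The function names are hypothetical, and the sketch assumes GNU or BusyBox date (-d @seconds), with a fallback to the BSD form (-r seconds).

```shell
#!/bin/bash
# Hypothetical stand-in for the conversion logtime performs.
# $1 = seconds since the epoch; prints YYYY-MM-DD hh:mm:ss in local time.
epoch_to_human() {
    date -d "@$1" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -r "$1" '+%Y-%m-%d %H:%M:%S'
}

# Filter form: one timestamp per input line, empty lines are skipped.
logtime_sketch() {
    while IFS= read -r ts; do
        if [ -n "$ts" ]; then
            epoch_to_human "$ts"
        fi
    done
}
```

In the plugin above, echo "$UNIXTIME" | logtime_sketch would take the place of the logtime call; the output depends on the time zone of the nagios user.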


Lotus Notes Domino – Cluster Index *
check_lotus_notes_cluster_index.sh \
-H \
-C \
-w \
-c
[Diagram: Domino cluster with nodes 1, 2, 3, …]


LND Cluster Monitoring – OIDs from MIB
Service                     OID                              Description from MIB
ClusterTransRunningAvgTime  1.3.6.1.4.1.334.72.1.1.6.4.10.6  Average total running time of cluster transactions.
ClusterTransRunningAvgTime  1.3.6.1.4.1.334.72.1.1.6.4.10.7  Average total running time of cluster transactions.
ClusterTransRunningCount    1.3.6.1.4.1.334.72.1.1.6.4.10.8  Number of cluster transactions.
ClusterTransRunningTime     1.3.6.1.4.1.334.72.1.1.6.4.10.9  Total running time of cluster transactions.
ClusterProbeError           1.3.6.1.4.1.334.72.1.1.6.4.11    The number of times a server received an error while probing another server.
…


Exim / Postfix – Methods
[Diagram: Nagios monitors the mail server with check_nrpe (via nrpe and the locally installed nagios-plugins), check_by_ssh (via ssh), and check_tcp / check … against ports 25, 110, …]


Exim / Postfix Plugins
• check_exim_mailq_adv *
check_exim_mailq_adv -f -w -c
• check_exim_input **
• check_postfix **
• check_postfix_queue **


Munin – How It Works
The Munin server collects performance data from computers distributed across the network, stores it, and presents it graphically through a web interface. The measurements are stored using Tobi Oetiker's RRDtool. ***
[Diagram: Munin server and the CI running munin-node with its munin-plugins]
1. On the server side, the CI must be listed in munin.conf.
2. On the client side, the Munin server must be listed in munin-node.conf.
3. Testing a Munin configuration
munin-server:/var# telnet 192.168.0.105 4949
Trying 192.168.0.105...
Connected to 192.168.0.105.
Escape character is '^]'.
# munin node at mfe01.itnovum.de
bla
# Unknown command. Try list, nodes, config, fetch, version or quit
list
memory df cpu exim_mailstats swap exim_mailqueue load
fetch load
load.value 1.39
.
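The telnet session shows the reply format of a fetch command: one line per field in the form name.value number, terminated by a single dot. A minimal sketch of extracting one value from such a reply in shell, with a hypothetical function name:

```shell
#!/bin/bash
# Hypothetical helper: reads the reply to a munin-node "fetch" command on
# stdin and prints the value of the requested field (default: "load").
# Banner lines starting with "#" and the closing "." never match.
munin_field_value() {
    awk -v key="${1:-load}.value" '$1 == key { print $2; exit }'
}
```

Assuming a netcat binary is installed, the reply could come from something like printf 'fetch load\nquit\n' | nc 192.168.0.105 4949, piped into munin_field_value load.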


Munin Plugins
• Standard plugins in the file system
munin-node:/etc/munin/plugins# ls -al
insgesamt 2
…
lrwxrwxrwx 1 root root 28 2006-03-06 20:03 cpu -> /usr/share/munin/plugins/cpu
lrwxrwxrwx 1 root root 27 2006-03-06 20:03 df -> /usr/share/munin/plugins/df
…
lrwxrwxrwx 1 root root 39 2006-03-06 20:03 exim_mailqueue -> /usr/share/munin/plugins/exim_mailqueue
lrwxrwxrwx 1 root root 39 2006-03-06 20:03 exim_mailstats -> /usr/share/munin/plugins/exim_mailstats
…
lrwxrwxrwx 1 root root 43 2006-03-06 20:03 postfix_mailvolume -> /usr/share/munin/plugins/postfix_mailvolume
…
• On the web
http://muninexchange.projects.linpro.no/


Munin – Thresholds
Thresholds are defined in the corresponding Munin plugin
munin-node:/etc/munin/plugins# grep -E 'QUEUE.*=.*0' exim_mailqueue
QUEUEWARN=100
QUEUECRIT=200
Display in the web front end
States, Nagios-like
OK || Warning || Critical
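How two thresholds such as QUEUEWARN and QUEUECRIT translate into the three states can be sketched as a small shell function. The function name is hypothetical, and counting the boundary value itself as exceeded is an assumption; Munin's own evaluation may treat the boundary differently.

```shell
#!/bin/bash
# Hypothetical helper: maps a queue length to a Nagios-like state.
# $1 = value, $2 = warning threshold, $3 = critical threshold (integers)
# Prints the state name and returns the matching plugin exit code.
queue_state() {
    if [ "$1" -ge "$3" ]; then
        echo "Critical"
        return 2
    elif [ "$1" -ge "$2" ]; then
        echo "Warning"
        return 1
    fi
    echo "OK"
    return 0
}
```

With the values from the plugin above, queue_state 150 100 200 reports Warning.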


Munin – Nagios Interface
[Diagram: the Munin server collects data from munin-node and its munin-plugins on the CI (mail server) and forwards results to the Nagios server via nsca]


Munin – Nagios Interface Configuration 1
• nagios.cfg on the Nagios server
…
check_external_commands=1
…
• Installation
• send_nsca (Munin server)
• nsca (Nagios server)
• send_nsca.cfg on the Munin server
…
password=secret
encryption_method=1
…
• nsca.cfg on the Nagios server
…
password=secret
encryption_method=1
…


Munin – Nagios Interface Configuration 2
• Adapt munin.conf (Munin server)
# For those with Nagios, the following might come in handy. In addition,
# the services must be defined in the Nagios server as well.
contact.nagios.command /usr/sbin/send_nsca -H nagios-server -c /etc/send_nsca.cfg
• Read the graph title from the Munin plugin (Munin server)
#!/bin/bash
…
GRAPHTITLE='Exim Mailqueue'
echo "graph_title $GRAPHTITLE"
…
• Define the service as a passive service (Nagios server)
define service{
use passive-service
host_name mgmt05.itnovum.de
service_description Exim Mailqueue
}


Munin – Nagios Interface Configuration 3
• Validating the configuration
Munin server
munin-server:~# printf "%s\t%s\t%s\t%s\n" "mgmt05.itnovum.de" "Exim Mailqueue" "0" "ALLES OK" \
| /usr/sbin/send_nsca -H -c /etc/send_nsca.cfg
1 data packet(s) sent to host successfully.
Nagios server
nagios-server:# tail -f nagios.log | logtime
[2008-09-06 18:47:34] PASSIVE SERVICE CHECK: mgmt05.itnovum.de;Exim Mailqueue;0;ALLES OK
• Nagios front end
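The printf call in the validation step produces the tab-separated line that send_nsca reads on stdin: host, service description, return code, plugin output. A minimal sketch that wraps this in a function and rejects return codes outside the Nagios range 0 to 3; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: formats one passive service check result as
#   host <TAB> service description <TAB> return code <TAB> plugin output
# $1 = host name, $2 = service description, $3 = return code, $4 = output
nsca_line() {
    case $3 in
        0|1|2|3) ;;
        *) echo "invalid return code: $3" >&2
           return 1 ;;
    esac
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}
```

The host name and service description must match the passive service defined on the Nagios server exactly; otherwise Nagios cannot assign the result to a service.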


Sources
* www.itnovum.de
** www.nagiosexchange.org
*** www.de.wikipedia.org [as of 01.09.2008]


Monitoring of Messaging Systems via Nagios
Questions


Thank you for your attention
