Monitoring von Messaging-Systemen via Nagios Markus ... - netways
Monitoring Messaging Systems via Nagios<br />
Markus Thiel<br />
Consultant<br />
m.thiel@itnovum.de<br />
it-novum GmbH<br />
Edelzeller Strasse 44<br />
36043 Fulda<br />
www.itnovum.de<br />
1
Contents<br />
Monitoring Messaging Systems via Nagios<br />
Brief introduction<br />
MS Exchange<br />
Nagios tools<br />
•Active checks<br />
•Passive checks<br />
•End2End approach<br />
Monitoring<br />
•MS Exchange<br />
•Lotus Notes Domino<br />
•Exim<br />
•Postfix<br />
Munin-Nagios interface<br />
•nsca / send_nsca<br />
Questions / suggestions<br />
2
it-novum GmbH business areas<br />
System management<br />
Infrastructure optimization<br />
Enterprise content management<br />
ERP & business intelligence<br />
•Open source<br />
•ITCOCKPIT / Nagios<br />
•Storage management<br />
•Security management<br />
•Document management<br />
•Archiving<br />
•Enterprise resource planning<br />
•SAP<br />
•Server virtualization<br />
•Client virtualization<br />
•Reporting, analysis & dashboards<br />
•Outsourcing<br />
3
Big picture<br />
[Diagram: monitoring within the IT processes. Business view: business service monitoring, business service dashboard, SLA monitoring, event management and correlation, BP monitoring. Technical view: alerting, thresholds, status, performance data and E2E monitoring across servers, networks, databases, middleware, applications and integration. Surrounding processes: service level, incident, problem, change, release, capacity and configuration management; CMDB.]<br />
4
Nagios – tools<br />
[Diagram: Nagios reaches the CIs through active checks and passive checks, alongside further tools: commercial tools, vendor-specific tools and open source tools, e.g. …]<br />
5
Nagios - active checks<br />
check_nrpe<br />
check_nt<br />
check_by_ssh<br />
check_snmp<br />
check_ldap<br />
check_smtp<br />
check_tcp<br />
…<br />
Transport to the CI: TCP/IP<br />
check_tcp -H $HOSTADDRESS$ -p 25 -s "HELO ich" -e "250 OK" -q quit -w $ARG1$ -c $ARG2$<br />
Custom plugins<br />
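A custom plugin only has to follow the Nagios plugin convention: print one status line, optionally followed by performance data after a pipe character, and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). The following is a minimal sketch of that convention, assuming non-negative integer values; the function name check_threshold and its arguments are hypothetical and not part of nagios-plugins.

```shell
#!/bin/bash
# Sketch of a custom-plugin core. check_threshold is a hypothetical
# helper, not something shipped with nagios-plugins.
# Usage: check_threshold LABEL VALUE WARN CRIT
check_threshold() {
    local label=$1 value=$2 warn=$3 crit=$4
    local state=OK code=0 n

    # Anything that is not a non-negative integer yields UNKNOWN (3).
    for n in "$value" "$warn" "$crit"; do
        case "$n" in
            ''|*[!0-9]*) echo "UNKNOWN - non-numeric input"; return 3 ;;
        esac
    done

    # The critical threshold takes precedence over the warning threshold.
    if [ "$value" -ge "$crit" ]; then
        state=CRITICAL; code=2
    elif [ "$value" -ge "$warn" ]; then
        state=WARNING; code=1
    fi

    # Status line, then perfdata in the form label=value;warn;crit
    echo "$state - $label is $value|$label=$value;$warn;$crit"
    return $code
}
```

A plugin script built on this sketch would end with something like `check_threshold queue "$COUNT" "$WARN" "$CRIT"; exit $?`, where COUNT, WARN and CRIT are whatever the script measured and was given as arguments.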
6
End2End <strong>Monitoring</strong> 1<br />
7
End2End <strong>Monitoring</strong> 2<br />
Location-dependent<br />
• check_ldap<br />
• check_tcp<br />
• check_pop3<br />
• …<br />
8
Exchange monitoring - methods<br />
[Diagram: the Exchange server is queried with check_nt (processes, perfmon) and with port checks such as check_tcp and other check plugins on ports 25, 110, 389, …]<br />
9
Exchange monitoring – preparing the CI<br />
• Install NSClient++<br />
• Read the parameters from the performance counters (perfmon)<br />
10
Exchange monitoring – Nagios configuration<br />
• Install check_nt (nagios-plugins)<br />
• Adapt the Nagios configuration<br />
Command definition (HOST, PASSWORD and PORT are placeholders)<br />
check_nt -H HOST -s PASSWORD \<br />
-p PORT -v COUNTER \<br />
-l "\\SMTP Server(_Total)\\Gesamtzahl übermittelter Nachrichten"<br />
The performance object, instance and counter are passed as the -l argument; the example uses the German-localized counter for the total number of messages delivered<br />
11
Exchange monitoring – Nagios configuration<br />
• Option A<br />
Configuration without thresholds<br />
12
Exchange monitoring – Nagios configuration<br />
• Option B<br />
Configuration with thresholds<br />
Build a wrapper around check_nt; the thresholds are passed to it as arguments<br />
#!/bin/bash<br />
…<br />
# HOST, PASSWORD, PORT, WARN, CRIT and PLUGINDIR are assumed to be set in the part elided above<br />
RETVAL=0<br />
# State file holding the counter value and timestamp of the previous run<br />
TEMPFILE="/tmp/${HOST}_unzustellbarkeitsberichte"<br />
TIME=$(date +%s)<br />
RES=$("$PLUGINDIR/check_nt" -H "$HOST" -s "$PASSWORD" -p "$PORT" -v COUNTER -l \<br />
"\\SMTP Server(_Total)\\Erzeugte NDRs (Unzustellbarkeitsberichte)")<br />
if [ -e "$TEMPFILE" ]; then<br />
ALTRES=$(grep "Wert =" "$TEMPFILE" | cut -d '=' -f2)<br />
ALTTIME=$(grep "Time =" "$TEMPFILE" | cut -d '=' -f2)<br />
else<br />
ALTRES=$RES<br />
ALTTIME=$TIME<br />
fi<br />
echo "Wert =${RES}" > "$TEMPFILE"<br />
echo "Time =${TIME}" >> "$TEMPFILE"<br />
# NDRs generated per minute since the previous run, as an integer so that -ge works (0 on the first run)<br />
DELTA=$((TIME - ALTTIME))<br />
if [ "$DELTA" -gt 0 ]; then<br />
RES=$(( (RES - ALTRES) * 60 / DELTA ))<br />
else<br />
RES=0<br />
fi<br />
if [ "$RES" -ge "$WARN" ]; then<br />
RETVAL=1<br />
fi<br />
if [ "$RES" -ge "$CRIT" ]; then<br />
RETVAL=2<br />
fi<br />
RETSTR="NDRs generated per minute: ${RES}|NDRs=${RES}NDRs_per_min;$WARN;$CRIT"<br />
echo "$RETSTR"<br />
exit $RETVAL<br />
13
Exchange monitoring – Nagios configuration<br />
• Option B<br />
Configuration with thresholds<br />
Plugin output:<br />
14
Exchange monitoring – further parameters<br />
NDRs generated<br />
Undeliverable messages<br />
Logged-on users<br />
Queues<br />
…<br />
Screenshot: MS Exchange 2007<br />
15
Exchange monitoring – process checks<br />
• Read the process status<br />
check_nt -H HOST -s PASSWORD \<br />
-p PORT -v PROCSTATE \<br />
-l STORE.EXE<br />
• Excerpt of relevant processes<br />
MSExchange-Informationsspeicher (information store): store.exe<br />
MSExchange-Systemaufsicht (system attendant): mad.exe<br />
MSExchange-Verwaltung (management): exmgmt.exe<br />
MSExchangeRoutingModul (routing engine): inetinfo.exe<br />
MSExchangeMTA-Stacks: emsmta.exe<br />
16
Exchange <strong>Monitoring</strong> – NagVis<br />
17
Lotus Notes Domino - methods<br />
[Diagram: Domino server with LNSNMP, QuerySet Handler and Event Interceptor]<br />
QuerySet Handler<br />
queries the server's statistics and passes them to LNSNMP, which hands them to the platform-specific SNMP agent<br />
Event Interceptor<br />
instructs LNSNMP to send, for example, an SNMP trap<br />
18
LNSNMP<br />
• Supported platforms<br />
z/OS (OS 390)<br />
• Unsupported platforms<br />
zSeries<br />
19
Lotus Notes Domino – preparing the CI<br />
• Install and configure SNMP<br />
• Install LNSNMP<br />
20
LND monitoring – method 1 (passive)<br />
[Diagram: on the Domino server, the Event Interceptor has LNSNMP send an SNMP trap in real time]<br />
21
LND monitoring – method 2 (active)<br />
[Diagram: the Domino server's LNSNMP QuerySet Handler is queried with check_snmp; ports 1152, 25, 389, … are checked with check_tcp and other check plugins]<br />
22
LND monitoring – passive vs. active checks<br />
Requirement Passive check Active check<br />
Snapshot view<br />
Configuration effort<br />
Cross-system event correlation<br />
Clear service assignment<br />
SLA-capable<br />
BPM-capable<br />
Differentiation of events when several components fail<br />
In-depth application monitoring<br />
Performance data / long-term analysis<br />
23
LND monitoring – standard check_snmp<br />
• Install check_snmp (nagios-plugins)<br />
• Adapt the Nagios configuration<br />
Command definition (HOST, COMMUNITY, WARN and CRIT are placeholders)<br />
check_snmp \<br />
-H HOST \<br />
-C COMMUNITY \<br />
-o 1.3.6.1.4.1.334.72.1.1.4.3.0 \<br />
-l LN_TOTAL_MAIL_FAILURES \<br />
-w WARN \<br />
-c CRIT \<br />
-u Mails<br />
Plugin output<br />
LN_TOTAL_MAIL_FAILURES OK - 1 Mails | iso.3.6.1.4.1.334.72.1.1.4.3.0=1<br />
24
LND <strong>Monitoring</strong> – OIDs from MIB<br />
Service OID Description from MIB<br />
dead-mail enterprises.334.72.1.1.4.1.0 Number of dead (undeliverable) mail messages<br />
routing-failures enterprises.334.72.1.1.4.3.0 Total number of routing failures since the server started<br />
pending-routing enterprises.334.72.1.1.4.6.0 Number of mail messages waiting to be routed<br />
pending-local enterprises.334.72.1.1.4.7.0 Number of pending mail messages awaiting local delivery<br />
max-mail-delivery-time enterprises.334.72.1.1.4.12.0 Maximum time for mail delivery in seconds<br />
router-unable-to-transfer enterprises.334.72.1.1.4.19.0 Number of mail messages the router was unable to transfer<br />
mail-held-in-queue enterprises.334.72.1.1.4.21.0 Number of mail messages in message queue on hold<br />
mails-pending enterprises.334.72.1.1.4.31.0 Number of mail messages pending<br />
replicator-status enterprises.334.72.1.1.6.1.3.0 Status of the Replicator task<br />
router-status enterprises.334.72.1.1.6.1.4.0 Status of the Router task<br />
databases-in-cache enterprises.334.72.1.1.10.15.0 The number of databases currently in the cache. Administrators should monitor this number to see whether it approaches the NSF_DBCACHE_MAXENTRIES setting. If it does, this indicates the cache is under pressure. If this situation occurs frequently, the administrator should increase the setting for NSF_DBCACHE_MAXENTRIES<br />
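The table gives OIDs relative to enterprises, while the check_snmp example uses the fully numeric form; enterprises corresponds to 1.3.6.1.4.1. A small sketch of that conversion follows, useful when no MIB files are installed on the Nagios server. The helper name expand_oid is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn the symbolic form used in the MIB table
# ("enterprises.334...") into a numeric OID for check_snmp -o.
expand_oid() {
    local oid=$1
    case "$oid" in
        # enterprises is 1.3.6.1.4.1 (iso.org.dod.internet.private.enterprises)
        enterprises.*) echo "1.3.6.1.4.1.${oid#enterprises.}" ;;
        # Anything else is passed through unchanged.
        *)             echo "$oid" ;;
    esac
}
```

For example, `expand_oid enterprises.334.72.1.1.4.3.0` yields the routing-failures OID in the numeric form passed to -o.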
25
LND <strong>Monitoring</strong> – OIDs from MIB 2<br />
Service OID Description from MIB<br />
messages-send enterprises.334.72.1.1.4.2.0 Number of messages received by router<br />
messages-routed enterprises.334.72.1.1.4.4.0 Total number of mail messages routed since the server started<br />
router-messages-attempted-to-transfer enterprises.334.72.1.1.4.5.0 Number of messages router attempted to transfer<br />
delivered-mail-size-avg enterprises.334.72.1.1.4.11.0 Average size of mail messages delivered in bytes<br />
delivered-mail-size-max enterprises.334.72.1.1.4.14.0 Maximum size of mail delivered in bytes<br />
total-mail-transferred enterprises.334.72.1.1.4.18.0 Total mail transferred in kilobytes<br />
transferred-per-min-peak enterprises.334.72.1.1.4.27.0 Peak number of messages transferred<br />
…<br />
MemAllocProcess enterprises.334.72.1.1.9.2 Total process-private memory allocated by all currently-running processes.<br />
DriveFree enterprises.334.72.1.1.8.3.1.4 The amount of free space left on this drive in kilobytes. A value of zero may indicate the statistic's value is too large to be passed via SNMP.<br />
26
Lotus Notes Domino – checking services<br />
• Install the check_lotus_notes_services plugin *<br />
• Read the services started on the LND server<br />
nagios-server:~ # snmpwalk -c COMMUNITY -v 1 HOST .1.3.6.1.4.1.334.72.1.1.6.1.2.1.4 \<br />
| awk -F"STRING: " '{ print $2 }' | sort | uniq<br />
…<br />
"Statistic Collector"<br />
"Event Interceptor"<br />
"QuerySet Handler"<br />
"Cluster Replicator"<br />
…<br />
• Pass the result as an argument in the command<br />
nagios-server: # ./check_lotus_notes_services.sh -H HOST \<br />
-S "Event Interceptor" \<br />
-C COMMUNITY<br />
OK - "Idle: [07/10/2008 13:34:08 CEDT]" | Counter=1Services<br />
27
Lotus Notes Domino – transfer peak time<br />
• Nagios plugin:<br />
check_lotus_notes_transfer_per_minute_peak_time *<br />
#!/bin/bash<br />
…<br />
UNIXTIME=`snmpwalk -c $COMMUNITY -v 1 $HOST 1.3.6.1.4.1.334.72.1.1.6.3.4.0 \<br />
| awk -F"INTEGER: " '{ print $2 }'`<br />
HUMANTIME=`echo $UNIXTIME | logtime`<br />
…<br />
• logtime *:<br />
Installed in the $PATH of the nagios user<br />
Converts a UNIX timestamp to the format YYYY-MM-DD hh:mm:ss<br />
• Output in the web frontend<br />
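The logtime tool itself is not reproduced in the slides. As a rough stand-in for the conversion it performs on a bare timestamp, the following sketch reads UNIX timestamps from standard input and prints them as YYYY-MM-DD hh:mm:ss. It assumes GNU date (for the -d @SECONDS syntax); the function name unix_to_human is hypothetical and this is not the author's implementation.

```shell
#!/bin/bash
# Hypothetical stand-in for the timestamp conversion; requires GNU date.
# Reads one UNIX timestamp per line on stdin and prints each one as
# YYYY-MM-DD hh:mm:ss in the local time zone.
unix_to_human() {
    local ts
    while read -r ts; do
        date -d "@${ts}" '+%Y-%m-%d %H:%M:%S'
    done
}
```

In the plugin above it would be used the same way as logtime, for example `HUMANTIME=$(echo "$UNIXTIME" | unix_to_human)`.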
28
Lotus Notes Domino – cluster index *<br />
check_lotus_notes_cluster_index.sh \<br />
-H HOST \<br />
-C COMMUNITY \<br />
-w WARN \<br />
-c CRIT<br />
Domino cluster<br />
Node 1, 2, 3, …<br />
29
LND Cluster <strong>Monitoring</strong> – OIDs from MIB<br />
Service OID Description from MIB<br />
ClusterTransRunningAvgTime 1.3.6.1.4.1.334.72.1.1.6.4.10.6 Average total running time of cluster transactions.<br />
ClusterTransRunningAvgTime 1.3.6.1.4.1.334.72.1.1.6.4.10.7 Average total running time of cluster transactions.<br />
ClusterTransRunningCount 1.3.6.1.4.1.334.72.1.1.6.4.10.8 Number of cluster transactions.<br />
ClusterTransRunningTime 1.3.6.1.4.1.334.72.1.1.6.4.10.9 Total running time of cluster transactions.<br />
ClusterProbeError 1.3.6.1.4.1.334.72.1.1.6.4.11 The number of times a server received an error while probing another server.<br />
…<br />
30
Exim / Postfix - methods<br />
[Diagram: the mail server is checked via check_nrpe (nrpe) and check_by_ssh (ssh), which run nagios-plugins on the server, and via port checks such as check_tcp and other check plugins on ports 25, 110, …]<br />
31
Exim / Postfix Plugins<br />
• check_exim_mailq_adv *<br />
check_exim_mailq_adv -f -w -c <br />
• check_exim_input **<br />
• check_postfix **<br />
• check_postfix_queue **<br />
32
Munin - how it works<br />
The Munin server collects performance data from computers distributed across the network, stores it, and presents it graphically through a web interface. The measurements are stored with Tobi Oetiker's RRDtool. ***<br />
munin-node<br />
munin-plugins<br />
CI<br />
1. On the server side, the CI must be listed in munin.conf<br />
2. On the client side, the Munin server must be listed in munin-node.conf<br />
3. Testing a Munin configuration<br />
munin-server:/var# telnet 192.168.0.105 4949<br />
Trying 192.168.0.105...<br />
Connected to 192.168.0.105.<br />
Escape character is '^]'.<br />
# munin node at mfe01.itnovum.de<br />
bla<br />
# Unknown command. Try list, nodes, config, fetch, version or quit<br />
list<br />
memory df cpu exim_mailstats swap exim_mailqueue load<br />
fetch load<br />
load.value 1.39<br />
.<br />
33
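The telnet session shows the plain-text protocol a munin-node speaks: a fetch command returns lines of the form field.value NUMBER, terminated by a single dot. A minimal sketch for pulling one value out of such a reply follows; the helper name munin_value is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read a munin-node "fetch" reply on stdin and print
# the value of one field, e.g. "load.value 1.39" gives 1.39 for field "load".
# Banner lines starting with "#" and the terminating "." never match.
munin_value() {
    local field=$1
    awk -v key="${field}.value" '$1 == key { print $2 }'
}
```

Assuming a netcat binary is available, the manual test could be scripted roughly as `printf 'fetch load\nquit\n' | nc 192.168.0.105 4949 | munin_value load`.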
Munin Plugins<br />
• Standard plugins in the file system<br />
munin-node:/etc/munin/plugins# ls -al<br />
insgesamt 2<br />
…<br />
lrwxrwxrwx 1 root root 28 2006-03-06 20:03 cpu -> /usr/share/munin/plugins/cpu<br />
lrwxrwxrwx 1 root root 27 2006-03-06 20:03 df -> /usr/share/munin/plugins/df<br />
…<br />
lrwxrwxrwx 1 root root 39 2006-03-06 20:03 exim_mailqueue -> /usr/share/munin/plugins/exim_mailqueue<br />
lrwxrwxrwx 1 root root 39 2006-03-06 20:03 exim_mailstats -> /usr/share/munin/plugins/exim_mailstats<br />
…<br />
lrwxrwxrwx 1 root root 43 2006-03-06 20:03 postfix_mailvolume -> /usr/share/munin/plugins/postfix_mailvolume<br />
…<br />
• On the web<br />
http://muninexchange.projects.linpro.no/<br />
34
Munin – thresholds<br />
Thresholds are defined in the corresponding Munin plugin<br />
munin-node:/etc/munin/plugins# grep -E 'QUEUE.*=.*0' exim_mailqueue<br />
QUEUEWARN=100<br />
QUEUECRIT=200<br />
Display in the web frontend<br />
States – Nagios-like<br />
OK || Warning || Critical<br />
35
Munin – Nagios interface<br />
[Diagram: munin-node and munin-plugins run on the CI (mail server); the results are passed to the Nagios server via nsca]<br />
36
Munin – Nagios interface configuration 1<br />
• nagios.cfg on the Nagios server<br />
…<br />
check_external_commands=1<br />
…<br />
• Installation<br />
• send_nsca (Munin server)<br />
• nsca (Nagios server)<br />
• send_nsca.cfg on the Munin server<br />
…<br />
password=secret<br />
encryption_method=1<br />
…<br />
• nsca.cfg on the Nagios server<br />
…<br />
password=secret<br />
encryption_method=1<br />
…<br />
37
Munin – Nagios interface configuration 2<br />
• Adapt munin.conf (Munin server)<br />
# For those with Nagios, the following might come in handy. In addition,<br />
# the services must be defined in the Nagios server as well.<br />
contact.nagios.command /usr/sbin/send_nsca -H nagios-server -c /etc/send_nsca.cfg<br />
• Read the graph title from the Munin plugin (Munin server); it has to match the service_description below<br />
#!/bin/bash<br />
…<br />
GRAPHTITLE='Exim Mailqueue'<br />
echo "graph_title $GRAPHTITLE"<br />
…<br />
• Define the service as a passive service (Nagios server)<br />
define service{<br />
use passive-service<br />
host_name mgmt05.itnovum.de<br />
service_description Exim Mailqueue<br />
}<br />
38
Munin – Nagios interface configuration 3<br />
• Validating the configuration<br />
Munin server<br />
munin-server:~# printf "%s\t%s\t%s\t%s\n" "mgmt05.itnovum.de" "Exim Mailqueue" "0" "ALLES OK" \<br />
| /usr/sbin/send_nsca -H nagios-server -c /etc/nsca.cfg<br />
1 data packet(s) sent to host successfully.<br />
Nagios server<br />
nagios-server:# tail -f nagios.log | logtime<br />
[2008-09-06 18:47:34] PASSIVE SERVICE CHECK: mgmt05.itnovum.de;Exim Mailqueue;0;ALLES OK<br />
• Nagios frontend<br />
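By default, send_nsca reads one service check result per line from standard input, with host name, service description, return code and plugin output separated by tabs; this is what the printf call in the validation step builds by hand. A small sketch that wraps this format follows; the function name nsca_line is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build one passive service check result in the
# tab-separated form send_nsca reads on stdin by default:
# host<TAB>service<TAB>return code<TAB>plugin output<NEWLINE>
# Usage: nsca_line HOST SERVICE CODE OUTPUT
nsca_line() {
    # Only the four Nagios return codes are accepted.
    case "$3" in
        0|1|2|3) ;;
        *) echo "return code must be 0, 1, 2 or 3" >&2; return 1 ;;
    esac
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}
```

The manual test from the slide would then read, for example, `nsca_line mgmt05.itnovum.de "Exim Mailqueue" 0 "ALLES OK" | /usr/sbin/send_nsca -H nagios-server -c /etc/send_nsca.cfg`, with the host name and config path taken from the munin.conf example and adjusted to the local setup.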
39
Sources<br />
* www.itnovum.de<br />
** www.nagiosexchange.org<br />
*** www.de.wikipedia.org [as of 2008-09-01]<br />
40
Monitoring Messaging Systems via Nagios<br />
Questions<br />
41
Thank you for<br />
your attention<br />
42