New Host Check Logic - netways


3.0 Features In Depth


Object Definitions

Multiple template names:
  Names separated by commas
  Allows for more advanced inheritance of object properties
  Easier configuration management for complex environments

NETWAYS Nagios Konferenz 2006, Ethan Galstad


Multiple Template Names

Multiple inheritance sources...

    # Generic host template
    define host{
        name                    generic-host
        active_checks_enabled   1
        check_interval          10
        ...
        register                0
        }

    # Development web server template
    define host{
        name                    development-server
        check_interval          15
        notification_options    d,u,r
        ...
        register                0
        }

    # Development web server
    define host{
        use                     generic-host,development-server
        host_name               devweb1
        ...
        }
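
The slide does not say which template wins when both define check_interval. The sketch below assumes the rule documented for released Nagios 3 (the template named first in "use" takes precedence) and models definitions as plain dictionaries. The resolve function and the TEMPLATES table are hypothetical helpers for illustration, not Nagios code.

```python
# Hypothetical model of the "use" directive; not taken from the Nagios source.
TEMPLATES = {
    "generic-host": {"active_checks_enabled": "1", "check_interval": "10"},
    "development-server": {"check_interval": "15", "notification_options": "d,u,r"},
}


def resolve(definition, templates):
    """Merge template values into a definition; templates named first win."""
    merged = {}
    names = [n.strip() for n in definition.get("use", "").split(",") if n.strip()]
    # Walk the list in reverse so templates named earlier overwrite later ones.
    for name in reversed(names):
        merged.update(resolve(templates[name], templates))
    # Values set directly on the object override anything inherited.
    merged.update({k: v for k, v in definition.items() if k != "use"})
    return merged


devweb1 = resolve(
    {"use": "generic-host,development-server", "host_name": "devweb1"}, TEMPLATES
)
print(devweb1["check_interval"])        # 10, from generic-host (named first)
print(devweb1["notification_options"])  # d,u,r, from development-server
```

Under that assumption devweb1 ends up with check_interval 10 and still picks up notification_options from the second template.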


Multiple Template Names

Complex inheritance abilities...

    # Development web server
    define host{
        use         1, 4, 8
        host_name   devweb1
        ...
        }


Object Definitions

Suppression of inherited object vars:
  Character variables in templates (e.g. event_handler) couldn't be cleared in objects using them – until now!
  Use “null” as keyword to clear value

    # Generic host template
    define host{
        name            generic-host
        event_handler   handle-host-event
        ...
        register        0
        }

    # Development web server
    define host{
        host_name       devweb1
        event_handler   null
        ...
        }
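
A minimal sketch of the “null” keyword, assuming inherited values are merged into a dictionary and that “null” simply removes the inherited entry. The apply_template helper is hypothetical and only illustrates the behavior described on the slide.

```python
def apply_template(template, overrides):
    """Return the object's effective values; "null" clears an inherited value."""
    effective = dict(template)
    for key, value in overrides.items():
        if value == "null":
            # Drop the inherited value instead of storing the literal string.
            effective.pop(key, None)
        else:
            effective[key] = value
    return effective


generic_host = {"event_handler": "handle-host-event"}
devweb1 = apply_template(
    generic_host, {"host_name": "devweb1", "event_handler": "null"}
)
print("event_handler" in devweb1)  # False
```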


Object Definitions

Extended info definitions:
  Hostextinfo and Serviceextinfo object types are gone
  Extended info now stored in host and service definitions
  Existing definitions are still processed by Nagios and automatically merged with host/service definitions

    # Dev server HTTP
    define service{
        host_name        devweb1
        description      HTTP
        icon_image       iis40.png
        icon_image_alt   IIS 5
        notes            This is a web server
        notes_url        http://someurl
        action_url       http://someurl
        ...
        }


Object Definitions

Subgroup references:
  Host, service, and contact groups can now reference other groups for membership

Referencing Groups

    # All Windows servers
    define hostgroup{
        hostgroup_name    windows-servers
        hostgroup_names   web-servers,file-servers
        members           pdc,bdc,!fs1
        }

    # Windows web servers
    define hostgroup{
        hostgroup_name    web-servers
        members           a,b,c
        }

    # Windows file servers
    define hostgroup{
        hostgroup_name    file-servers
        members           x,y,z,fs1
        }

Referencing Individual Hosts

    # All Windows servers
    define hostgroup{
        hostgroup_name    windows-servers
        members           pdc,bdc,a,b,c,x,y,z
        }

    # Windows web servers
    define hostgroup{
        hostgroup_name    web-servers
        members           a,b,c
        }

    # Windows file servers
    define hostgroup{
        hostgroup_name    file-servers
        members           x,y,z,fs1
        }
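
The two columns on this slide describe the same membership for windows-servers. A rough sketch of how the group-referencing form could expand into the individual-host form, assuming a leading “!” excludes a host that a referenced group would otherwise contribute. The expand function and the dictionary layout are hypothetical.

```python
# The slide's "Referencing Groups" column, modelled as dictionaries.
HOSTGROUPS = {
    "windows-servers": {
        "hostgroup_names": ["web-servers", "file-servers"],
        "members": ["pdc", "bdc", "!fs1"],
    },
    "web-servers": {"members": ["a", "b", "c"]},
    "file-servers": {"members": ["x", "y", "z", "fs1"]},
}


def expand(group, groups, path=frozenset()):
    """Return the set of hosts in a group, following subgroup references."""
    if group in path:  # guard against groups that reference each other
        return set()
    path = path | {group}
    included, excluded = set(), set()
    for sub in groups[group].get("hostgroup_names", []):
        included |= expand(sub, groups, path)
    for member in groups[group].get("members", []):
        if member.startswith("!"):
            excluded.add(member[1:])
        else:
            included.add(member)
    return included - excluded


print(sorted(expand("windows-servers", HOSTGROUPS)))
# ['a', 'b', 'bdc', 'c', 'pdc', 'x', 'y', 'z']
```

The result matches the explicit member list pdc,bdc,a,b,c,x,y,z from the “Referencing Individual Hosts” column.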


Object Definitions

Contacts:
  Notifications for hosts, services, and escalations can now be configured for individual contacts, rather than groups

    define host{
        host_name       devweb1
        contacts        paul,sheila
        ...
        }

    define host{
        host_name       devweb2
        contactgroups   web-developers
        ...
        }

    define host{
        host_name       devweb3
        contactgroups   web-developers
        contacts        !paul,gunter,shiela
        ...
        }


Notifications

First notification delay:
  Delay 1st notification until problem persists for x minutes
  Previously tough to do (had to use escalations)
Scheduled downtime:
  Notifications on downtime start, end, cancellation
Custom (TODO):
  User-initiated, custom notifications about hosts, services

    define host{
        host_name                  devweb1
        first_notification_delay   15
        notification_options       d,u,r,s
        ...
        }
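
The first notification delay amounts to a simple time comparison. A minimal sketch, assuming the delay is expressed in minutes as on the slide. The notification_due function and the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def notification_due(problem_start, now, first_notification_delay):
    """True once the problem has persisted for the configured number of minutes."""
    return now - problem_start >= timedelta(minutes=first_notification_delay)


# Arbitrary example timestamp; only the elapsed time matters.
start = datetime(2006, 9, 21, 10, 0)
print(notification_due(start, start + timedelta(minutes=5), 15))   # False
print(notification_due(start, start + timedelta(minutes=15), 15))  # True
```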


Plugin Output

Multiline output and perfdata:
  Extension of current plugin spec
  Maintains compatibility with existing plugins
  Supported for host/service and active/passive checks
  No inherent limit on # of lines or characters in output

Current plugin spec: (diagram not preserved)


Plugin Output

New plugin spec: (diagram not preserved)
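
The spec diagrams did not survive in this transcript. The sketch below assumes the multiline layout documented for the released Nagios 3 plugin API, which may differ from what the slide showed: the first line holds short text with optional perfdata after a “|”, further lines hold long text, and a second “|” switches the remaining lines to perfdata. The parser and the sample output are illustrative only.

```python
def parse_plugin_output(raw):
    """Split plugin output into (short text, long text lines, perfdata lines)."""
    lines = raw.splitlines()
    if not lines:
        return "", [], []
    short_text, _, perf = lines[0].partition("|")
    perfdata = [perf.strip()] if perf.strip() else []
    long_text = []
    in_perfdata = False
    for line in lines[1:]:
        if in_perfdata:
            # Everything after the second "|" is additional perfdata.
            perfdata.append(line.strip())
            continue
        text, bar, perf = line.partition("|")
        if text.strip():
            long_text.append(text.strip())
        if bar:
            in_perfdata = True
            if perf.strip():
                perfdata.append(perf.strip())
    return short_text.strip(), long_text, perfdata


# Made-up sample output in the assumed layout.
SAMPLE = (
    "DISK OK - free space: / 3326 MB | /=2643MB\n"
    "/boot 68 MB\n"
    "/home 69357 MB | /boot=68MB\n"
    "/home=69357MB"
)
short, long_lines, perf_lines = parse_plugin_output(SAMPLE)
print(short)
print(long_lines)
print(perf_lines)
```

A single-line output without “|” passes through unchanged as the short text, which is how compatibility with existing plugins would be kept under this assumption.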


Custom Object Variables

Custom variables:
  Available in host, service, contact definitions
  Prefixed with an underscore (e.g. _mycustomvar)
  Contain user-specified data
    Passwords
    SNMP community strings
    Location information
    Instant messaging addresses
  Accessible in macros and environment vars
  Values can be modified via external commands


Custom Object Variables

Example - Custom host variables:

Host Definition

    define host{
        host_name      devweb1
        address        192.168.0.1
        _mac_address   00-06-5B-75-AD-EB
        _LOCATION      Room 451, Lenard Hall
        _InventoryID   560781
        _owner         Paul Lezaro
        ...
        }

Macros

    $_HOSTMAC_ADDRESS$ = “00-06-5B-75-AD-EB”
    $_HOSTLOCATION$ = “Room 451, Lenard Hall”
    $_HOSTINVENTORYID$ = “560781”
    $_HOSTOWNER$ = “Paul Lezaro”

Environment Vars

    NAGIOS__HOSTMAC_ADDRESS = “00-06-5B-75-AD-EB”
    NAGIOS__HOSTLOCATION = “Room 451, Lenard Hall”
    NAGIOS__HOSTINVENTORYID = “560781”
    NAGIOS__HOSTOWNER = “Paul Lezaro”


Custom Object Variables

Example - Custom service variables:

Service Definition

    define service{
        host_name         router1
        description       Uptime
        _SNMP_community   secret
        _Notes            Some notes...
        ...
        }

Macros

    $_SERVICESNMP_COMMUNITY$ = “secret”
    $_SERVICENOTES$ = “Some notes...”

Environment Vars

    NAGIOS__SERVICESNMP_COMMUNITY = “secret”
    NAGIOS__SERVICENOTES = “Some notes...”


Custom Object Variables

Example - Custom contact variables:

Contact Definition

    define contact{
        contact_name    paul
        _AIM_username   something
        _Skype_number   555555555
        _Yahoo_ID       something
        ...
        }

Macros

    $_CONTACTAIM_USERNAME$ = “something”
    $_CONTACTSKYPE_NUMBER$ = “555555555”
    $_CONTACTYAHOO_ID$ = “something”

Environment Vars

    NAGIOS__CONTACTAIM_USERNAME = “something”
    NAGIOS__CONTACTSKYPE_NUMBER = “555555555”
    NAGIOS__CONTACTYAHOO_ID = “something”
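
The host, service, and contact examples all follow one naming pattern: the object type is inserted after the leading underscore and the name is upper-cased. A small sketch of that mapping, inferred from the slide examples only. The custom_macro_names helper is hypothetical.

```python
def custom_macro_names(object_type, variable):
    """Return (macro, environment variable) for a custom object variable."""
    if object_type not in ("host", "service", "contact"):
        raise ValueError("custom variables exist on hosts, services, and contacts")
    if not variable.startswith("_"):
        raise ValueError("custom variable names start with an underscore")
    # _mac_address on a host becomes _HOSTMAC_ADDRESS
    stem = "_" + object_type.upper() + variable[1:].upper()
    return "$" + stem + "$", "NAGIOS_" + stem


print(custom_macro_names("host", "_mac_address"))
# ('$_HOSTMAC_ADDRESS$', 'NAGIOS__HOSTMAC_ADDRESS')
print(custom_macro_names("service", "_SNMP_community"))
# ('$_SERVICESNMP_COMMUNITY$', 'NAGIOS__SERVICESNMP_COMMUNITY')
```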


Host Check Logic

Major overhaul!
  Host checks are no longer a major bottleneck
  Most host checks run in parallel
  Scheduled host checks now help performance
  Host checks now have a retry interval


Old Host Check Logic

  All hosts UP to start
  Service problem detected


Old Host Check Logic

  Host is checked max_attempts times
  Host is determined to be not up
  Is it down or unreachable?


Old Host Check Logic

  Host check propagated to parent
  Parent is not up


Old Host Check Logic

  Check propagated to grandparent
  Grandparent host is UP


Old Host Check Logic

  Status of host and parent can now be determined


Old Host Check Logic

  Child hosts are checked (serially) and found to be unreachable as well


Old Host Check Logic

Terrible performance!
  All checks performed serially
  Everything else is put on hold
    No notifications, service checks, etc.

Time cost:
  (hosts) x (attempts/host) x (time/attempt)
Worst case cost:
  (8 hosts) x (3 attempts/host) x (5 seconds/attempt) = 120 seconds!
Best case cost:
  (8 hosts) x (1 attempt/host) x (5 seconds/attempt) = 40 seconds
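
The serial time cost is a plain product, so the worst and best case figures can be reproduced directly. The function name is made up for this sketch.

```python
def serial_check_cost(hosts, attempts_per_host, seconds_per_attempt):
    """Seconds spent when every attempt on every host runs one after another."""
    return hosts * attempts_per_host * seconds_per_attempt


print(serial_check_cost(8, 3, 5))  # 120 (worst case)
print(serial_check_cost(8, 1, 5))  # 40 (best case)
```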


New Host Check Logic

  All hosts UP to start
  Service problem detected


New Host Check Logic

  Host is checked 1 time (real or cached)
  Host is determined to be not up
  Is it down or unreachable?


New Host Check Logic

Assuming max attempts > 1 ...
  Switch2 goes into a soft down state
  Parallel checks of parent and child hosts are initiated


New Host Check Logic

  Parent and children are not up


New Host Check Logic

  Soft states set for parent/child hosts
  Switch2 is soft unreachable after another re-check
  Parallel checks propagated to extended “relatives”


New Host Check Logic

Eventually ...
  Parallel checks propagate to all necessary hosts
  Max attempts reached for all hosts
  Hosts enter hard states
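
The walkthrough above can be pictured as waves of parallel checks spreading out from the first failing host. The sketch below is a loose model under stated assumptions, not the Nagios scheduler: the topology, the host names other than the switch from the slides, and the up/down results are all invented, and retries and soft/hard state handling are left out.

```python
# Hypothetical topology: each host maps to its parent (None for the top).
PARENTS = {
    "router1": None,
    "switch1": "router1",
    "switch2": "switch1",
    "web1": "switch2",
    "web2": "switch2",
}
# Hypothetical check results: everything below router1 is failing.
IS_UP = {
    "router1": True,
    "switch1": False,
    "switch2": False,
    "web1": False,
    "web2": False,
}


def relatives(host):
    """Direct parent and children of a host."""
    found = {child for child, parent in PARENTS.items() if parent == host}
    if PARENTS[host] is not None:
        found.add(PARENTS[host])
    return found


def check_waves(first_host):
    """Group hosts into waves; hosts within one wave are checked in parallel."""
    waves, checked, current = [], set(), {first_host}
    while current:
        waves.append(sorted(current))
        checked |= current
        # Only hosts that are not up propagate checks to their relatives.
        failing = [h for h in current if not IS_UP[h]]
        current = {r for h in failing for r in relatives(h)} - checked
    return waves


print(check_waves("switch2"))
# [['switch2'], ['switch1', 'web1', 'web2'], ['router1']]
```

Five hosts are covered in three waves here, and propagation stops at router1 because it is up.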


New Host Check Logic

Determining current host status:
  Current host status is critical in monitoring
  How old is too old?
  Should the host be rechecked or can we use latest state?
Cached host checks:
  If last host check result is “fresh enough” (within cached check horizon), use old/cached status
  If not, run an actual check of the host
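
The cached-check decision boils down to comparing the age of the last result with the horizon. A minimal sketch, assuming timestamps in seconds. The host_status function, its parameters, and the run_check callback are hypothetical names.

```python
import time


def host_status(last_state, last_check_time, horizon_seconds, run_check, now=None):
    """Return the cached state if it is fresh enough, otherwise run a real check."""
    now = time.time() if now is None else now
    if last_check_time is not None and now - last_check_time <= horizon_seconds:
        return last_state, "cached"
    return run_check(), "checked"


# Last result is 10 seconds old with a 15 second horizon: reuse it.
print(host_status("UP", 100, 15, lambda: "DOWN", now=110))  # ('UP', 'cached')
# Last result is 30 seconds old: too stale, so the check callback runs.
print(host_status("UP", 100, 15, lambda: "DOWN", now=130))  # ('DOWN', 'checked')
```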


New Host Check Logic

Predictive dependency checks:
  Host is in a soft problem state
  Parallel checks of all hosts it depends on will also be launched
  Helps ensure accurate dependency tests for notifications
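
A rough sketch of the predictive idea using a thread pool: when a host is in a soft problem state, the hosts it depends on are checked in parallel so their states are current before any notification decision. The function, the dependency table, and the check callback are assumptions for illustration, not the Nagios implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def predictive_dependency_checks(host, state_type, depends_on, check_host):
    """When host is in a soft problem state, check its dependencies in parallel."""
    if state_type != "SOFT":
        return {}
    masters = depends_on.get(host, [])
    if not masters:
        return {}
    with ThreadPoolExecutor(max_workers=len(masters)) as pool:
        results = list(pool.map(check_host, masters))
    return dict(zip(masters, results))


# Hypothetical dependency table and a stand-in for a real host check.
DEPENDS_ON = {"web1": ["db1", "nfs1"]}


def fake_check(name):
    return "UP" if name == "db1" else "DOWN"


print(predictive_dependency_checks("web1", "SOFT", DEPENDS_ON, fake_check))
# {'db1': 'UP', 'nfs1': 'DOWN'}
```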


New Host Check Logic

Much better performance:
  Most checks performed in parallel
  Cached results mean less overhead
  Notifications, service checks, etc. are not delayed
  Scales better – especially in network outages
Best performance when:
  Host checks are regularly scheduled
  Max attempts > 1
  Cached host checks are enabled


New Host Check Logic

Check logic options:
  use_old_host_check_logic=[0/1]
    0 = Use new host check logic (3.x)
    1 = Use old host check logic (2.x and earlier)
  cached_host_check_horizon=[#]
    Seconds before host status needs to be rechecked
  enable_predictive_host_dependency_checks=[0/1]
    0 = No predictive checks (2.x and earlier)
    1 = Perform predictive checks
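
A small sketch that reads the three options from key=value lines and validates them. The option names come from the slide; the parser and the sample values are illustrative assumptions, not recommended settings.

```python
def parse_check_logic_options(text):
    """Read the three host check logic options from key=value lines."""
    flags = ("use_old_host_check_logic", "enable_predictive_host_dependency_checks")
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in flags:
            if value not in ("0", "1"):
                raise ValueError(key + " must be 0 or 1")
            options[key] = value == "1"
        elif key == "cached_host_check_horizon":
            options[key] = int(value)  # seconds
    return options


SAMPLE_CONFIG = """
# Illustrative values only
use_old_host_check_logic=0
cached_host_check_horizon=15
enable_predictive_host_dependency_checks=1
"""
print(parse_check_logic_options(SAMPLE_CONFIG))
```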


Future Plans

Nagios 4.x:
  DB integration (MySQL/Postgres) – NDOUtils addon
  PHP-based GUI with
    Multiple instance support
    Internationalization
    Easier addon integration
Other:
  Community website for news, events, etc.
  Documentation wiki – of, by, and for the community


Questions?

Ethan Galstad
nagios@nagios.org
