20.01.2015 Views

Michael Medin_Advanced Windows monitoring - netways


Going beyond the basics


These slides represent the work and opinions of the author and do not constitute official positions of any organization sponsoring the author’s work.

This material has not been peer reviewed and is presented here as-is with the permission of the author.

The author assumes no liability for any content or opinion expressed in this presentation and/or use of content herein.


Developer (not system manager)
  ◦ Quite a big difference
  ◦ Not working with Nagios
Accidentally ended up in our NOC
  ◦ Hated BB
The birth of NSClient++
  ◦ 2003:ish
  ◦ NSClient sucked (broke Exchange)
  ◦ NRPE_NT was too hard to use
The open source of NSClient++
  ◦ 2004:ish
  ◦ “just for fun”
The rebirth of NSClient++
  ◦ 2007
  ◦ A lot of users emailed me
  ◦ Got a lot of hits on the webpage
  ◦ Intense development led to 0.3.0!


Agents
  ◦ An overview of the agents
  ◦ An overview of the protocols
About NSClient++
  ◦ Quick Introduction
Using NSClient++
  ◦ Eventlog Checking
  ◦ WMI (Windows Management Instrumentation)
  ◦ Scripts
Q/A


An overview of the agents


Agent         Age         Protocol               Licence
SNMP          1990-2008   SNMP                   Proprietary
NSClient      200x        NSClient               GPL
NRPE_NT       200x-2006   NRPE                   GPL
NSClient++    2004-2008   NRPE, NSClient, NSCA   GPL
NC_NET        2004-2008   NSClient, NSCA         GPL
“Agentless”               WMI                    N/A
OpMonAgent    2008        NSClient, NRPE         GPL


SNMP
The good:
  ◦ Standard solution
  ◦ Hardware extensions
    • HP, IBM, DELL, etc...
The bad:
  ◦ Complex to use
  ◦ No encryption
  ◦ Not extensible
  ◦ Not “popular” on Windows
  ◦ Needs a lot of extensions to be useful


NSClient
The good:
  ◦ Very stable
    • works “if it works”
  ◦ Built-in checks
    • easy to use
The bad:
  ◦ Requires client install
  ◦ Outdated
    • no longer maintained
  ◦ Very few checks
    • only basic checks
  ◦ Not extensible


NRPE_NT
The good:
  ◦ Extensible
  ◦ Standard protocol
    • same as on *nix
The bad:
  ◦ Requires client install
  ◦ No built-in checks
    • hard for simple checks
  ◦ Old


NC_NET
The good:
  ◦ Lots of features
  ◦ Built-in checks
    • easy to use
  ◦ Can check:
    • WMI, EventLog, Scripts
  ◦ Supports:
    • NSCA, NSClient
    • (no encryption)
The bad:
  ◦ Requires client install
  ◦ Written in .NET
  ◦ Not (that) extensible
  ◦ Requires custom plug-in (Nagios side)
  ◦ No encryption
  ◦ No NRPE support


NSClient++
The good:
  ◦ Built-in checks
    • easy to use
  ◦ Can check:
    • WMI, EventLog, Scripts
  ◦ Supports:
    • NSCA, NRPE, NSClient
    • encryption (NRPE/NSCA)
  ◦ Very extensible
    • Scripts, Lua, modules, etc.
The bad:
  ◦ Requires client install
  ◦ Hard to use at times
  ◦ Has had a few bugs over time


“Agentless” WMI
The good:
  ◦ No client-side install
    • (Usually requires a “proxy”)
The bad:
  ◦ Proprietary
  ◦ Not extensible
  ◦ Limited functionality


OpMonAgent
A new client (haven't looked into it much)
Seems to be a new version of NSClient (still written in Delphi)
Script and NRPE support
Not that many new features


I would use either:
  ◦ NSClient++
  ◦ NC_NET
I would not use (unless I have a specific reason):
  ◦ SNMP
    • Complex to use
  ◦ NSClient
    • Old and outdated
  ◦ NRPE_NT
    • Hard for some (simple) checks
  ◦ OpMonAgent
    • I don’t see the benefit
  ◦ “Agentless” WMI
    • Limited functionality


An overview of the protocols


Protocol     Method    Encryption   Auth.   Payload         Args      Multi Commands
NSClient     Active    No           Yes     Unlimited (1)   Yes (1)   Yes (1)
NRPE         Active    Yes          No      1024 (2)        Yes       No
NSCA         Passive   Yes          Yes     Unlimited       Yes       Yes
Future (3)   Active    Yes          Yes     Unlimited       Yes       Yes

1) Protocol supports it but not check_nt
2) NRPE payload can be extended with a recompile of check_nrpe and configured in NSClient++
3) A future protocol I am thinking of adding to NSClient++ (NRPE 3.0)


I would use:
  ◦ NRPE
    • For active checks
  ◦ NSCA
    • For passive checks
I would not use:
  ◦ NSClient
    • Two words: No encryption!
    • If you use it, NEVER use a “secret” password (it is sent unencrypted).


Quick Introduction


Internals:
  ◦ C++ using the Win32 API
  ◦ Around 20.000 lines of code (30.000 with comments)
  ◦ Actively developed (unfortunately only by me)
  ◦ Modularized design (low on resources)
Runs on:
  ◦ NT4, w2k, XP, w2k3, Vista, w2k8 ...
  ◦ x86, x64, IA64 (I lack a compiler for that platform, but it works)
Current Version:
  ◦ 0.3.4 (out this weekend, maybe not… *grumle*)
  ◦ Don’t use 0.2.7!
Most features require NRPE
  ◦ (or custom “NSClient”-client)
Documentation online (WIKI)
  ◦ http://nsclient.org


Not supported by a commercial entity
  ◦ Donations welcome
  ◦ Sponsoring available (contact me for details)
Used by a lot of people (I think)
  ◦ Impossible to estimate any figures
Website has:
  ◦ Around 8.000 unique visitors per month
  ◦ Around 10.000 downloads per month


Starting/Stopping:
  ◦ nsclient++ /start (net start nsclientpp)
  ◦ nsclient++ /stop (net stop nsclientpp)
  ◦ nsclient++ /test
Configuration:
  ◦ notepad nsc.ini
nsclient++ /test is your friend!
Testing:
  1. Local (nsclient++ /test)
  2. From CLI (check_nrpe ...)
  3. From Nagios (add command)
Enabling debug log (always on with /test):
  ◦ [log]
  ◦ debug=1
Log File:
  ◦ nsclient.log (nsc.log)
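
The second testing step, calling the agent from the command line, can also be scripted. The sketch below is only an illustration and not part of NSClient++: `run_check` is a hypothetical helper that runs any check command and translates its exit code using the Nagios convention (0 OK, 1 Warning, 2 Critical, 3 Unknown) that appears later in these slides.

```python
import subprocess
import sys

# Nagios plugin exit codes, as listed in the "Exit statuses" slides.
STATUS = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=30):
    """Run a check command and return (status_name, first_line_of_output)."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Any exit code outside 0..3 is reported as UNKNOWN.
    return STATUS.get(proc.returncode, "UNKNOWN"), first_line
```

A call such as `run_check(["check_nrpe", "-H", "192.0.2.10", "-c", "check_es_ok"])` would query an agent, assuming check_nrpe is on the PATH; the address is a placeholder for your Windows host and check_es_ok is the sample alias defined in the Scripts part of the talk.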


Eventlog checking


The good:
  ◦ Powerful interface
The bad:
  ◦ Hard to use!
  ◦ Requires configuration
  ◦ No out-of-the-box solution!
    • (might come in next version)
A lot of theory!
  ◦ (please don't despair)


Two different filtering strategies
  ◦ Exclusive filtering (filter=out)
    • If you want all errors (except…)
  ◦ Inclusive filtering (filter=in)
    • If you only want specific errors
  ◦ Remember (filter=new)
    • Don't forget this!
    • There is an “old” outdated syntax as well


Exclusive filtering (filter=out)
Simplest to start with
By default:
  ◦ Everything is an error
Produces a lot of noise
  ◦ False positives
Good if you just want to be warned
Sample (all entries for last 2 days):
  ◦ CheckEventLog file=application filter=new filter=out MaxWarn=1 MaxCrit=1 filter-generated=>2d


Inclusive filtering (filter=in)
For advanced use
By default:
  ◦ Nothing is an error
Easy to make mistakes (and miss errors)
Good if you are only looking for specifics
  ◦ RAID controllers, Active Directory, etc...
Sample (all entries for last 2 days):
  ◦ CheckEventLog file=application filter=new filter=in MaxWarn=1 MaxCrit=1 filter+generated=


Filter rule
  ◦ A rule to match against every single line in the eventlog
Chain
  ◦ A set of filter rules used when finding errors
  ◦ Linear (when a rule matches, the chain is terminated)
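
How a chain is evaluated can be modelled in a few lines. This is only a sketch of the behaviour described on these slides (a default of "hit" or "no hit" depending on the filter mode, and the first matching rule deciding), not NSClient++'s actual implementation. `Rule`, `count_hits` and the dictionary keys are made-up names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    mode: str                        # "+" counts the entry as a hit, "-" discards it
    matches: Callable[[Dict], bool]  # predicate applied to one eventlog entry


def count_hits(entries: List[Dict], rules: List[Rule], default_hit: bool) -> int:
    """Count the entries that end up as hits.

    default_hit=True models filter=out (everything is an error by default),
    default_hit=False models filter=in (nothing is an error by default).
    """
    hits = 0
    for entry in entries:
        is_hit = default_hit
        for rule in rules:
            if rule.matches(entry):
                is_hit = rule.mode == "+"
                break                # linear chain: the first match ends it
        if is_hit:
            hits += 1
    return hits


# Three made-up entries: age in hours plus a severity string.
ENTRIES = [
    {"age_h": 1, "severity": "error"},
    {"age_h": 72, "severity": "error"},
    {"age_h": 2, "severity": "informational"},
]

# Models: filter=out filter-generated=>2d filter-severity==informational
EXCLUSIVE = [
    Rule("-", lambda e: e["age_h"] > 48),
    Rule("-", lambda e: e["severity"] == "informational"),
]
```

With these inputs, `count_hits(ENTRIES, EXCLUSIVE, default_hit=True)` leaves only the recent error. Because the chain stops at the first match, swapping the order of an additive and a subtractive rule can change the count.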


Order is important
Start with the rule which will discard the most items.
  ◦ filter-generated=>2d


Mode
  ◦ Whether the filter is additive (+), subtractive (-) (or “maybe”)
Type (keyword)
  ◦ What to match
    • Message
    • Event category
    • Event date
    • Etc...
Equal Sign
Operator
  ◦ =, !=, >, <, etc...
Value
  ◦ The value to match


filter+ generated =< 2h


Consider the following rules:
  ◦ filter-generated=2d
    • WRONG! (The equal sign is there, but the operator is missing)
  ◦ filter-generated==2d
    • Correct! (The first = is the equal sign, the second = is the operator)
Always remember the “extra” equal sign!
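
The anatomy of a rule (mode, type, equal sign, operator, value) can be checked mechanically. The parser below is a hedged sketch that only covers the modes and operators shown on these slides (+ and -, and =, !=, >, <); `FilterRule` and `parse_rule` are hypothetical names, and the real NSClient++ parser may accept more.

```python
import re
from typing import NamedTuple


class FilterRule(NamedTuple):
    mode: str      # "+" (additive) or "-" (subtractive)
    type: str      # what to match, e.g. "generated" or "severity"
    operator: str  # "=", "!=", ">" or "<"
    value: str     # the value to match, e.g. "2d"


# filter <mode> <type> = <operator> <value>, written without spaces.
_RULE = re.compile(r"^filter([+-])(\w+)=(!=|=|>|<)(.+)$")


def parse_rule(text: str) -> FilterRule:
    """Split a rule into its parts, or raise ValueError if a part is missing."""
    match = _RULE.match(text)
    if match is None:
        raise ValueError(f"not a valid filter rule: {text!r}")
    return FilterRule(*match.groups())
```

Here `parse_rule("filter-generated==2d")` yields the operator `=` and the value `2d`, while `parse_rule("filter-generated=2d")` raises `ValueError` because nothing between the equal sign and the value is an operator.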


Type          Description
eventType     Type of error (Microsoft says this is severity):
              error, warning, info, auditSuccess or auditFailure
eventSource   The name of the source of the event (the program that logged the message)
generated     Time ago the message was generated (when it happened)
written       Time ago the message was written to the log (don’t use)
message       Filter strings in the message (NOT the entire message!)
eventID       Filter based on the event id of the log message (error code)
severity      Filter based on event severity (I think this is severity):
              success, informational, warning or error


Option           Description
file             The “eventlog file” to open.
                 Use multiple file-options to check multiple files.
filter           Set filter mode (out, in, old, new)
MaxWarn          Maximum hits before a warning state is issued.
MaxCrit          Maximum hits before a critical state is issued.
(filter rules)   A list of filter rules to be matched (in order)


Option         Description
truncate       Length of returned data.
               Since NRPE (and NSClient++) has a limited capacity this is important.
               Usually 1023 is a good value.
syntax         How to format the return data
unique         Only “one of each” record will be returned.
               (“count” (MaxWarn/MaxCrit) is not affected)
descriptions   If you plan on using the %message% syntax option.
               (Will impact performance “severely”)


CheckEventLog
  ◦ file=application
  ◦ file=system
  ◦ filter=new
  ◦ filter=out
  ◦ MaxWarn=1
  ◦ MaxCrit=1
  ◦ filter-generated=>2d
  ◦ filter-severity==success
  ◦ filter-severity==informational
  ◦ truncate=1023
  ◦ unique
  ◦ descriptions
  ◦ "syntax=%severity%: %source%: %message% (%count%)"
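
Typing this many arguments by hand is error-prone, so the command can be assembled programmatically. The sketch below builds the argument list for check_nrpe (its `-H`, `-c` and `-a` options) from the pieces shown above. `eventlog_command` is a made-up helper, the host address is a placeholder, and it assumes that argument passing is enabled on the agent side.

```python
def eventlog_command(host, files, rules, max_warn=1, max_crit=1, truncate=1023):
    """Return a check_nrpe argument list for an exclusive CheckEventLog check."""
    args = [f"file={name}" for name in files]
    args += ["filter=new", "filter=out", f"MaxWarn={max_warn}", f"MaxCrit={max_crit}"]
    args += list(rules)  # order matters: rules are matched in the order given
    args += [
        f"truncate={truncate}",
        "unique",
        "descriptions",
        "syntax=%severity%: %source%: %message% (%count%)",
    ]
    return ["check_nrpe", "-H", host, "-c", "CheckEventLog", "-a", *args]


COMMAND = eventlog_command(
    "192.0.2.10",  # placeholder address
    ["application", "system"],
    [
        "filter-generated=>2d",
        "filter-severity==success",
        "filter-severity==informational",
    ],
)
```

Keeping the command as a list rather than one string avoids shell quoting problems with the `>` and `%` characters when it is handed to something like `subprocess.run`.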


DEMO


Don’t be discouraged by your first attempt
Remember filter=new (on older versions)
Use the truncate option (1023 is reasonable)
Start small, filter your way “up” (whilst testing)
Not so hard once you get down to it.


Start with “everything” and work your way down.
Both System and Application logfile
Reasonable start filter:
  ◦ filter-generated=>2d
  ◦ filter-severity==success
  ◦ filter-severity==informational
Need to customize it for your environment.
A good idea is to use more than one check
  1. Check “all errors” (Exclusive)
  2. Check “my service” (Inclusive)
Don’t overdo it (eventlog checking is slow)


Would it make sense to check “new entries”?
  ◦ Just check entries added since last check
  ◦ Would be faster
  ◦ Would be “better”
  ◦ But would you use it?


WMI - Windows Management Instrumentation


The purpose of WMI is to define a non-proprietary set of environment-independent specifications which allow management information to be shared between management applications.

WMI prescribes enterprise management standards and related technologies that work with existing management standards, such as Desktop Management Interface (DMI) and SNMP.

WMI complements these other standards by providing a uniform model. This model represents the managed environment through which management data from any source can be accessed in a common way.

…yada yada yada…

In short: Like SNMP but “modern” ☺


Everything
  ◦ Almost...
There are a lot of objects (tables)
  ◦ win32 has 450 objects
  ◦ Various services will add more (AD, SQL Server, ...)
You can:
  ◦ Read, write and work with “objects”.
  ◦ (only read via NSClient++)
But you can't:
  ◦ Check your application


Dangerous!
  ◦ No security, allows access to a lot of things.
Fairly “unexplored” in NSClient++
Two commands:
  ◦ CheckWMI
    • Check a result set
    • NSClient++ does filtering
    • Good for checking if “more (or less) than n items...”
  ◦ CheckWMIValue
    • Check a specific value
    • WMI does filtering


WQL - WMI Query Language
  ◦ Based upon SQL
  ◦ Only select features (no update/insert/delete)
“Tables” are called objects in WMI
  ◦ An object usually corresponds to a logical “type”.
Example:
  ◦ select * from win32_Processor
    • Retrieves everything from the win32_Processor “object”.


Object                             Description
Win32_Fan                          Represents the properties of a fan device in the computer system.
Win32_TemperatureProbe             Represents the properties of a temperature sensor (electronic thermometer).
Win32_DiskDrive                    Represents a physical disk drive as seen by a computer running the Windows operating system.
Win32_PhysicalMedia                Represents any type of documentation or storage medium.
Win32_TapeDrive                    Represents a tape drive on a computer system running Windows.
Win32_BaseBoard                    Represents a baseboard (also known as a motherboard or system board).
Win32_BIOS                         Represents the attributes of the computer system's basic input or output services (BIOS).
Win32_IDEController                Represents the capabilities of an Integrated Drive Electronics (IDE) controller device.
Win32_MemoryArray                  Represents the properties of the computer system memory array and mapped addresses.
Win32_OnBoardDevice                Represents common adapter devices built into the motherboard (system board).
Win32_Processor                    Represents a device capable of interpreting a sequence of machine instructions on the computer.
Win32_SCSIController               Represents a small computer system interface (SCSI) controller on a computer system running Windows.
Win32_USBControllerDevice          Relates a USB controller and the CIM_LogicalDevice instances connected to it.
Win32_NetworkAdapter               Represents a network adapter on a computer system running Windows.
Win32_Battery                      Represents a battery connected to the computer system.
Win32_PortableBattery              Represents the properties of a portable battery, such as one used for a notebook computer.
Win32_PowerManagementEvent         Represents power management events resulting from power state changes.
Win32_UninterruptiblePowerSupply   Represents the capabilities and management capacity of an uninterruptible power supply (UPS).
Win32_Printer                      Represents a device connected to a computer system running Windows that is capable of reproducing a visual image on a medium.
Win32_PrintJob                     Represents a print job generated by a Windows-based application.


Object                           Description
Win32_SystemDriver               Represents the system driver for a base service.
Win32_Directory                  Represents a directory entry on a computer system running Windows.
Win32_DiskQuota                  Tracks disk space usage for NTFS file system volumes.
Win32_LogicalDisk                Represents a data source that resolves to an actual local storage device.
Win32_Volume                     Represents an area of storage on a hard disk.
Win32_PageFileUsage              Represents the file used for handling virtual memory file swapping on a computer system running Windows.
Win32_NetworkConnection          Represents an active network connection in a Windows environment.
Win32_NTDomain                   Represents a Windows NT domain.
Win32_PingStatus                 Represents the values returned by the standard ping command.
Win32_ComputerSystem             Represents a computer system operating in a Windows environment.
Win32_OperatingSystem            Represents an operating system installed on a computer system running Windows.
Win32_Process                    Represents a sequence of events on a computer system running Windows.
Win32_ProcessStartup             Represents the startup configuration of a computer system running Windows.
Win32_ScheduledJob               Represents a job scheduled using the Windows NT schedule service.
Win32_BaseService                Represents executable objects that are installed in a registry database maintained by the SCM.
Win32_Service                    Represents a service on a computer system running Windows.
Win32_LogonSession               Describes the logon session or sessions associated with a user logged on to Windows 2000 or Windows NT.
Win32_UserAccount                Represents information about a user account on a computer system running Windows.
Win32_UserInDomain               Association class
Win32_WindowsProductActivation   Contains properties and methods related to WPA.
Win32_NTEvent...                 Yes you can even check the eventlog!


NSClient++ has support for executing WQL queries “as is” and getting the result.
  ◦ nsclient++ -noboot CheckWMI
  ◦ This does not support namespaces (yet).
  ◦ (Namespaces are supported via check commands)
Sample use
  ◦ nsclient++ -noboot CheckWMI select * from win32_Processor


Best way to start
Simple to use...
  ◦ ...if you know your WMI
A sample query:
  ◦ CheckWMIValue
    • "Query=Select * from win32_Processor"
    • MaxWarn=80
    • MaxCrit=90
    • Check:CPU=LoadPercentage
    • ShowAll=long
  ◦ (a bit like CheckCPU)


Option     Description
MaxWarn    The maximum allowed value for the column(s).
MaxCrit    The maximum allowed value for the column(s).
MinWarn    The minimum allowed value for the column(s).
MinCrit    The minimum allowed value for the column(s).
ShowAll    If present will display information even if an item is not reporting a state.
           If set to long will display more information.
Query      The WMI query to ask (not stackable, only one query at a time)
Check      A column name to check (if * all columns will be checked)
           (this is stackable, so you can compare any number of columns)
truncate   The maximum length of the query-result.
AliasCol   A column to be included (prefixed) in the result when there are errors.
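
The four threshold options combine into a single state per checked column. The function below is a rough model of that idea, not NSClient++'s code: `evaluate` is a hypothetical name, and whether the real comparison is strict (`>`) or inclusive (`>=`) at the boundary is an assumption here.

```python
def evaluate(value, max_warn=None, max_crit=None, min_warn=None, min_crit=None):
    """Return (exit_code, state) for one value checked against optional bounds."""
    too_high_crit = max_crit is not None and value > max_crit
    too_low_crit = min_crit is not None and value < min_crit
    if too_high_crit or too_low_crit:
        return 2, "CRITICAL"
    too_high_warn = max_warn is not None and value > max_warn
    too_low_warn = min_warn is not None and value < min_warn
    if too_high_warn or too_low_warn:
        return 1, "WARNING"
    return 0, "OK"
```

With the sample query's MaxWarn=80 and MaxCrit=90, a LoadPercentage of 85 would map to a warning under this model and 95 to a critical state.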


DEMO


Scripts


External Scripts
  ◦ VB, Perl, Python, ...
  ◦ .exe files
  ◦ .NET
  ◦ ...
Lua
  ◦ Lua is a simple programming language
  ◦ Used INSIDE NSClient++
  ◦ Very powerful, and simple
  ◦ A fairly new feature so feel free to suggest things
Modules
  ◦ Written in C++, VB, .NET, ...
  ◦ Very powerful, but “hard”


Configuration:
  ◦ [modules]
  ◦ CheckExternalScripts.dll
  ◦ ...
  ◦ [External Scripts]
  ◦ One line per script, with an equal sign in the middle:
    • The left side is the command from nrpe
    • The right side is the command to execute
    • check_es_ok=scripts\ok.bat
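
To see which NRPE command maps to which script, the [External Scripts] section can be read with a generic INI parser. This is a minimal sketch using Python's standard `configparser`; NSClient++ has its own parser, so treat the result as an approximation. `SAMPLE` and `external_scripts` are illustrative names, and the sample text reuses the lines shown on these slides.

```python
import configparser

SAMPLE = r"""
[modules]
CheckExternalScripts.dll

[External Scripts]
check_es_ok=scripts\ok.bat
check_vbs=cscript.exe /T:30 /NoLogo scripts\check_vb.vbs
"""


def external_scripts(ini_text):
    """Return {nrpe_command: command_line} from the [External Scripts] section."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # [modules] lists bare DLL names without "=value"
        interpolation=None,   # keep any "%" in command lines untouched
    )
    parser.optionxform = str  # preserve the case of keys
    parser.read_string(ini_text)
    return dict(parser["External Scripts"])
```

For the sample text, `external_scripts(SAMPLE)["check_es_ok"]` is `scripts\ok.bat`.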


Sample Code:
  ◦ @echo CRITICAL: Everything is not going to be ok!
  ◦ @exit 2
Exit statuses:
  ◦ 0 OK
  ◦ 1 Warning
  ◦ 2 Critical
  ◦ 3 Unknown


Sample Code:
  ◦ Wscript.StdOut.WriteLine "Everything might not be ok"
  ◦ Wscript.Quit(1)
Exit statuses:
  ◦ 0 OK
  ◦ 1 Warning
  ◦ 2 Critical
  ◦ 3 Unknown
NSC.ini syntax:
  ◦ [External Scripts]
  ◦ check_vbs=cscript.exe /T:30 /NoLogo scripts\check_vb.vbs


This is exactly like writing “regular” Nagios scripts.
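
Since Python is among the script languages mentioned, the same contract (one status line on standard output plus an exit code from 0 to 3) can be sketched there too. The disk check, its thresholds and the function names below are all illustrative choices and not something shipped with NSClient++.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_pct, warn_pct=80.0, crit_pct=90.0):
    """Map a disk usage percentage to an (exit_code, label) pair."""
    if used_pct >= crit_pct:
        return CRITICAL, "CRITICAL"
    if used_pct >= warn_pct:
        return WARNING, "WARNING"
    return OK, "OK"


def main(path="C:\\"):
    """Print one status line and return the matching exit code."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as error:
        print(f"UNKNOWN: cannot read {path}: {error}")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    code, label = classify(used_pct)
    print(f"{label}: {path} is {used_pct:.1f}% full")
    return code


# As a standalone script, add "import sys" and finish with: sys.exit(main())
```

It would be wired up like the other external scripts, for example with a hypothetical `check_disk_py=python scripts\check_disk.py` line under [External Scripts].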


Configuration:
  ◦ [modules]
  ◦ LUAScript.dll
  ◦ ...
  ◦ [LUA Scripts]
  ◦ One line per script:
    • scripts\test.lua
What, no alias?
  ◦ Not needed…


nscp.print('Loading test script...')
nscp.register('check_foo', 'foo')

function foo (command)
  nscp.print(command)
  local code, msg, perf = nscp.execute('CheckCPU','time=5','MaxCrit=5')
  return code, 'hello from LUA: ' .. msg, perf
end


The power of Lua scripts comes from:
  ◦ The ability to run and modify the result of other commands
  ◦ The ability to run “inside” NSClient++
  ◦ The simplicity of the language


Questions
