Michael Medin_Advanced Windows monitoring - netways
Going beyond the basics
These slides represent the work and opinions<br />
of the author and do not constitute official<br />
positions of any organization sponsoring the<br />
author’s work<br />
This material has not been peer reviewed and<br />
is presented here as-is with the permission of<br />
the author.<br />
The author assumes no liability for any<br />
content or opinion expressed in this<br />
presentation and or use of content herein.
Developer (not system manager)<br />
◦ Quite a big difference<br />
◦ Not working with Nagios<br />
Accidentally ended up in our NOC<br />
◦ Hated BB<br />
The birth of NSClient++<br />
◦ 2003:ish<br />
◦ NSClient sucked (Broke Exchange)<br />
◦ NRPE_NT was too hard to use<br />
The open source of NSClient++<br />
◦ 2004:ish<br />
◦ “just for fun”<br />
The rebirth of NSClient++<br />
◦ 2007<br />
◦ A lot of users emailed me<br />
◦ Got a lot of hits on the webpage<br />
◦ Intense development led to 0.3.0!
Agents<br />
◦ An overview of the agents<br />
◦ An overview of the protocols<br />
About NSClient++<br />
◦ Quick Introduction<br />
Using NSClient++<br />
◦ Eventlog Checking<br />
◦ WMI (Windows Management Instrumentation)<br />
◦ Scripts<br />
Q/A
An overview of the agents
Agent Age Protocol Licence<br />
SNMP 1990-2008 SNMP Proprietary<br />
NSClient 200x NSClient GPL<br />
NRPE_NT 200x-2006 NRPE GPL<br />
NSClient++ 2004-2008 NRPE,NSClient,NSCA GPL<br />
NC_NET 2004-2008 NSClient,NSCA GPL<br />
“Agentless” WMI N/A<br />
OpMonAgent 2008 NSClient,NRPE GPL
The good:<br />
◦ Standard solution<br />
◦ Hardware extensions<br />
• HP, IBM, DELL, etc...<br />
The bad:<br />
◦ Complex to use<br />
◦ No encryption<br />
◦ Not extensible<br />
◦ Not “popular” on Windows<br />
◦ Needs a lot of extensions to be useful
The good:<br />
◦ Very stable<br />
• works “if it works”<br />
◦ Built in checks<br />
• easy to use<br />
The bad:<br />
◦ Requires client install<br />
◦ Outdated<br />
• no longer maintained<br />
◦ Very few checks<br />
• only basic checks<br />
◦ Not extensible
The good:<br />
◦ Extensible<br />
◦ Standard protocol<br />
• same as on *nix<br />
The bad:<br />
◦ Requires client install<br />
◦ No built-in checks<br />
• hard for simple checks<br />
◦ Old
The good:<br />
◦ Lots of features<br />
◦ Built in checks<br />
• easy to use<br />
◦ Can check:<br />
• WMI, EventLog, Scripts<br />
◦ Supports<br />
• NSCA, NSClient<br />
• (no encryption)<br />
The bad:<br />
◦ Requires client install<br />
◦ Written in .net<br />
◦ Not (that) extensible<br />
◦ Requires custom plug-in (Nagios side)<br />
◦ No Encryption<br />
◦ No NRPE Support
The good:<br />
◦ Built in checks<br />
• easy to use<br />
◦ Can check:<br />
• WMI, EventLog, Scripts<br />
◦ Supports:<br />
• NSCA, NRPE, NSClient<br />
• encryption (NRPE/NSCA)<br />
◦ Very extensible<br />
• Scripts, Lua, modules, etc<br />
The bad:<br />
◦ Requires client install<br />
◦ Hard to use at times<br />
◦ Has had a few bugs over time
The good:<br />
◦ No client-side install<br />
• (Usually requires a “proxy”)<br />
The bad:<br />
◦ Proprietary<br />
◦ Not extensible<br />
◦ Limited functionality
A new client (haven't looked into it much)<br />
Seems to be a new version of NSClient (still written in Delphi)<br />
Script and NRPE support<br />
Not many new features
I would use either:<br />
◦ NSClient++<br />
◦ NC_NET<br />
I would not use (unless I have a specific reason):<br />
◦ SNMP<br />
• Complex to use<br />
◦ NSClient<br />
• Old and outdated<br />
◦ NRPE_NT<br />
• Hard for some (simple) checks<br />
◦ OpMonAgent<br />
• I don’t see the benefit<br />
◦ “Agentless” WMI<br />
• Limited functionality
An overview of the protocols
Protocol | Method | Encryption | Auth. | Payload | Args | Multi Commands<br />
NSClient | Active | No | Yes | Unlimited (1) | Yes (1) | Yes (1)<br />
NRPE | Active | Yes | No | 1024 (2) | Yes | No<br />
NSCA | Passive | Yes | Yes | Unlimited | Yes | Yes<br />
Future (3) | Active | Yes | Yes | Unlimited | Yes | Yes<br />
1) Protocol supports it but not check_nt<br />
2) NRPE Payload can be extended with recompile of check_nrpe and configured in NSClient++<br />
3) A future protocol I am thinking of adding to NSClient++ (NRPE 3.0)
I would use:<br />
◦ NRPE<br />
• For Active checks<br />
◦ NSCA<br />
• For passive checks<br />
I would not use:<br />
◦ NSClient<br />
• Two words: No encryption!<br />
• If you use it, NEVER use a “secret” password.
Quick Introduction
Internals:<br />
◦ C++ using W32 API<br />
◦ Around 20.000 lines of code (30.000 with comments)<br />
◦ Actively developed (unfortunately only by me)<br />
◦ Modularized design (low on resources)<br />
Runs on:<br />
◦ NT4, w2k, XP, w2k3, Vista, w2k8 ...<br />
◦ X86, x64, IA64 (I lack a compiler for that platform, but it works)<br />
Current Version:<br />
◦ 0.3.4 (out this weekend, maybe not… *grumle*)<br />
◦ Don’t use 0.2.7!<br />
Most features require NRPE<br />
◦ (or custom “NSClient”-client)<br />
Documentation online (WIKI)<br />
◦ http://nsclient.org
Not supported by a commercial entity<br />
◦ Donations welcome<br />
◦ Sponsoring available (contact me for details)<br />
Used by a lot of people (I think)<br />
◦ Impossible to estimate any figures<br />
Website has:<br />
◦ Around 8.000 unique visitors per month<br />
◦ Around 10.000 downloads per month
Starting/Stopping:<br />
◦ nsclient++ /start (net start nsclientpp)<br />
◦ nsclient++ /stop (net stop nsclientpp)<br />
◦ nsclient++ /test<br />
Configuration:<br />
◦ notepad nsc.ini<br />
nsclient++ /test<br />
Is your friend!<br />
Testing:<br />
1.Local (nsclient++ /test)<br />
2.From CLI (check_nrpe ...)<br />
3.From Nagios (add command)<br />
Enabling debug log (always on with /test):<br />
◦ [log]<br />
◦ debug=1<br />
Log File:<br />
◦ nsclient.log (nsc.log)
Eventlog checking
The good:<br />
◦ Powerful interface<br />
The bad:<br />
◦ Hard to use!<br />
◦ Requires configuration<br />
◦ no out-of-the-box solution!<br />
• (might come in next version)<br />
A lot of theory!<br />
◦ (please don't despair)
Two different filtering strategies<br />
◦ Exclusive filtering (-filter=out)<br />
• If you want all errors (except…)<br />
◦ Inclusive filtering (-filter=in)<br />
• If you only want specific errors<br />
◦ Remember (-filter=new)<br />
• Don't forget this!<br />
• There is an “old” outdated syntax as well
Simplest to start with<br />
By default:<br />
◦ Everything is an error<br />
Produces a lot of noise<br />
◦ False positives<br />
Good if you just want to be warned<br />
Sample (all entries for last 2 days):<br />
◦ CheckEventLog file=application filter=new filter=out<br />
MaxWarn=1 MaxCrit=1 filter-generated=>2d
For advanced use<br />
By default:<br />
◦ Nothing is an error<br />
Easy to make mistakes (and miss errors)<br />
Good if you are only looking for specifics<br />
◦ Raid controllers, active directory, etc...<br />
Sample (all entries for last 2 days):<br />
◦ CheckEventLog file=application filter=new filter=in<br />
MaxWarn=1 MaxCrit=1 filter+generated=
Filter rule<br />
◦ A rule to match against every single line in the<br />
eventlog<br />
Chain<br />
◦ A set of filter rules used when finding errors<br />
◦ Linear (when a rule matches chain is terminated)
Order is important<br />
Start with the rule which will discard the most<br />
items.<br />
◦ filter-generated=>2d
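The chain behaviour described above can be sketched in Python. This is only an illustration of the idea, not NSClient++ code. The `Rule` class, the `is_hit` function and the record fields (`age_hours`, `severity`) are hypothetical names, and the “maybe” mode is left out. The sketch assumes that `filter=out` makes an unmatched record an error and `filter=in` makes it a non-error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    mode: str                            # '+' keeps a record, '-' discards it
    predicate: Callable[[Record], bool]  # hypothetical matcher for one rule


def is_hit(record: Record, chain: List[Rule], filter_mode: str) -> bool:
    """Walk the chain in order; the first rule that matches decides."""
    for rule in chain:
        if rule.predicate(record):
            return rule.mode == '+'
    # No rule matched: filter=out treats everything as an error,
    # filter=in treats nothing as an error.
    return filter_mode == 'out'


# Discard old records first (cheapest way to shrink the set),
# then discard informational ones.
chain = [
    Rule('-', lambda r: r['age_hours'] > 48),
    Rule('-', lambda r: r['severity'] == 'informational'),
]

records = [
    {'age_hours': 1, 'severity': 'error'},
    {'age_hours': 72, 'severity': 'error'},
    {'age_hours': 2, 'severity': 'informational'},
]
hits = [r for r in records if is_hit(r, chain, 'out')]
print(len(hits))
```

Putting the age rule first means most records are dropped by a single comparison before any other rule is consulted.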
Mode<br />
◦ If the filter is additive, subtractive (or “maybe”)<br />
Type (keyword)<br />
◦ What to match<br />
• Message<br />
• Event category<br />
• Event date<br />
• Etc...<br />
Equal Sign<br />
Operator<br />
◦ =, !=, > < etc...<br />
Value<br />
◦ The value to match
filter+ generated =< 2h
Consider The following rules:<br />
◦ filter-generated=2d<br />
• WRONG! (No equal sign)<br />
◦ filter-generated==2d<br />
• Correct!<br />
equal sign<br />
operator<br />
Always remember the “extra” equal sign!
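To make the anatomy of a rule concrete, here is a small Python sketch that splits a rule into mode, type, operator and value. The regular expression is an assumption built only from the pieces named on these slides (it accepts the `+` and `-` modes and the operators `=`, `!=`, `>` and `<`). It is not the parser NSClient++ uses, and `FilterRule` and `parse_rule` are hypothetical names.

```python
import re
from typing import NamedTuple


class FilterRule(NamedTuple):
    mode: str      # '+' or '-'
    type: str      # keyword such as 'generated' or 'severity'
    operator: str  # '=', '!=', '>' or '<'
    value: str     # the value to match, e.g. '2d'


# filter <mode> <type> = <operator> <value>
# The first '=' is the separator; the operator comes after it.
RULE_RE = re.compile(r'^filter([+-])\s*(\w+)\s*=\s*(!=|=|>|<)\s*(.+)$')


def parse_rule(text: str) -> FilterRule:
    match = RULE_RE.match(text.strip())
    if match is None:
        raise ValueError(f'not a valid filter rule: {text!r}')
    return FilterRule(*match.groups())


print(parse_rule('filter-generated=>2d'))
print(parse_rule('filter-generated==2d'))
print(parse_rule('filter+ generated =< 2h'))

try:
    parse_rule('filter-generated=2d')   # separator present, operator missing
except ValueError as err:
    print(err)
```

The last call fails because the only equal sign is consumed as the separator, which is exactly the mistake the slide warns about.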
Type | Description<br />
eventType | Type of error. (Microsoft says this is severity) error, warning, info, auditSuccess or auditFailure<br />
eventSource | The name of the source of the event. The program that logged the message<br />
generated | Time ago the message was generated. When it happened<br />
written | Time ago the message was written to the log (don’t use)<br />
message | Filter strings in the message. NOT the entire message!<br />
eventID | Filter based on the event id of the log message (error code)<br />
severity | Filter based on event severity (I think this is severity): success, informational, warning or error
Option | Description<br />
file | The “eventlog file” to open. Use multiple file-options to check multiple files.<br />
filter | Set filter mode (out, in, old, new)<br />
MaxWarn | Maximum hits before a warning state is issued.<br />
MaxCrit | Maximum hits before a critical state is issued.<br />
 | A list of filter rules to be matched (in order)
Option | Description<br />
truncate | Length of returned data. Since NRPE (and NSClient++) has a limited capacity this is important. Usually 1023 is a good value.<br />
syntax | How to format the return data<br />
unique | Only “one of each” record will be returned. (“count” (MaxWarn/MaxCrit) is not affected)<br />
descriptions | If you plan on using the %message% syntax option. (Will impact performance “severely”)
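How `unique` and `truncate` interact with the hit count can be illustrated with a short Python sketch. It only models what the table says: duplicates are collapsed in the output, the count used for MaxWarn/MaxCrit is not affected, and the text is cut to the given length. The `summarize` function, the `message (n)` layout and the `, ` separator are assumptions for illustration, not the format NSClient++ produces.

```python
from collections import Counter
from typing import List, Tuple


def summarize(messages: List[str], unique: bool, truncate: int) -> Tuple[int, str]:
    """Return (hit count, output text) for a list of matching messages."""
    count = len(messages)            # the hit count ignores 'unique'
    if unique:
        tally = Counter(messages)    # keeps first-seen order
        shown = [f'{msg} ({n})' for msg, n in tally.items()]
    else:
        shown = list(messages)
    text = ', '.join(shown)
    return count, text[:truncate]    # never return more than 'truncate' characters


count, text = summarize(['disk failed', 'login denied', 'disk failed'], True, 1023)
print(count)
print(text)
```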
CheckEventLog<br />
◦ file=application<br />
◦ file=system<br />
◦ filter=new<br />
◦ filter=out<br />
◦ MaxWarn=1<br />
◦ MaxCrit=1<br />
◦ filter-generated=>2d<br />
◦ filter-severity==success<br />
◦ filter-severity==informational<br />
◦ truncate=1023<br />
◦ unique<br />
◦ descriptions<br />
◦ "syntax=%severity%: %source%: %message% (%count%)"
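When testing from the CLI, the option list above has to be passed to `check_nrpe` as arguments. The Python sketch below only assembles and prints such a command line and does not run anything. The host name `winhost.example.com` is a placeholder, and the sketch assumes the agent is configured to accept arguments over NRPE.

```python
import shlex

# Options taken from the slide above, in the same order.
options = [
    'file=application',
    'file=system',
    'filter=new',
    'filter=out',
    'MaxWarn=1',
    'MaxCrit=1',
    'filter-generated=>2d',
    'filter-severity==success',
    'filter-severity==informational',
    'truncate=1023',
    'unique',
    'descriptions',
    'syntax=%severity%: %source%: %message% (%count%)',
]

# -H host, -c command, -a arguments (host name is a placeholder).
argv = ['check_nrpe', '-H', 'winhost.example.com',
        '-c', 'CheckEventLog', '-a', *options]

# shlex.join quotes every argument that a POSIX shell would otherwise mangle,
# such as '>' in the filter rule and the spaces in the syntax string.
print(shlex.join(argv))
```

Building the list first and quoting it in one place avoids the usual shell-escaping mistakes with `>` and `%`.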
DEMO
Don’t be discouraged by your first attempt<br />
Remember filter=new (on older versions)<br />
Use the truncate option (1023 is reasonable)<br />
Start small, filter your way “up” (whilst testing)<br />
Not so hard once you get down to it.
Start with “everything” and work your way down.<br />
Both System and Application logfile<br />
Reasonable start filter:<br />
◦ filter-generated=>2d<br />
◦ filter-severity==success<br />
◦ filter-severity==informational<br />
Need to customize it for your environment.<br />
A good idea is to use more than one check<br />
1.Check “all errors” (Exclusive)<br />
2.Check “my service” (Inclusive)<br />
Don’t overdo it (eventlog checking is slow)
Would it make sense to check “new entries”?<br />
◦ Just check entries added since last check<br />
◦ Would be faster<br />
◦ Would be “better”<br />
◦ But would you use it?
WMI - Windows Management Instrumentation
The purpose of WMI is to define a non-proprietary set of<br />
environment-independent specifications which allow<br />
management information to be shared between management<br />
applications.<br />
WMI prescribes enterprise management standards and related<br />
technologies that work with existing management standards,<br />
such as Desktop Management Interface (DMI) and SNMP.<br />
WMI complements these other standards by providing a<br />
uniform model. This model represents the managed<br />
environment through which management data from any<br />
source can be accessed in a common way.<br />
…yada yada yada…<br />
In short: Like SNMP but “modern” ☺
Everything<br />
◦ Almost...<br />
There are a lot of objects (tables)<br />
◦ win32 has 450 objects<br />
◦ Various services will add more (AD, SQL Server, ...)<br />
You can:<br />
◦ Read, write and work with “objects”.<br />
◦ (only read via NSClient++)<br />
But you can't:<br />
◦ Check your application
Dangerous!<br />
◦ No security, allows access to a lot of things.<br />
Fairly “unexplored” in NSClient++<br />
Two commands:<br />
◦ CheckWMI<br />
• Check a result set<br />
• NSClient++ does filtering<br />
• Good for checking if “more (or less) than n items...”<br />
◦ CheckWMIValue<br />
• Check a specific value<br />
• WMI Does filtering
WQL - WMI Query Language<br />
◦ Based upon SQL<br />
◦ Only select features (no update/insert/delete)<br />
“Tables” are called objects in WMI<br />
◦ An object usually corresponds to a logical “type”.<br />
Example:<br />
◦ select * from win32_Processor<br />
• Retrieves everything from the win32_Processor ”object”.
Object | Description<br />
Win32_Fan | Represents the properties of a fan device in the computer system.<br />
Win32_TemperatureProbe | Represents the properties of a temperature sensor (electronic thermometer).<br />
Win32_DiskDrive | Represents a physical disk drive as seen by a computer running the Windows operating system.<br />
Win32_PhysicalMedia | Represents any type of documentation or storage medium.<br />
Win32_TapeDrive | Represents a tape drive on a computer system running Windows.<br />
Win32_BaseBoard | Represents a baseboard (also known as a motherboard or system board).<br />
Win32_BIOS | Represents the attributes of the computer system's basic input or output services (BIOS).<br />
Win32_IDEController | Represents the capabilities of an Integrated Drive Electronics (IDE) controller device.<br />
Win32_MemoryArray | Represents the properties of the computer system memory array and mapped addresses.<br />
Win32_OnBoardDevice | Represents common adapter devices built into the motherboard (system board).<br />
Win32_Processor | Represents a device capable of interpreting a sequence of machine instructions on the computer.<br />
Win32_SCSIController | Represents a small computer system interface (SCSI) controller on a computer system running Windows.<br />
Win32_USBControllerDevice | Relates a USB controller and the CIM_LogicalDevice instances connected to it.<br />
Win32_NetworkAdapter | Represents a network adapter on a computer system running Windows.<br />
Win32_Battery | Represents a battery connected to the computer system.<br />
Win32_PortableBattery | Represents the properties of a portable battery, such as one used for a notebook computer.<br />
Win32_PowerManagementEvent | Represents power management events resulting from power state changes.<br />
Win32_UninterruptiblePowerSupply | Represents the capabilities and management capacity of an uninterruptible power supply (UPS).<br />
Win32_Printer | Represents a device connected to a computer system running Windows that is capable of reproducing a visual image on a medium.<br />
Win32_PrintJob | Represents a print job generated by a Windows-based application.
Object | Description<br />
Win32_SystemDriver | Represents the system driver for a base service.<br />
Win32_Directory | Represents a directory entry on a computer system running Windows.<br />
Win32_DiskQuota | Tracks disk space usage for NTFS file system volumes.<br />
Win32_LogicalDisk | Represents a data source that resolves to an actual local storage device.<br />
Win32_Volume | Represents an area of storage on a hard disk.<br />
Win32_PageFileUsage | Represents the file used for handling virtual memory file swapping on a computer system running Windows.<br />
Win32_NetworkConnection | Represents an active network connection in a Windows environment.<br />
Win32_NTDomain | Represents a Windows NT domain.<br />
Win32_PingStatus | Represents the values returned by the standard ping command.<br />
Win32_ComputerSystem | Represents a computer system operating in a Windows environment.<br />
Win32_OperatingSystem | Represents an operating system installed on a computer system running Windows.<br />
Win32_Process | Represents a sequence of events on a computer system running Windows.<br />
Win32_ProcessStartup | Represents the startup configuration of a computer system running Windows.<br />
Win32_ScheduledJob | Represents a job scheduled using the Windows NT schedule service.<br />
Win32_BaseService | Represents executable objects that are installed in a registry database maintained by the SCM.<br />
Win32_Service | Represents a service on a computer system running Windows.<br />
Win32_LogonSession | Describes the logon session or sessions associated with a user logged on to Windows 2000 or Windows NT.<br />
Win32_UserAccount | Represents information about a user account on a computer system running Windows.<br />
Win32_UserInDomain | Association class<br />
Win32_WindowsProductActivation | Contains properties and methods related to WPA.<br />
Win32_NTEvent... | Yes, you can even check the eventlog!
NSClient++ has support for executing WQL<br />
queries “as is” and returning the result.<br />
◦ nsclient++ -noboot CheckWMI <br />
◦ This does not support namespaces (yet).<br />
◦ (Namespaces are supported via check commands)<br />
Sample use<br />
◦ nsclient++ -noboot CheckWMI select * from win32_Processor
Best way to start<br />
Simple to use...<br />
◦ ...if you know your WMI<br />
A sample query:<br />
◦ CheckWMIValue<br />
• "Query=Select * from win32_Processor"<br />
• MaxWarn=80<br />
• MaxCrit=90<br />
• Check:CPU=LoadPercentage<br />
• ShowAll=long<br />
◦ (a bit like CheckCPU)
Option | Description<br />
MaxWarn | The maximum allowed value for the column(s).<br />
MaxCrit | The maximum allowed value for the column(s).<br />
MinWarn | The minimum allowed value for the column(s).<br />
MinCrit | The minimum allowed value for the column(s).<br />
ShowAll | If present will display information even if an item is not reporting a state. If set to long will display more information.<br />
Query | The WMI query to ask (not stackable, only one query at a time)<br />
Check | A column name to check (if * all columns will be checked) (this is stackable, so you can compare any number of columns)<br />
truncate | The maximum length of the query-result.<br />
AliasCol | A column to be included (prefixed) in the result when there are errors.
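The way the four bounds combine into a state can be sketched in Python. This is an assumption-laden illustration rather than the NSClient++ implementation. It treats “maximum allowed” as “anything strictly above is a violation” and “minimum allowed” as “anything strictly below is a violation”, and it lets critical win over warning. The `evaluate` function is a hypothetical name.

```python
from typing import Optional

OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(value: float,
             max_warn: Optional[float] = None,
             max_crit: Optional[float] = None,
             min_warn: Optional[float] = None,
             min_crit: Optional[float] = None) -> int:
    """Map one column value to a Nagios state using the optional bounds."""
    too_high_crit = max_crit is not None and value > max_crit
    too_low_crit = min_crit is not None and value < min_crit
    if too_high_crit or too_low_crit:
        return CRITICAL

    too_high_warn = max_warn is not None and value > max_warn
    too_low_warn = min_warn is not None and value < min_warn
    if too_high_warn or too_low_warn:
        return WARNING

    return OK


# LoadPercentage values checked against MaxWarn=80 and MaxCrit=90.
for load in (50, 85, 95):
    print(load, evaluate(load, max_warn=80, max_crit=90))
```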
DEMO
Scripts
External Scripts<br />
◦ VB, Perl, Python, ...<br />
◦ .exe files<br />
◦ .net<br />
◦ ...<br />
Lua<br />
◦ Lua is a simple programming language<br />
◦ Used INSIDE NSClient++<br />
◦ Very powerful, and simple<br />
◦ A fairly new feature so feel free to suggest things<br />
Modules<br />
◦ Written in C++, Vb, .net, ...<br />
◦ Very powerful, but “hard”
Configuration:<br />
◦ [modules]<br />
◦ CheckExternalScripts.dll<br />
◦ ...<br />
◦ [External Scripts]<br />
◦ =<br />
• The left-hand side is the command from NRPE<br />
• The right-hand side is the command to execute<br />
• check_es_ok=scripts\ok.bat
Sample Code:<br />
◦ @echo CRITICAL: Everything is not going to be ok!<br />
◦ @exit 2<br />
Exit statuses:<br />
◦ 0 OK<br />
◦ 1 Warning<br />
◦ 2 Critical<br />
◦ 3 Unknown
Sample Code:<br />
◦ Wscript.StdOut.WriteLine "Everything might not be ok"<br />
◦ Wscript.Quit(1)<br />
Exit statuses:<br />
◦ 0 OK<br />
◦ 1 Warning<br />
◦ 2 Critical<br />
◦ 3 Unknown<br />
NSC.ini syntax:<br />
◦ [External Scripts]<br />
◦ check_vbs=cscript.exe /T:30 /NoLogo scripts\check_vb.vbs
This is exactly like writing “regular” Nagios<br />
scripts.
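The same contract (one line of output plus an exit status from 0 to 3) applies to a Python script. The sketch below is a hypothetical check. The script name `check_free.py`, the thresholds and the meaning of its argument are made up for illustration. Only the exit statuses come from the slides.

```python
import sys
from typing import List, Tuple

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check(free_percent: float) -> Tuple[int, str]:
    """Turn a measured value into (exit status, message). Thresholds are examples."""
    if free_percent < 10:
        return CRITICAL, f'CRITICAL: only {free_percent:.0f}% free'
    if free_percent < 20:
        return WARNING, f'WARNING: {free_percent:.0f}% free'
    return OK, f'OK: {free_percent:.0f}% free'


def main(argv: List[str]) -> int:
    try:
        value = float(argv[1])
    except (IndexError, ValueError):
        # Bad or missing input is neither OK nor a real failure: report Unknown.
        print('UNKNOWN: usage: check_free.py <free_percent>')
        return UNKNOWN
    code, message = check(value)
    print(message)   # the first line of output becomes the plugin text
    return code


# A real script would end with: sys.exit(main(sys.argv))
# Here the function is called directly so the sketch can be tried in an interpreter.
status = main(['check_free.py', '15'])
```

Following the VBScript entry above, such a script could be registered with a line like `check_py=python scripts\check_free.py 15` under `[External Scripts]`. That line is an assumption modelled on the `check_vbs` example, not something taken from the NSClient++ documentation.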
Configuration:<br />
◦ [modules]<br />
◦ LUAScript.dll<br />
◦ ...<br />
◦ [LUA Scripts]<br />
◦ <br />
• scripts\test.lua<br />
What, no alias?<br />
◦ Not needed…
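-- Runs once when NSClient++ loads the script<br />
nscp.print('Loading test script...')<br />
-- Expose the Lua function foo as the command check_foo<br />
nscp.register('check_foo', 'foo')<br />
<br />
function foo (command)<br />
  nscp.print(command)<br />
  -- Run the built-in CheckCPU command and capture its result<br />
  local code, msg, perf = nscp.execute('CheckCPU','time=5','MaxCrit=5')<br />
  -- Return the same status and performance data with a modified message<br />
  return code, 'hello from LUA: ' .. msg, perf<br />
end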
The power of Lua scripts comes from:<br />
◦ The ability to run and modify the result of other<br />
commands<br />
◦ The ability to run ”inside” NSClient++<br />
◦ The simplicity of the language
Questions