mon
mon is a resource-monitoring system designed to measure host or service availability. It was developed by Jim Trocki and is supported by an active community with a Web site at http://www.kernel.org/software/mon/ and a mailing list (information is available at the same location).
mon handles monitoring as two separate tasks: testing conditions and alerting on failures. Both functions are handled by external programs (usually scripts written in Perl). Many such scripts are distributed with mon.
mon itself is an engine that schedules tests based on your configuration and then passes the results of the test to appropriate alerting programs. This separation of functionality from mon enables you to make seamless changes to your monitoring system. All you need to do is add a new test or alert program and then modify your configuration. No changes to mon itself are needed (short of a kill -HUP to reread the config file).
In this section, we'll discuss getting and installing mon, configuring it, using it to monitor your network, and writing tests for it.
Getting and Installing mon
You can download mon from ftp.kernel.org at /pub/software/admin/mon/ (the kernel.org maintainers ask that you use one of the kernel.org mirrors). Version 0.38.20 was current as of this writing. In addition to mon, you'll also need to grab several Perl modules: Time::Period, Time::HiRes, Convert::BER, and Mon::*. Some additional modules are required for the test and alert scripts (for instance, telnet.monitor requires Net::Telnet). The fping.monitor script relies on the fping package, which is available from the same place you got mon.
Download and Install First!
Please make sure that you download and install the required packages before starting in on mon. Installing Perl modules is a bit beyond the scope of this book, but building fping requires a bit of hackery that you'll need to know about. Line 222 of fping.c contains a redeclaration of sys_errlist, so you should comment out the entire line. Failure to do so will cause the compilation to fail.
mon itself is a set of Perl scripts and configuration files, so you don't actually need to build it. Instead, you should configure it for local use (see the next section for details) and then test it. After it is configured properly, you can move it to its final location and set up a startup script in /etc/rc.d/init.d.
Configuring mon
You'll need to set up a mon.cf file representing your network. Listing 5 contains a simple file representing a network with two monitored hosts, crash and cherry. Both are workstations, and cherry is also a Web server. I usually check workstations every 15 minutes to make sure that I can Telnet into them; I check Web servers every 5 minutes to ensure that they're serving up pages.
Listing 5 A mon Configuration File
#
# Example "mon.cf" configuration for "mon".
#
#
# global options
# the eventual values for these options are commented out and values
# for a test installation are currently in place
#
#cfbasedir = /usr/local/lib/mon/etc
cfbasedir = .
#alertdir = /usr/local/lib/mon/alert.d
alertdir = ./alert.d
#mondir = /usr/local/lib/mon/mon.d
mondir = ./mon.d
maxprocs = 20
histlength = 100
randstart = 60s

#
# authentication types:
#   getpwnam    standard Unix passwd, NOT for shadow passwords
#   shadow      Unix shadow passwords (not implemented)
#   userfile    "mon" user file
#
authtype = userfile

#
# NB: hostgroup and watch entries are terminated with a blank line (or
# end of file). Don't forget the blank lines between them or you lose.
#
#
# group definitions (hostnames or IP addresses)
#
hostgroup workstations crash cherry

hostgroup wwwservers cherry

watch wwwservers
    service http
        interval 5m
        monitor http.monitor
        allow_empty_group
        period wd {Sun-Sat}
            alert mail.alert -S "web server has fallen down " pate
            upalert mail.alert -S "web server is back up " pate
            alertevery 45m

watch workstations
    service telnet
        interval 15m
        monitor telnet.monitor
        period wd {Sun-Sat}
            alert mail.alert pate
            alertevery 1h
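If you want to double-check which hosts ended up in which group, a few lines of script can read the hostgroup entries back out of the file. The following is a minimal sketch in Python (not one of the Perl tools shipped with mon); the function name parse_hostgroups is hypothetical, and it assumes each hostgroup entry sits on a single line, as in Listing 5.

```python
def parse_hostgroups(config_text):
    """Return {group_name: [hosts]} for single-line hostgroup entries.

    Assumption: every hostgroup entry fits on one line, as in Listing 5.
    """
    groups = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "hostgroup" and len(words) >= 2:
            groups[words[1]] = words[2:]
    return groups


def hosts_in_several_groups(groups):
    """Return a sorted list of hosts that belong to more than one group."""
    seen = {}
    for name, hosts in groups.items():
        for host in hosts:
            seen.setdefault(host, set()).add(name)
    return sorted(host for host, names in seen.items() if len(names) > 1)
```

Run against the hostgroup lines of Listing 5, this would report cherry as the one host that belongs to both groups.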
After you've set your configuration file, you can start mon:
[root@cherry mon-0.38.20]# ./mon -f -c mon.cf -b `pwd`
And, after 2 or 3 minutes for the tests to start up, you can check the operating status of the hostgroups with the moncmd command:
[root@cherry mon-0.38.20]# ./clients/moncmd -s localhost list opstatus
group=workstations service=telnet opstatus=1 last_opstatus=7 exitval=0 timer=895 last_success=970580889 last_trap=0 last_check=970580887 ack=0 ackcomment='' alerts_sent=0 depstatus=0 depend='' monitor='telnet.monitor' last__summary='' last_detail='' interval=900
group=wwwservers service=http opstatus=1 last_opstatus=7 exitval=0 timer=289 last_success=970580883 last_trap=0 last_check=970580881 ack=0 ackcomment='' alerts_sent=0 depstatus=0 depend='' monitor='http.monitor' last__summary='' last_detail='HOST localhost: ok\0aHTTP/1.1 200 OK\0d\0aDate: Tue, 03 Oct 2000 13:48:01 GMT\0d\0aServer: Apache/1.3.12 (Unix)\0d\0aConnection: close\0d\0aContent-Type: text/html\0a\0a' interval=300
220 list opstatus completed
[root@cherry mon-0.38.20]#
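Each status line is a run of key=value pairs, with some values wrapped in single quotes, which makes it easy to pick apart in a script. Here is a minimal sketch in Python, assuming the line format shown in the output above; the function name parse_opstatus_line is hypothetical, and the escape sequences inside last_detail (such as \0a) are left undecoded.

```python
import shlex


def parse_opstatus_line(line):
    """Split one 'key=value ...' status line into a dict of strings.

    shlex.split honors the single quotes, so a quoted value that
    contains spaces stays in one piece.
    """
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore any token that is not key=value
            fields[key] = value
    return fields


sample = ("group=workstations service=telnet opstatus=1 exitval=0 "
          "ackcomment='' monitor='telnet.monitor' interval=900")
status = parse_opstatus_line(sample)
print(status["group"], status["service"], status["interval"])
```

All values come back as strings, so convert fields such as interval or last_check with int() before doing arithmetic on them.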
In addition to the moncmd interface and the alerts, there are three distinct Web front ends for mon. mon.cgi (by Andrew Ryan) seems to be the most widely accepted; it was designed to provide all the functionality of the command-line tools through a Web interface. mon.cgi can be obtained from http://www.nam-shub.com/files/. In addition to mon.cgi, there are also minotaure (by Gilles Lamiral) and monshow (by Jim Trocki). minotaure, in particular, has very nice documentation.
Writing Tests for mon
I've written a sample mon test to check for finger daemons that aren't running. Although this probably isn't useful in real life, it should serve as a model for writing your own tests. Listing 6 contains the program:
Listing 6 A mon Test
#!/usr/bin/perl -w
use strict;
use Net::Telnet;

my (@failures, @l);
my $debug = 0;

# mon passes the hosts to test as command-line arguments
foreach my $host (@ARGV) {
    my $t = new Net::Telnet(
        Timeout => 10,
        Port    => 79,
        Errmode => "return");
    if ($t->open("$host")) {
        # send an empty query and wait for a response
        $t->print("");
        my $lines = $t->getlines;
        unless ($lines) {
            push @failures, [$host, $t->errmsg];
        }
    }
    else {
        push @failures, [$host, $t->errmsg];
    }
}

# no failures: exit with a success code and no output
exit 0 if (0 == @failures);

# first, the list of failed hosts on one line
foreach my $failed_host (@failures) {
    push @l, $$failed_host[0];
}
print "@l \n";

# then the error message for each failed host
foreach my $error (@failures) {
    print "$$error[0]: $$error[1]\n";
}
exit 1;
Let's walk through this script to understand what's going on in a test script.
The first thing you'll need to know is how mon passes the monitor script the list of hosts to test. mon calls external tests like this:
foo.monitor host1 host2 ... hostN
In the example script, we're grabbing those host names with the loop:
foreach my $host (@ARGV) {
    #do stuff
}
That "do stuff" part is the important bit; we'll get back to it in a minute. Before we do, we need to look at one more thing: how mon expects to be told of failures by the test. mon is actually looking for three things: an exit code (0 if there are no errors, or 1 otherwise), a list of failed hosts, and a list of error messages associated with the failed hosts. Returning an exit code is not a big deal; the more interesting thing is the creation of the two lists that mon wants. This is done in the last two foreach loops in our sample.
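Because a test is just an external program, the same reporting convention can be followed from any language. The sketch below restates it in Python rather than Perl, mirroring the output of Listing 6: nothing printed and a return code of 0 on success; otherwise one line naming the failed hosts, one "host: message" line per failure, and a return code of 1. The names format_report and run_monitor are hypothetical, and check stands for whatever test function you supply.

```python
import sys


def format_report(failures):
    """Turn a list of (host, message) pairs into (output_text, exit_code)."""
    if not failures:
        return "", 0
    # First line: the failed hosts, separated by spaces
    summary = " ".join(host for host, _ in failures)
    # Following lines: one error message per failed host
    details = ["%s: %s" % (host, message) for host, message in failures]
    return "\n".join([summary] + details) + "\n", 1


def run_monitor(hosts, check):
    """Run check(host) for each host; check returns None or an error string.

    Writes the report to standard output and returns the exit code.
    """
    failures = []
    for host in hosts:
        error = check(host)
        if error is not None:
            failures.append((host, error))
    output, code = format_report(failures)
    sys.stdout.write(output)
    return code
```

A complete script would end by handing the command-line arguments to run_monitor and exiting with its result, for example sys.exit(run_monitor(sys.argv[1:], my_check)), where my_check is your own test function.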
Back to the "do stuff" section: in this example, I wanted to send alerts for boxes that weren't responding to finger requests. To perform the test, I used the Net::Telnet Perl module to make a TCP connection to the finger server (at port 79). Then I sent an empty string and waited for a response. If I got something back, I treated it as a working server. If there was no connection, or if I got an error, I treated it as a failure and pushed the host and error message onto an array for later handling. After I had worked through the whole list of hosts, I could move on to the error-handling part of the test (if there were any failures).
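The same check can be approximated without Net::Telnet. The following is a rough Python sketch using the standard socket module, not a translation of the Perl script: the function name check_finger is hypothetical, and it treats any bytes received after sending an empty line as success, which is an assumption about what counts as a working server.

```python
import socket


def check_finger(host, port=79, timeout=10.0):
    """Return None if the server answers, otherwise an error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"\r\n")    # an empty query line
            reply = conn.recv(1024)  # any bytes at all count as a response
    except OSError as exc:           # refused, unreachable, timed out, ...
        return str(exc)
    if not reply:
        return "connection closed without a reply"
    return None
```

A function of this shape, returning None for success and an error string for failure, is all that the per-host part of a test needs to provide; collecting the failures and reporting them is handled separately once the loop over hosts has finished.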
That's all there is to it; not much magic there. The hardest part is sitting down to figure out how to test the condition you're looking for.