Sunday, October 21, 2007

Nagios Installation and configuration

What is Nagios?

Nagios is a host and service monitor designed to inform you of network problems before your clients, end-users or managers do. It has been designed to run under the Linux operating system, but works fine under most *NIX variants as well. The monitoring daemon runs intermittent checks on hosts and services you specify using external "plugins" which return status information to Nagios. When problems are encountered, the daemon can send notifications out to administrative contacts in a variety of different ways (email, instant message, SMS, etc.). Current status information, historical logs, and reports can all be accessed via a web browser.

Nagios Features

Nagios has a lot of features, making it a very powerful monitoring tool. Some of the major features are listed below:

Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)

Monitoring of host resources (processor load, disk and memory usage, running processes, log files, etc.)

Monitoring of environmental factors such as temperature

Simple plugin design that allows users to easily develop their own host and service checks

Ability to define network host hierarchy, allowing detection of and distinction between hosts that are down and those that are unreachable

Contact notifications when service or host problems occur and get resolved (via email, pager, or other user-defined method)

Optional escalation of host and service notifications to different contact groups

Ability to define event handlers to be run during service or host events for proactive problem resolution

Support for implementing redundant and distributed monitoring servers

External command interface that allows on-the-fly modifications to be made to the monitoring and notification behavior through the use of event handlers, the web interface, and third-party applications

Retention of host and service status across program restarts

Scheduled downtime for suppressing host and service notifications during periods of planned outages

Ability to acknowledge problems via the web interface

Web interface for viewing current network status, notification and problem history, log file, etc.

Simple authorization scheme that allows you to restrict what users can see and do from the web interface

Nagios Requirements

The only requirements for running Nagios are a machine running Linux (or a UNIX variant) and a C compiler. You will probably also want to have TCP/IP configured, as most service checks will be performed over the network.

You are not required to use the CGIs included with Nagios. However, if you do decide to use them, you will need to have the following software installed.

A web server (preferably Apache)

Thomas Boutell's gd library version 1.6.3 or higher (required by the statusmap and trends CGIs)

Download Nagios

http://www.nagios.org/download/

Nagios Documentation

http://www.nagios.org/docs/

Nagios Screenshots

http://www.nagios.org/about/screenshots.php

Nagios FAQ

http://www.nagios.org/faqs/

Installing Nagios in Debian From Source Code

Prerequisites

The packages below must be installed on your Debian system to complete the Nagios installation without problems:

#apt-get install make gcc g++

Create Nagios User/Group

You're probably going to want to run Nagios under a normal user account, so add a new user (and group) to your system with the following command (this will vary depending on what OS you're running):

#adduser nagios

This should create the user account and a default group with the same name (nagios). You can check this with:

# grep nagios /etc/group

This should show the group (if it was created) along with its members.

If the group is missing, create it with:

# groupadd nagios

This group can be used as the group that Nagios uses as a Command group.

Identify Web Server User

You're probably going to want to issue external commands (like acknowledgements and scheduled downtime) from the web interface. To do so, you need to identify the user your web server runs as (www-data on Debian, often apache elsewhere; this may differ on your system). This setting is found in your web server configuration file. The following command can be used to quickly determine what user Apache is running as (paths may differ on your system):

# grep "^User" /etc/apache2/apache2.conf

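Add the web server user (www-data/apache) and the Nagios user (nagios) to the nagios group. The -a option appends the group instead of replacing the user's existing supplementary groups:

# usermod -a -G nagios nagios

# usermod -a -G nagios www-data

Check that both users are members of the group with: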

# grep nagios /etc/group
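One thing to watch for when reading that output: /etc/group lists only supplementary members in its fourth field, so a user whose primary group is nagios may not show up there even though the membership is in effect. As a rough sketch (is_group_member is a name made up for this example, not a Nagios or Debian tool), the same check can be scripted like this:

```shell
# is_group_member USER GROUP [GROUP_FILE]
# Succeeds if USER is listed as a supplementary member of GROUP.
# GROUP_FILE defaults to /etc/group; primary-group membership
# (stored in /etc/passwd) is deliberately not considered here.
is_group_member() (
    user=$1
    group=$2
    file=${3:-/etc/group}
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
)
```

For example, is_group_member www-data nagios && echo "www-data is in nagios" prints the message only when the membership is recorded in /etc/group.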

Create Installation Directory

Create the base directory where you would like to install Nagios as follows

#mkdir /usr/local/nagios

Change the owner of the base installation directory to be the Nagios user and group you added earlier as follows:

#chown -R nagios:nagios /usr/local/nagios

Before building Nagios, it is important to install the GD library (referred to here as GD-Utils) so that the status map works properly.

In Debian, the following should install the required libraries:

# apt-get install libgd2-xpm libgd2-xpm-dev libgd2 libgd2-dev libpng12-dev libjpeg62-dev libgd-tools libpng3-dev

Now, download the GD-Utils from the following website:

http://www.boutell.com/gd/http/gd-2.0.33.tar.gz

Untar the downloaded file with:

# tar -zxvf gd-2.0.33.tar.gz

Change to the directory and run the configure script.

# cd gd-2.0.33

# ./configure

Now, install using

# make && make install

This should install the GD-Utils.

Download the latest version of nagios from the following link

http://www.nagios.org/download/

Unpacking The Distribution

To unpack the Nagios distribution, use the following command

#tar xzf nagios-2.6.tar.gz

or

#unp nagios-2.6.tar.gz

unp is available as a Debian package (apt-get install unp).

This will create a directory named after the version (here nagios-2.6).

Change to the newly created directory:

# cd nagios-2.6

Run the configure script

# ./configure --prefix=/usr/local/nagios --with-cgiurl=/nagios/cgi-bin --with-htmurl=/nagios/ --with-nagios-user=nagios --with-nagios-group=nagios --with-command-group=nagios

where
--prefix=/usr/local/nagios is the Nagios root folder
--with-cgiurl=/nagios/cgi-bin is the URL under which the Nagios CGIs are served
--with-htmurl=/nagios/ is the URL under which the Nagios HTML pages are served
--with-nagios-user=nagios is the Nagios user
--with-nagios-group=nagios is the Nagios group
--with-command-group=nagios is the Nagios command group, which has the web server user (Apache) and the nagios user as members.

To see all the available options, run the following command:

# ./configure --help

`configure' configures this package to adapt to many kinds of systems.

Usage: ./configure [OPTION]... [VAR=VALUE]...

To assign environment variables (e.g., CC, CFLAGS...), specify them as
VAR=VALUE. See below for descriptions of some of the useful variables.

Defaults for the options are specified in brackets.

Configuration:
-h, --help display this help and exit
--help=short display options specific to this package
--help=recursive display the short help of all the included packages
-V, --version display version information and exit
-q, --quiet, --silent do not print `checking...' messages
--cache-file=FILE cache test results in FILE [disabled]
-C, --config-cache alias for `--cache-file=config.cache'
-n, --no-create do not create output files
--srcdir=DIR find the sources in DIR [configure dir or `..']

Installation directories:
--prefix=PREFIX install architecture-independent files in PREFIX[/usr/local/nagios]
--exec-prefix=EPREFIX install architecture-dependent files in EPREFIX[PREFIX]

By default, `make install' will install all the files in `/usr/local/nagios/bin', `/usr/local/nagios/lib' etc. You can specify an installation prefix other than `/usr/local/nagios' using `--prefix', for instance `--prefix=$HOME'.

For better control, use the options below.

Fine tuning of the installation directories:
--bindir=DIR user executables [EPREFIX/bin]
--sbindir=DIR system admin executables [EPREFIX/sbin]
--libexecdir=DIR program executables [EPREFIX/libexec]
--datadir=DIR read-only architecture-independent data [PREFIX/share]
--sysconfdir=DIR read-only single-machine data [PREFIX/etc]
--sharedstatedir=DIR modifiable architecture-independent data [PREFIX/com]
--localstatedir=DIR modifiable single-machine data [PREFIX/var]
--libdir=DIR object code libraries [EPREFIX/lib]
--includedir=DIR C header files [PREFIX/include]
--oldincludedir=DIR C header files for non-gcc [/usr/include]
--infodir=DIR info documentation [PREFIX/info]
--mandir=DIR man documentation [PREFIX/man]

System types:
--build=BUILD configure for building on BUILD [guessed]
--host=HOST cross-compile to build programs to run on HOST [BUILD]

Optional Features:
--disable-FEATURE do not include FEATURE (same as --enable-FEATURE=no)
--enable-FEATURE[=ARG] include FEATURE [ARG=yes]
--disable-statusmap=disables compilation of statusmap CGI
--disable-statuswrl=disables compilation of statuswrl (VRML) CGI
--enable-DEBUG0 shows function entry and exit
--enable-DEBUG1 shows general info messages
--enable-DEBUG2 shows warning messages
--enable-DEBUG3 shows scheduled events (service and host checks... etc)
--enable-DEBUG4 shows service and host notifications
--enable-DEBUG5 shows SQL queries
--enable-DEBUGALL shows all debugging messages
--enable-nanosleep enables use of nanosleep (instead sleep) in event timing
--enable-event-broker enables integration of event broker routines
--enable-embedded-perl will enable embedded Perl interpreter
--enable-cygwin enables building under the CYGWIN environment

Optional Packages:
--with-PACKAGE[=ARG] use PACKAGE [ARG=yes]
--without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no)
--with-nagios-user= sets user name to run nagios
--with-nagios-group= sets group name to run nagios
--with-command-user= sets user name for command access
--with-command-group= sets group name for command access
--with-mail= sets path to equivalent program to mail
--with-init-dir= sets directory to place init script into
--with-lockfile= sets path and file name for lock file
--with-gd-lib=DIR sets location of the gd library
--with-gd-inc=DIR sets location of the gd include files
--with-cgiurl= sets URL for cgi programs (do not use a trailing slash)
--with-htmurl= sets URL for public html
--with-perlcache turns on cacheing of internally compiled Perl scripts

Some influential environment variables:
CC C compiler command
CFLAGS C compiler flags
LDFLAGS linker flags, e.g. -L if you have libraries in a
nonstandard directory
CPPFLAGS C/C++ preprocessor flags, e.g. -I if you have
headers in a nonstandard directory
CPP C preprocessor

Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations.

Once configure has finished, build and install Nagios with the following commands.

# make all

This compiles Nagios and the CGIs.

# make install

This installs the binaries and HTML files (documentation and main web page).

# make install-init

This installs the startup script.

# make install-commandmode

This creates the directory for the external command file and assigns the appropriate permissions to it.

# make install-config

This installs *SAMPLE* config files in /usr/local/nagios/etc. You'll have to modify these sample files before you can use Nagios.

This completes the installation. Next, get to know the directory structure and file locations.

#cd /usr/local/nagios

You should see five different subdirectories. A brief description of what each directory contains is given below.

Sub-Directory   Contents

bin/            Nagios core program
etc/            Main, resource, object, and CGI configuration files should be put here
sbin/           CGIs
share/          HTML files (for web interface and online documentation)
var/            Empty directory for the log file, status file, retention file, etc.
var/archives    Empty directory for the archived logs
var/rw          Empty directory for the external command file

Now the one thing you need to concentrate on is the /usr/local/nagios/etc directory, where all the sample configuration files are stored.

cgi.cfg-sample nagios.cfg-sample bigger.cfg-sample misccommands.cfg-sample checkcommands.cfg-sample minimal.cfg-sample resource.cfg-sample

The above are the sample configuration files. You need to rename them to .cfg files. One example is shown here; do the same for the other files.

# mv bigger.cfg-sample bigger.cfg
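Doing this by hand for every file gets tedious, so a loop can help. The following is only a sketch: activate_samples is a hypothetical helper name, and it uses cp rather than mv so that the pristine samples stay around for reference (swap in mv if you prefer to match the command above exactly).

```shell
# activate_samples DIR
# For every *.cfg-sample file in DIR, create the matching *.cfg file.
# Existing *.cfg files are never overwritten.
activate_samples() (
    dir=$1
    for sample in "$dir"/*.cfg-sample; do
        [ -e "$sample" ] || continue    # glob matched nothing
        target=${sample%-sample}        # strip the "-sample" suffix
        if [ -e "$target" ]; then
            echo "skipping $target (already exists)"
        else
            cp -p "$sample" "$target"
            echo "created $target"
        fi
    done
)
```

With the layout used in this guide, it would be called as activate_samples /usr/local/nagios/etc.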

Important note: In Nagios 2.x, hosts.cfg, services.cfg, commands.cfg and the other configuration files are not created by default (Nagios 1.x created them at installation time). We need to create these files from the existing sample files such as bigger.cfg and minimal.cfg.

We will see more about these files in the configuration section.

Now, you have a completely installed nagios to work on. The next steps would be to install the plugins and start configuring Nagios.

Install Nagios Plugins

In order for Nagios to be of any use to you, you need to install nagios plugins. Plugins are usually installed in the libexec/ directory of your Nagios installation (i.e. /usr/local/nagios/libexec). Plugins are scripts or binaries which perform all the service and host checks that constitute monitoring.

Download the latest version of nagios Plugins from the following

http://sourceforge.net/projects/nagiosplug/

The nagios-Plugins version at the time of writing this document is nagios-plugins-1.4.5

Download Nagios Plugins

# wget http://mesh.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.5.tar.gz

Untar the downloaded file

# tar -zxvf nagios-plugins-1.4.5.tar.gz

This will create a new directory, nagios-plugins-1.4.5.

Change to the Directory and run the Configure Script

# cd nagios-plugins-1.4.5

# ./configure

Run the following to make and install the plugins

# make && make install

This should install the plugins in the /usr/local/nagios/libexec directory.

There are a few rules that all Nagios plugins should implement, making them suitable for use by Nagios. All plugins provide a --help option that displays information about the plugin and how it works. This feature helps a lot when you're trying to monitor a new service using a plugin you haven't used before.

For instance, to learn how the check_ssh plugin works, run the following command.

/usr/local/nagios/libexec# ./check_ssh -h
check_ssh (nagios-plugins 1.3.0-alpha1) 1.1.1.1
The nagios plugins come with ABSOLUTELY NO WARRANTY. You may redistribute
copies of the plugins under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING.
Copyright (c) 1999 Remi Paulmier (remi@sinfomic.fr)

Usage:

check_ssh -t [timeout] -p [port] <host>
check_ssh -V prints version info
check_ssh -h prints more detailed help

by default, port is 22

This shows us that the check_ssh plugin accepts one required parameter, host, and two optional parameters, timeout and port.
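Besides --help, the other convention worth knowing is how a plugin reports its result: it prints one line of text and sets its exit code to 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). As an illustration only, here is a minimal sketch of that convention. The function name check_value and its arguments are invented for this example, and it handles whole numbers only.

```shell
# check_value LABEL VALUE WARN CRIT
# Prints one status line and returns the plugin exit code:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.
check_value() {
    if [ "$#" -ne 4 ]; then
        echo "UNKNOWN - usage: check_value LABEL VALUE WARN CRIT"
        return 3
    fi
    # Reject anything that is not a non-negative whole number.
    for _n in "$2" "$3" "$4"; do
        case $_n in
            ''|*[!0-9]*)
                echo "UNKNOWN - VALUE, WARN and CRIT must be whole numbers"
                return 3
                ;;
        esac
    done
    if [ "$2" -ge "$4" ]; then
        echo "CRITICAL - $1 is $2 (critical at $4)"
        return 2
    elif [ "$2" -ge "$3" ]; then
        echo "WARNING - $1 is $2 (warning at $3)"
        return 1
    fi
    echo "OK - $1 is $2"
    return 0
}
```

To turn this into a standalone plugin, the script would end with check_value "$@" followed by exit $? so that the exit code reaches Nagios.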

The complete Nagios directory structure now looks like this:

bin/            Nagios core program
etc/            Main, resource, object, and CGI configuration files should be put here
sbin/           CGIs
share/          HTML files (for web interface and online documentation)
var/            Empty directory for the log file, status file, retention file, etc.
var/archives    Empty directory for the archived logs
var/rw          Empty directory for the external command file
libexec/        Nagios plugins

The basic installation of Nagios is now complete. Next, you need to configure the web interface for Nagios.

Configure the Web interface For Nagios in Debian

This section shows how to set up the web interface and configure user authentication for Nagios. It also describes how to force the CGIs to use authentication.

Once Nagios and the plugins are installed, it's time to create the front-end web interface for Nagios. For this, you need to configure the Alias for the web interface and the ScriptAlias for the CGIs on your web server.

In Debian with Apache2 you can do this as follows:

Create the config file for Nagios in Apache

Create the config file nagios (or any other name you want to give the site) in /etc/apache2/sites-available/ with the following contents (copy and paste using vi):

ScriptAlias /nagios/cgi-bin /usr/local/nagios/sbin

<Directory "/usr/local/nagios/sbin">
Options ExecCGI
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>

Alias /nagios /usr/local/nagios/share

<Directory "/usr/local/nagios/share">
Options None
AllowOverride None
Order allow,deny
Allow from all
AuthName "Nagios Access"
AuthType Basic
AuthUserFile /usr/local/nagios/etc/htpasswd.users
Require valid-user
</Directory>


Enable the website

# a2ensite nagios

This enables the site and you can find a soft link to the /etc/apache2/sites-available/nagios under /etc/apache2/sites-enabled/ directory.

Restart apache

# /etc/init.d/apache2 restart

Setup User Authentication

You can set up authentication so that only authorized users can access the interface, rather than leaving it open to everyone.

This can be done with:

# htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

Enter the required password when prompted. This creates a new file named htpasswd.users under /usr/local/nagios/etc/ with the first user, nagiosadmin. Apache uses htpasswd.users to check and authenticate users.

You can add as many users as you wish with:

# htpasswd /usr/local/nagios/etc/htpasswd.users (username)

NOTE: Remember to remove -c from the previous command and substitute the username accordingly.

Force CGIs to use Authentication

For the CGIs to use Authentication on Nagios, edit the file /usr/local/nagios/etc/cgi.cfg and set

use_authentication=1

You can now access the web interface at http://your-server/nagios/ (replace your-server with the host name or IP address of the Nagios machine).

This completes the setup of the web interface and the user and CGI authentication procedures.

Q: When I click the "3-D Status Map" link, my browser tries to download and save the statuswrl.cgi file. Why?

A: This happens if you do not have a VRML client/plugin installed for your web browser. Installing a VRML plugin should resolve this issue.

Download VRML Plugin for your Browser

http://cic.nist.gov/vrml/vbdetect.html

Now you need to configure the Nagios configuration files. This step is very important.

--------------

Nagios Configuration Files



As already discussed, the sample configuration files are in the /usr/local/nagios/etc folder. The following files are the basic configuration files; if any of them is missing, you need to create it with the exact syntax.

We will explain each file with the complete syntax in the following sections.

Nagios depends on a number of important files. These range from the config files to the plugins, logs, command files, etc.

The following are the files of importance in Nagios:

Note: The file path is assumed based on the default locations of the files.

Main Configuration File

/usr/local/nagios/etc/nagios.cfg

This is the configuration file which defines the various directives that Nagios uses. These directives include the path to various folders where Nagios needs to check in for the required files, the object config files, the command files etc and various other parameters which decide how Nagios operates.

Resource File

/usr/local/nagios/etc/resource.cfg

This file holds user-defined macros and other sensitive configuration information; the CGIs are denied access to it.

Commands Config File

/usr/local/nagios/etc/commands.cfg

CGI Config file

/usr/local/nagios/etc/cgi.cfg

Other object configuration files include, but are not limited to, the following:

/usr/local/nagios/etc/hosts.cfg

/usr/local/nagios/etc/hostgroups.cfg

/usr/local/nagios/etc/services.cfg

/usr/local/nagios/etc/servicegroups.cfg

/usr/local/nagios/etc/contacts.cfg

/usr/local/nagios/etc/contactgroups.cfg

/usr/local/nagios/etc/timeperiods.cfg

Nagios Command File

/usr/local/nagios/var/rw/nagios.cmd

Nagios checks this file for external commands to process. The command CGI writes commands to this file. Other third-party programs can write to this file if the proper file permissions have been granted. The external command file is implemented as a named pipe (FIFO), which is created when Nagios starts and removed when it shuts down. If the file exists when Nagios starts, the Nagios process will terminate with an error message.
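Each external command is a single line of the form "[timestamp] COMMAND;argument;argument", where the timestamp is in seconds since the epoch. The sketch below shows how a third-party script might build and submit such a line. The helper names format_command and submit_command are invented for this example; DISABLE_HOST_NOTIFICATIONS is one of the standard external commands and takes a host name. It assumes a date command that supports +%s, as on Linux.

```shell
# format_command NAME [ARG...]
# Prints one external command line: "[epoch-seconds] NAME;ARG1;ARG2..."
format_command() (
    name=$1
    shift
    line="[$(date +%s)] $name"
    for arg in "$@"; do
        line="$line;$arg"
    done
    printf '%s\n' "$line"
)

# submit_command FILE NAME [ARG...]
# Appends the formatted line to FILE. In a live setup FILE is the
# command pipe, e.g. /usr/local/nagios/var/rw/nagios.cmd.
submit_command() (
    file=$1
    shift
    format_command "$@" >> "$file"
)
```

A call such as submit_command /usr/local/nagios/var/rw/nagios.cmd DISABLE_HOST_NOTIFICATIONS LON3 would then switch off notifications for the host LON3. Because the file is a named pipe, the write waits until Nagios is running and reading from it.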

Nagios Log Files

Status Log

/usr/local/nagios/var/status.log

Downtime Log File

/usr/local/nagios/var/downtime.log

Comment log File

/usr/local/nagios/var/comment.log

Nagios Lock File

/tmp/nagios.lock

Nagios creates this file when it runs as a daemon. This file contains the process id (PID) number of the running Nagios process.

Nagios Temp File

/usr/local/nagios/var/nagios.tmp

State Retention File

/usr/local/nagios/var/status.sav

This is the file that Nagios will use for storing service and host state information before it shuts down. When Nagios is restarted it will use the information stored in this file for setting the initial states of services and hosts before it starts monitoring anything. This file is deleted after Nagios reads in initial state information when it (re)starts.

Configure Nagios Files

These are the object configuration files for Nagios; they are referenced from nagios.cfg, the main configuration file. If any of the following files is missing, create it with touch, for example:

# touch /usr/local/nagios/etc/hosts.cfg

Then check the permissions and ownership of each file:

/usr/local/nagios/etc/contactgroups.cfg
/usr/local/nagios/etc/contacts.cfg
/usr/local/nagios/etc/services.cfg
/usr/local/nagios/etc/dependencies.cfg
/usr/local/nagios/etc/escalations.cfg
/usr/local/nagios/etc/hostgroups.cfg
/usr/local/nagios/etc/hosts.cfg
/usr/local/nagios/etc/servicegroups.cfg
/usr/local/nagios/etc/timeperiods.cfg
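Creating the missing files one by one is easy to get wrong, so here is a small sketch that walks the list above and creates only what is absent. ensure_object_files is a hypothetical helper name, and the directory is passed in as an argument rather than hard-coded.

```shell
# ensure_object_files DIR
# Creates any of the nine object configuration files that are missing
# from DIR. Files that already exist are left exactly as they are.
ensure_object_files() (
    dir=$1
    for name in contactgroups contacts services dependencies escalations \
                hostgroups hosts servicegroups timeperiods; do
        file="$dir/$name.cfg"
        if [ ! -e "$file" ]; then
            touch "$file" && echo "created $file"
        fi
    done
)
```

After running ensure_object_files /usr/local/nagios/etc, ls -l /usr/local/nagios/etc shows the ownership and permissions; new files created as root may need chown nagios:nagios so that Nagios can read them.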

You will first need to set the authentication options for the nagiosadmin user in $NAGIOSHOME/etc/cgi.cfg (here $NAGIOSHOME is the installation directory, /usr/local/nagios):-

use_authentication=1
authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

Of course, other users can be set up with different privileges. Remember to create them in $NAGIOSHOME/etc/htpasswd.users.

Also, you need to make sure that the relevant users have the correct permissions for nagios. Usually, you will want the admin user to be able to do everything. So, edit these lines in $NAGIOSHOME/etc/cgi.cfg as follows:-

authorized_for_system_information=nagiosadmin
authorized_for_configuration_information=nagiosadmin
authorized_for_system_commands=nagiosadmin
authorized_for_all_services=nagiosadmin
authorized_for_all_hosts=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin

Check through $NAGIOSHOME/etc/nagios.cfg to see which options suit you best, such as whether Nagios allows external commands to be executed through the web interface, how often to rotate log files, etc.

If you decide to make external commands accessible to Nagios, then you must ensure that the directory $NAGIOSHOME/var/rw is readable and writable by the web server user (usually 'www-data').

If you do want to allow external commands to be parsed and acted on by Nagios, you need to set the directive:

check_external_commands=1

in $NAGIOSHOME/etc/nagios.cfg. Then we need a new user group and the relevant permissions on $NAGIOSHOME/var/rw and $NAGIOSHOME/var/rw/nagios.cmd accordingly (the -a option keeps the users' existing supplementary groups):-

# groupadd nagiocmd
# usermod -a -G nagiocmd nagios
# usermod -a -G nagiocmd www-data

where "www-data" is the apache user. Now make the command directory (if it does not already exist).

#mkdir $NAGIOSHOME/var/rw

and set the permissions

#chown nagios:nagiocmd $NAGIOSHOME/var/rw
#chmod u+rwx $NAGIOSHOME/var/rw
#chmod g+rwx $NAGIOSHOME/var/rw
#chmod g+s $NAGIOSHOME/var/rw
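The three chmod commands can also be written as a single symbolic mode. A minimal sketch, assuming you pass the directory explicitly (prepare_command_dir is a hypothetical helper name; the chown step is left out here because it needs root and the nagiocmd group):

```shell
# prepare_command_dir DIR
# Creates DIR if needed and applies the same permission bits as the
# three separate chmod calls, then prints the result for inspection.
prepare_command_dir() (
    dir=$1
    mkdir -p "$dir"
    # u+rwx and g+rwx give owner and group full access;
    # g+s makes files created inside inherit the directory's group.
    chmod ug+rwx,g+s "$dir"
    ls -ld "$dir"
)
```

In the listing, a mode string starting with drwxrws confirms that the group has write access and that the setgid bit is set.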

You'll need to restart apache so that it can take advantage of being part of the nagiocmd group.

Templating Configuration Files

With all of the object configuration files, you can use templates to make the files smaller and save you time and effort when you need to make changes to them. Let's take the example of the services definitions (see later for more explanation):-

# Generic service definition template
define service{
name generic-service ; The 'name' of this service template, referenced in other service definitions
active_checks_enabled 1 ; Active service checks are enabled
passive_checks_enabled 1 ; Passive service checks are enabled/accepted
parallelize_check 1 ; Active service checks should be parallelized (disabling this can lead to major performance problems)
obsess_over_service 1 ; We should obsess over this service (if necessary)
check_freshness 0 ; Default is to NOT check service 'freshness'
notifications_enabled 1 ; Service notifications are enabled
event_handler_enabled 1 ; Service event handler is enabled
flap_detection_enabled 1 ; Flap detection is enabled
process_perf_data 1 ; Process performance data
retain_status_information 1 ; Retain status information across program restarts
retain_nonstatus_information 1 ; Retain non-status information across program restarts
contact_groups $CONTACT_GROUP1
is_volatile 0
check_period $PERIOD
max_check_attempts #n
normal_check_interval #n
retry_check_interval #n
notification_interval #n
notification_period $PERIOD
notification_options w,u,c,r
check_command $COMMAND $ARGUMENTS
service_description $SERVICE

register 0 ; DONT REGISTER THIS DEFINITION - ITS NOT A REAL SERVICE, JUST A TEMPLATE!
}

# Service definition
define service{
use generic-service
host_name $HOST1,$HOST2,$HOST3...
}

# Service definition
define service{
use generic-service
host_name $HOST4,$HOST5...
contact_groups $CONTACT_GROUP1,$CONTACTGROUP2
}

Any directives common to most service checks can go into the template at the top; the individual service definitions then specify only the bits that differ for specific (groups of) hosts. You can also override templated settings in the specific service definitions.

Configure time periods (timeperiods.cfg)

You need to think about which time periods you want to use to separate out the notifications and checking of services, e.g.:

# '24x7' timeperiod definition
define timeperiod{
timeperiod_name 24x7
alias 24 Hours A Day, 7 Days A Week
sunday 00:00-24:00
monday 00:00-24:00
tuesday 00:00-24:00
wednesday 00:00-24:00
thursday 00:00-24:00
friday 00:00-24:00
saturday 00:00-24:00
}

# 'workhours' timeperiod definition
define timeperiod{
timeperiod_name workhours
alias "Normal" Working Hours
monday 08:00-18:00
tuesday 08:00-18:00
wednesday 08:00-18:00
thursday 08:00-18:00
friday 08:00-18:00
}

# 'nonworkhours' timeperiod definition
define timeperiod{
timeperiod_name nonworkhours
alias Non-Work Hours
sunday 00:00-24:00
monday 00:00-09:00,17:00-24:00
tuesday 00:00-09:00,17:00-24:00
wednesday 00:00-09:00,17:00-24:00
thursday 00:00-09:00,17:00-24:00
friday 00:00-09:00,17:00-24:00
saturday 00:00-24:00
}

# 'none' timeperiod definition
define timeperiod{
timeperiod_name none
alias No Time Is A Good Time
}

Notice that time period definitions are allowed to overlap.
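To see the overlap concretely: 08:30 on a Monday falls inside both "workhours" (08:00-18:00) and "nonworkhours" (00:00-09:00,17:00-24:00) as defined above. The sketch below checks a time against a range list written in the same notation. It is only an illustration of the notation, not how Nagios itself evaluates time periods; to_minutes and in_period are names invented here, times must be two-digit HH:MM, and the end of a range is treated as exclusive, which is an assumption of this sketch.

```shell
# to_minutes HH:MM
# Prints the number of minutes since midnight.
to_minutes() (
    h=${1%%:*}
    m=${1##*:}
    # Strip one leading zero so "08" is not read as an octal number.
    echo $(( ${h#0} * 60 + ${m#0} ))
)

# in_period HH:MM RANGES
# Succeeds if the time lies in any of the comma-separated RANGES,
# e.g. "00:00-09:00,17:00-24:00".
in_period() (
    now=$(to_minutes "$1")
    IFS=,
    for range in $2; do
        start=$(to_minutes "${range%-*}")
        end=$(to_minutes "${range#*-}")
        if [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]; then
            exit 0
        fi
    done
    exit 1
)
```

Both in_period 08:30 "08:00-18:00" and in_period 08:30 "00:00-09:00,17:00-24:00" succeed, which is the overlap between the two definitions.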

For most purposes, the existing configuration is pretty good, though you may want to tweak the "workhours" definition (and thus "nonworkhours") from 9am-5pm to your local requirements. This edit can be made in $NAGIOSHOME/etc/timeperiods.cfg. If you plan to make no changes from the supplied timeperiods.cfg-sample file, then just copy it to timeperiods.cfg and you're done.

Configure contacts (contacts.cfg)

Obviously, the point of monitoring is that the relevant people know when something isn't right. So, one thing we need to do is to set up a list of people who will be notified in the event of problems. For example, let's say we have 7 servers: 3 in London (LON1, LON2 and LON3), 2 in New York (NY1 and NY2) and 2 in Hong Kong (HK1 and HK2). Each location has one machine that is a gateway, firewall and webcache (machine 1) and another that is the mail server (machine 2), and the webserver runs on LON3. There are people in the company responsible for various services and hardware and there are those who would need to know in the event of an outage, for escalation purposes.

You will need one section per person. Let's take two people: Fred Bloggs (login ID fbloggs, email address fbloggs@bigcorp.com), who is the operations manager and needs to know about problems 24x7x365, and Joanna Smith (login ID jsmith, email address jsmith@bigcorp.com), who is a web architect and needs to know about critical problems with her web servers on weekdays, in working hours. Someone else covers at weekends, and warnings aren't of interest to her.

# 'fbloggs' contact definition
define contact{
contact_name fbloggs
alias Fred Bloggs
service_notification_period 24x7
host_notification_period 24x7
service_notification_options w,u,c,r
host_notification_options d,u,r
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email fbloggs@bigcorp.com
}

# 'jsmith' contact definition
define contact{
contact_name jsmith
alias Joanna Smith
service_notification_period workhours
host_notification_period workhours
service_notification_options u,c
host_notification_options d,u
service_notification_commands notify-by-email
host_notification_commands host-notify-by-email
email jsmith@bigcorp.com
}

Configure contact groups (contactgroups.cfg)

In our hypothetical company, we have various functional groups responsible for technical issues:-

Mail admins - Fred
New York admins - Fred, Joanna

... etc. and we can define these groups in the $NAGIOSHOME/etc/contactgroups.cfg file:-

# 'mail-admins' contact group definition
define contactgroup{
contactgroup_name mail-admins
alias Mail Admins
members fbloggs
}

# 'ny-admins' contact group definition
define contactgroup{
contactgroup_name ny-admins
alias New York Admins
members fbloggs,jsmith
}
...and so on.

Configure host groups (hostgroups.cfg)

Host groups are useful to separate different physical locations, functions and services. Hosts can be members of one or more groups. We could group them as follows:-

Hong Kong Group: HK1,HK2
New York Group: NY1,NY2
London Group: LON1,LON2,LON3
Mail Servers: HK2,NY2,LON2
Gateways: HK1,NY1,LON1
Firewalls: HK1,NY1,LON1
Webcaches: HK1,NY1,LON1
Webservers: LON3

So, in the view of host groups, there is a logical layout by location and by function, making it easier to spot problems. For this example, we can specify the groups in $NAGIOSHOME/etc/hostgroups.cfg like this:-

# 'hong-kong' host group definition
define hostgroup{
hostgroup_name hong-kong
alias Hong Kong Group
contact_groups hk-admins*
members HK1,HK2
}

# 'new-york' host group definition
define hostgroup{
hostgroup_name new-york
alias New York Group
contact_groups ny-admins*
members NY1,NY2
}

# 'london' host group definition
define hostgroup{
hostgroup_name london
alias London Group
contact_groups lon-admins*
members LON1,LON2,LON3
}

# 'mail' host group definition
define hostgroup{
hostgroup_name mail
alias Mail Servers
contact_groups mail-admins,hk-admins,ny-admins,lon-admins*
members HK2,NY2,LON2
}

# 'gateway' host group definition
define hostgroup{
hostgroup_name gateway
alias Gateway Servers
contact_groups infrastructure,hk-admins,ny-admins,lon-admins*
members HK1,NY1,LON1
}

# 'firewall' host group definition
define hostgroup{
hostgroup_name firewall
alias Firewalls
contact_groups security,hk-admins,ny-admins,lon-admins*
members HK1,NY1,LON1
}

# 'cache' host group definition
define hostgroup{
hostgroup_name cache
alias Webcaches
contact_groups infrastructure*
members HK1,NY1,LON1
}

# 'www' host group definition
define hostgroup{
hostgroup_name www
alias Web Servers
contact_groups infrastructure, webbies*
members LON3
}
* - host groups do not take contact_groups as a directive in Nagios 2.0.
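
If you have many groups, the definitions can be generated instead of typed by hand. The following is only a minimal sketch: the helper names (render_hostgroup, render_all) and the HOSTGROUPS dictionary are hypothetical, not part of Nagios, and contact_groups is left out on purpose because of the Nagios 2.0 note above.

```python
# Hypothetical helper: render Nagios hostgroup blocks from a plain dict.
# Group names, aliases and members are taken from this guide's example.
HOSTGROUPS = {
    "hong-kong": ("Hong Kong Group", ["HK1", "HK2"]),
    "new-york": ("New York Group", ["NY1", "NY2"]),
    "london": ("London Group", ["LON1", "LON2", "LON3"]),
}


def render_hostgroup(name, alias, members):
    """Return one 'define hostgroup' block as text."""
    lines = [
        f"# '{name}' host group definition",
        "define hostgroup{",
        f"hostgroup_name {name}",
        f"alias {alias}",
        f"members {','.join(members)}",
        "}",
    ]
    return "\n".join(lines)


def render_all(groups):
    """Render every group, separated by a blank line."""
    return "\n\n".join(
        render_hostgroup(name, alias, members)
        for name, (alias, members) in groups.items()
    )


print(render_all(HOSTGROUPS))
```

The printed text can be pasted into the host group file; the functional groups (mail, gateway and so on) would be added to the dictionary in the same way.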

Configure hosts (hosts.cfg)

This is the part where you tell Nagios which hosts you are interested in. In $NAGIOSHOME/etc/hosts.cfg you specify each host by IP address, give it a label, set which check command to use for testing whether it is alive and, finally, set the time period to use for notifications. For example, for our company's webserver, LON3, we reference the generic host definition given at the top of the hosts.cfg-sample file (which we retain in hosts.cfg) and add the host-specific settings:-

# 'LON3' host definition
define host{
use generic-host

host_name LON3
alias Solaris/Apache webserver
address 192.168.1.13
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}

Now, when it comes to the status map, where you will want to make the map look like the physical layout, you can use the "parents" parameter to specify which host is the parent to the one you are defining. For example, if you want the map to show LON1, LON2 and LON3 connected to a router "Route1" on the way to NY1 and NY2, you would specify that LON1, LON2, LON3, NY1 and NY2 have the parent "Route1" like this in the hosts.cfg:-

# 'LON3' host definition
define host{
use generic-host

host_name LON3
parents Route1
alias Solaris/Apache webserver
address 192.168.1.13
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}

# 'LON2' host definition
define host{
use generic-host

host_name LON2
parents Route1
alias Solaris/Mail server
address 192.168.1.14
check_command check-host-alive
max_check_attempts 10
notification_interval 120
notification_period 24x7
notification_options d,u,r
}
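
Every name used in a "parents" line must itself be defined as a host, so in this example Route1 needs its own host definition too. A rough way to spot a forgotten parent is sketched below; parse_hosts and missing_parents are hypothetical helper names, and the parsing is deliberately naive (it assumes one directive per line, as in the examples above).

```python
import re

SAMPLE = """
define host{
use generic-host
host_name LON3
parents Route1
address 192.168.1.13
}
define host{
use generic-host
host_name LON2
parents Route1
address 192.168.1.14
}
"""


def parse_hosts(text):
    """Return {host_name: [parents]} from 'define host{...}' blocks."""
    hosts = {}
    for block in re.findall(r"define\s+host\s*\{(.*?)\}", text, re.S):
        fields = {}
        for line in block.splitlines():
            parts = line.split(None, 1)  # directive name, then its value
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        name = fields.get("host_name")
        if name:
            parents = fields.get("parents", "")
            hosts[name] = [p.strip() for p in parents.split(",") if p.strip()]
    return hosts


def missing_parents(hosts):
    """Return sorted parent names that have no host definition of their own."""
    return sorted({p for ps in hosts.values() for p in ps if p not in hosts})


print(missing_parents(parse_hosts(SAMPLE)))  # ['Route1']
```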

Status Map

Also in the status map, you would probably like to have pretty icons for each of the hosts. Download and unpack imagepak-base.tar.gz (http://prdownloads.sourceforge.net/nagios/imagepak-base.tar.gz) and copy the contents to $NAGIOSHOME/share/images/logos. Now we need to tell Nagios which icons to use for each host. In $NAGIOSHOME/etc/cgi.cfg you need to point to an external template file which will contain the definitions:-

xedtemplate_config_file=$NAGIOSHOME/etc/hostextinfo.cfg

and create that file, with the definitions for the hosts:-

define hostextinfo{
host_name LON2
2d_coords 40,40
icon_image sun40.png
icon_image_alt Solaris/Mail server
vrml_image sun40.png
statusmap_image sun40.gd2
}

where the *_image files are appropriately selected from those in $NAGIOSHOME/share/images/logos, though you must use a .gd2 file for the statusmap_image. The 2d_coords are where the icon should appear on the status map if you are using a statusmap layout option (set in $NAGIOSHOME/etc/cgi.cfg) that allows for specifying the location. It is a good idea to start out using the default layout 5 (Circular, Marked Up), which does not require co-ordinates to be set. You can modify the setting later (or not), when you have a better idea of where you want the icons placed.

Configure commands (checkcommands.cfg)

This part is quite complex, so I've made the details a separate guide, here. However, basically what you need to do is look in the $NAGIOSHOME/libexec directory to see what commands are there, check out the switches and flags (usually by running the command with the --help option) and configure the ones you want in $NAGIOSHOME/etc/checkcommands.cfg.

Here is a basic example for the command to check whether a secure apache is running on a host:-

# 'check_apache' command definition
define command{
command_name check_apache
command_line $USER1$/check_https -H $HOSTADDRESS$
}
$USER1$ refers to a configuration in the $NAGIOSHOME/etc/resource.cfg file which usually (and in the frame of this installation guide) refers to the location of the executable checking commands/plugins. $HOSTADDRESS$ is the variable passed into the command denoting on which host that service should be checked.
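
To make the substitution concrete, here is a small illustration of how the two macros end up in the final command line. It is not how Nagios is implemented internally, just a sketch; expand_macros is a hypothetical helper, and the $USER1$ value is an assumption based on the paths used elsewhere in this guide.

```python
import re


def expand_macros(command_line, macros):
    """Replace each $NAME$ token with its value; leave unknown macros as-is."""
    def substitute(match):
        return macros.get(match.group(1), match.group(0))
    return re.sub(r"\$([A-Z0-9_]+)\$", substitute, command_line)


macros = {
    "USER1": "/usr/local/nagios/libexec",  # assumed value from resource.cfg
    "HOSTADDRESS": "192.168.1.13",         # LON3's address from hosts.cfg
}

print(expand_macros("$USER1$/check_https -H $HOSTADDRESS$", macros))
# /usr/local/nagios/libexec/check_https -H 192.168.1.13
```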

Configure dependencies

Dependencies between services can be configured in $NAGIOSHOME/etc/dependencies.cfg. For the moment, this will not be covered by this set of guidelines.

Configure escalations

Escalations of host and service notifications can be configured in $NAGIOSHOME/etc/escalations.cfg. For the moment, this will not be covered by this set of guidelines.

Configure resources

The $NAGIOSHOME/etc/resource.cfg file is where some common variables and macros are defined. You can define up to 32 $USERx$ macros, which can in turn be used in command definitions in your host config file(s). $USERx$ macros are useful for storing sensitive information such as usernames, passwords, etc. They are also handy for specifying the path to plugins and event handlers - if you decide to move the plugins or event handlers to a different directory in the future, you can just update one or two $USERx$ macros, instead of modifying a lot of command definitions.

Most importantly, the CGIs will not attempt to read the contents of resource files, so you can set restrictive permissions (600 or 660) on them.

After installing nagios, the default resource.cfg-sample file is generally good enough to be used as resource.cfg, unless you have some fancy stuff to configure in.
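
If you want to confirm that the resource file really is locked down, a check along these lines should do. It is a sketch only: is_restrictive is a hypothetical helper, and it simply tests that "other" users have no permissions at all, which holds for both 600 and 660.

```python
import os
import stat


def is_restrictive(path):
    """True if the file grants nothing to 'other' users (e.g. mode 600 or 660)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & stat.S_IRWXO == 0
```

Call it with the full path to your resource.cfg; a False result means the file is readable, writable or executable by users outside the owner and group.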

nrpe Addon Configuration in Nagios

nrpe is a commonly used agent that runs on each host to be monitored and gathers local data which cannot be (or cannot sensibly be) retrieved directly from the Nagios host.

Download a copy of nrpe-.tar.gz and untar somewhere sensible. Now build it:-

#./configure
#make all
#cp ./src/nrpe /usr/local/nagios
#cp ./src/check_nrpe /usr/local/nagios
#cp nrpe.cfg /usr/local/nagios

Add nrpe to the network services. Edit /etc/services to add the following line:-

nrpe 5666/tcp # nrpe, nagios monitoring service

We have already installed the Nagios plugins packages.

Now configure the checks. Edit nrpe.cfg to set the local configuration and to add any checks to run on that host:-

allowed_hosts=10.141.145.117
command[check_data1]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /data1
command[check_data2]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /data2
command[check_mysql_5]=/usr/local/nagios/libexec/check_mysql_5 -H database.domain.uk -u nagios -p nagios -P 3309
command[check_mysql_4]=/usr/local/nagios/libexec/check_mysql_4 -H database.domain.uk -u nagios -p nagios -P 3306

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_home]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /home
command[check_root]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /
command[check_var]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /var
command[check_usr]=/usr/local/nagios/libexec/check_disk -w 10 -c 2 -p /usr

command[check_u01]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u01
command[check_u02]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u02
command[check_u03]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u03
command[check_u04]=/usr/local/nagios/libexec/check_disk -w 10 -c 5 -p /u04

(The above are example checks, obviously.) Check that nrpe responds from your main Nagios host, using whichever of these paths matches where your plugins are installed:-

#/usr/local/nagios/libexec/check_nrpe -H machine.domain.uk -c check_root
#/home/nagios/libexec/check_nrpe -H machine.domain.uk -c check_root
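
Plugins, check_nrpe included, report their result through the exit code: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. If you want to script the test above, something like the following sketch could work; run_check and state_name are hypothetical names, and the command you pass in is whatever check you would otherwise type by hand.

```python
import subprocess

# Standard plugin exit codes and the states they stand for.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def state_name(exit_code):
    """Map a plugin exit code to its state; anything unexpected is UNKNOWN."""
    return STATES.get(exit_code, "UNKNOWN")


def run_check(argv):
    """Run a plugin command and return (state, first line of its output)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    return state_name(result.returncode), first_line
```

For example, run_check(["/usr/local/nagios/libexec/check_nrpe", "-H", "machine.domain.uk", "-c", "check_root"]) would return the state together with the text the remote check printed.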

And add services to your main Nagios host services.cfg:-

# Service definition
define service{
use nrpe-service
host_name dbdev2
service_description load
contact_groups engineers
check_command check_nrpe!check_load
}

# Service definition
define service{
use nrpe-service
host_name dbdev2
service_description /home
contact_groups engineers
check_command check_nrpe!check_home
}
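
The name after the "!" in check_command must match a command[...] entry in nrpe.cfg on the monitored host, otherwise the check cannot run. A quick cross-check between the two files might look like this; it is a sketch under the assumption that both files follow the simple layouts shown above, and nrpe_commands and undefined_checks are hypothetical helper names.

```python
import re


def nrpe_commands(nrpe_cfg):
    """Return the set of names defined by command[name]=... lines."""
    return set(re.findall(r"^command\[([^\]]+)\]=", nrpe_cfg, re.M))


def undefined_checks(services_cfg, nrpe_cfg):
    """Return check_nrpe!NAME references with no matching nrpe.cfg command."""
    wanted = set(re.findall(r"check_nrpe!(\S+)", services_cfg))
    return sorted(wanted - nrpe_commands(nrpe_cfg))


nrpe_cfg = "command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20\n"
services_cfg = "check_command check_nrpe!check_load\ncheck_command check_nrpe!check_home\n"

print(undefined_checks(services_cfg, nrpe_cfg))  # ['check_home']
```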

...Then reload the nagios config on the Nagios host:-

#/etc/init.d/nagios reload

[* - if checking mysql, you might want to add a nagios user so you're not using real ones:-

grant select on test.* to nagios@'%' identified by 'nagios';
grant select on test.* to nagios@'dev8' identified by 'nagios';
grant select on test.* to nagios@'localhost' identified by 'nagios';]

Configure services (services.cfg)

This is quite a large part of the configuration. The basics are as follows.

In the file $NAGIOSHOME/etc/services.cfg, you need to specify which services are to be monitored for each host. This ranges from the basic ping to checking apache is running, SMTP is working etc. For each server, you must at least specify a ping service. The example I'll give is generic and based on the generic-service template which is supplied in the file services.cfg-sample (which must be included in services.cfg if you want to reference it).

# Service definition
define service{
use generic-service
host_name $HOST1,$HOST2,$HOST3...
service_description $SERVICE
is_volatile 0
check_period $PERIOD
max_check_attempts #n
normal_check_interval #n
retry_check_interval #n
contact_groups unix-admins
notification_interval #n
notification_period $PERIOD
notification_options w,u,c,r
check_command $COMMAND $ARGUMENTS
}

One thing to note: if you are probing the availability of machines or services which are not owned by you, it is probably best to set the normal_check_interval to a conservative time period, say 10 minutes. The interval_length is set in $NAGIOSHOME/etc/nagios.cfg and defaults to 60 (seconds). The normal_check_interval is given in multiples of interval_length, so for 10 minutes, leave interval_length at the default and set normal_check_interval to 10.
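
As a worked check of the interval arithmetic: the time between checks is interval_length multiplied by normal_check_interval. The function name below is hypothetical; it only illustrates the multiplication.

```python
INTERVAL_LENGTH = 60  # seconds; the nagios.cfg default


def seconds_between_checks(normal_check_interval, interval_length=INTERVAL_LENGTH):
    """Return the real time between two regular checks, in seconds."""
    return normal_check_interval * interval_length


seconds = seconds_between_checks(10)
print(seconds)           # 600, i.e. 10 minutes
print(86400 // seconds)  # 144 checks per day
```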

Configure service groups (servicegroup.cfg, only for Nagios v2.0 or higher)

As with host groups, you can group services into logical clumps, specifying the host and service name for each service in the group:-

# 'Live Databases' service group definition
define servicegroup{
servicegroup_name live_db
alias Live Databases
members $HOST1,$SERVICE1,$HOST2,$SERVICE2,$HOST2,$SERVICE3,$HOST3,$SERVICE4,$HOST4,$SERVICE5
}

Service groups do not take contact_groups as a directive.
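
Because the members line is a flat list of alternating host and service names, it is easy to drop one and shift every pair. The sketch below splits such a line back into pairs so you can eyeball it; pair_members is a hypothetical helper, and the sample value reuses the dbdev2 services from the nrpe section.

```python
def pair_members(members):
    """Split 'host,service,host,service,...' into (host, service) tuples."""
    items = [item.strip() for item in members.split(",")]
    if len(items) % 2 != 0:
        raise ValueError("members must contain complete host,service pairs")
    return list(zip(items[0::2], items[1::2]))


print(pair_members("dbdev2,load,dbdev2,/home"))
# [('dbdev2', 'load'), ('dbdev2', '/home')]
```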

Configure mail alerts (misccommands.cfg)

This is specific to Solaris. The default setup of mail uses mail, which does not take -s under Solaris, so the subject lines of the alert emails will be blank. You need to use mailx. So, edit $NAGIOSHOME/etc/misccommands.cfg and find the lines:-

# 'notify-by-email' command definition
define command{
command_name notify-by-email
command_line /usr/bin/printf "%b" "***** Nagios 1.0 *****\n\nNotification Type: $NOTIFICATIONTYPE$\n\nService: $SERVICEDESC$\nHost: $HOSTALIAS$\nAddress: $HOSTADDRESS$\nState: $SERVICESTATE$\n\nDate/Time: $DATETIME$\n\nAdditional Info:\n\n$OUTPUT$" | /usr/bin/mail -s "** $NOTIFICATIONTYPE$ alert - $HOSTALIAS$/$SERVICEDESC$ is $SERVICESTATE$ **" $CONTACTEMAIL$
}

and change mail to mailx. Also in this section, you can configure what will appear on the subject line. Basically, just modify the section in quotes after mailx -s, using relevant variables for what you want to see.

Troubleshooting Nagios Configuration

If you have problems with the status map, histograms etc., then you do need to make sure that your libraries are linked as follows:-

crle -l /usr/lib:/usr/local/lib:/usr/local/ssl/lib:/opt/sfw/lib

Remember, your system may be using libraries in other places in addition to these locations. Take care to include those if you need to.

Also, for problems with the status map and histograms, check back to when you installed the GD, jpeg and png libraries. Did you install them in the correct order, and did gd report jpeg and png support, something like this:-

** Configuration summary for gd 2.0.33:

Support for PNG library: yes
Support for JPEG library: yes
Support for Freetype 2.x library: no
Support for Fontconfig library: no
Support for Xpm library: yes
Support for pthreads: yes

If not, you may need to re-visit your gd installation.
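
If you saved the configure output, the summary can also be checked mechanically. This is only a sketch that assumes the "Support for X: yes/no" layout shown above; gd_support is a hypothetical helper name.

```python
def gd_support(summary):
    """Return {feature: bool} parsed from 'Support for X: yes/no' lines."""
    support = {}
    for line in summary.splitlines():
        line = line.strip()
        if line.startswith("Support for ") and ":" in line:
            name, value = line[len("Support for "):].rsplit(":", 1)
            support[name.strip()] = value.strip() == "yes"
    return support


summary = """Support for PNG library: yes
Support for JPEG library: yes
Support for Freetype 2.x library: no"""

features = gd_support(summary)
print(features["PNG library"] and features["JPEG library"])  # True
```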

Start her up and see what happens

$NAGIOSHOME/bin/nagios start

Then point your browser at: http://yourserver/nagios/ and attempt to log in.


-----
http://www.debianhelp.co.uk/
