Sunday, 18 November 2018

Nagios Monitoring

Nagios Core

Nagios is an open source monitoring system for computer systems. It was designed to run on the Linux operating system and can monitor devices running Linux, Windows and Unix operating systems (OSes).

Basically, this utility runs periodically to check critical services running on different platforms. For example, Nagios can monitor memory usage, disk usage and currently running processes, and check whether any process is consuming a huge amount of memory. It can also monitor SMTP, POP, HTTP and other common network protocols.

Nagios comes with a command line interface as well as a web-based graphical interface. Many people prefer the CLI, as it is a bit easier to operate and lets you manage things quickly.
There is a good add-on, NRPE, which allows you to remotely execute Nagios plugins on other Linux/Unix machines. NRPE runs its own agent on the client. The alternative used later in this post, check_by_ssh, needs a key-based ssh connection between the Nagios server and its clients. You can execute scripts and check metrics on remote Windows machines as well.


Installation of Nagios Core (4.1.1)

Pre-requisite: 

1) Needs to have CentOS Linux release 7.2.1511 (Core) 64-bit OS
2) yum install -y httpd php
3) yum install -y gcc glibc glibc-common make gd gd-devel net-snmp
4) yum install -y perl perl-devel
5) yum install -y unzip
6) yum install -y wget
7) yum install -y openssl-devel
8) Needs to have one user `useradd nagios`
9) Needs to have one group `groupadd nagcmd`
10) Run command `usermod -a -G nagcmd nagios`
11) Run command `usermod -a -G nagcmd apache`


12) Create directory /root/nagios
13) Download the Nagios Core 4.1.1 and Nagios Plugins 2.0.3 tarballs into this directory and untar them.
14) cd /root/nagios/nagios-4.1.1
15) Run command: ./configure --with-command-group=nagcmd

16) Run command: make all
17) Run command: make install

Now we need to install the init script, set the permissions on the external command directory and install the sample configuration files with the following commands.

18) Run commands: make install-init
                  make install-commandmode
                  make install-config

After installation, you will find all the sample Nagios object files under the following location
"/usr/local/nagios/etc/objects"

This is how we install Nagios for use via the CLI. Now we need to install its web component by running the following command..
`make install-webconf`

We also need to set up a password for the user "nagiosadmin". This username is used to access the web interface, so it's important to remember the password that you type here. In a default source install this is typically done with the following command, which asks for the password twice..
`htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin`



It's time to install the Nagios plugins by running the following commands...

1) cd /root/nagios/nagios-plugins-2.0.3
2) Run command: `./configure --with-nagios-user=nagios --with-nagios-group=nagios`
3) make
4) make install

5) Next we want Nagios to start at boot time, so first verify that the configuration file has no errors by running the following command
`/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg`

6) If everything is fine, add the service to run at boot time with these commands..
`chkconfig --add nagios`
`chkconfig --level 35 nagios on`
And start the service with the following command..
`systemctl start nagios.service`

Installation of Nagios NRPE


Now you can access the web portal of Nagios Core with its beautiful interface.
That is all about the installation of Nagios.

Now we will go over the various components of Nagios Core.

The components are:
1) Hosts
2) Services
3) Contacts
4) Timeperiods
5) Templates
6) Commands
7) Nagios Master

Following is the basic Nagios workflow, in which three important pieces are used..
--> nagios.cfg
--> localhost.cfg
--> commands.cfg & Plugins


Going through these one by one...
nagios.cfg: This is where we list the host configuration files that Nagios is going to monitor..
Once Nagios knows which host needs to be monitored, it reads that host's configuration, which usually resides in localhost.cfg (if you have a remote host, let's say sip123, the file would be named sip123.cfg).
Once the host is found, Nagios looks up the services defined for that host.
When Nagios checks a particular service, it passes the service's check_command to the matching command definition in the commands.cfg file.
That command contains the plugin name (e.g. check_by_ssh), the host name or IP address of the host, the service script (the script that resides on the remote host to check the specific service) and a timeout (in case ssh takes too long to connect to the remote host).
Once the command has run from the Nagios server, it returns the result along with the status and description of that service.
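
To make the hand-off between a service definition and commands.cfg more concrete, here is a minimal shell sketch of the macro substitution step. It is only an illustration of the idea: replace_all and expand_check_command are hypothetical helper names, not part of Nagios, and the plugin directory passed in for $USER1$ is assumed to be /usr/local/nagios/libexec.

```shell
#!/bin/sh
# Illustration only: these helpers are hypothetical and not part of Nagios.

# replace_all STRING SEARCH REPLACEMENT: literal (non-regex) substitution.
replace_all() {
    s=$1
    out=
    while :; do
        case $s in
            *"$2"*)
                out=$out${s%%"$2"*}$3   # text before the match, then the replacement
                s=${s#*"$2"}            # continue after the match
                ;;
            *) break ;;
        esac
    done
    printf '%s' "$out$s"
}

# expand_check_command CHECK_COMMAND COMMAND_LINE PLUGIN_DIR
# CHECK_COMMAND looks like name!arg1!arg2; the arguments fill $ARG1$, $ARG2$, ...
expand_check_command() {
    check_command=$1
    result=$(replace_all "$2" '$USER1$' "$3")
    case $check_command in
        *!*) args=${check_command#*!} ;;   # drop the command name, keep the arguments
        *)   args= ;;
    esac
    i=1
    while [ -n "$args" ]; do
        arg=${args%%!*}                    # next argument, up to the following '!'
        result=$(replace_all "$result" "\$ARG${i}\$" "$arg")
        case $args in
            *!*) args=${args#*!} ;;
            *)   args= ;;
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$result"
}

expand_check_command 'check_local_disk!20%!10%!/' \
    '$USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' \
    /usr/local/nagios/libexec
```

With the check_local_disk values used in the configuration files in this post, the sketch prints /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /, which is the kind of command line Nagios ends up executing.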

Now let's go over the configuration files..

--> nagios.cfg

# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg

# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg

# Definitions for monitoring a Windows machine
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg

# Definitions for monitoring a router/switch
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg

# Definitions for monitoring a network printer
#cfg_file=/usr/local/nagios/etc/objects/printer.cfg

--> localhost.cfg

define host{
        use                     linux-server
        host_name               localhost
        alias                   localhost
        address                 127.0.0.1
}

define service{
        use                     local-service
        host_name               localhost
        service_description     Root Partition
        check_command           check_local_disk!20%!10%!/
}

--> commands.cfg

# 'check_local_disk' command definition
define command{
        command_name    check_local_disk
        command_line    $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}

# 'check_web_server' command definition (runs the remote script over ssh)
define command{
        command_name    check_web_server
        command_line    $USER1$/check_by_ssh -H $HOSTADDRESS$ -l op5mon -i /opt/monitor/.ssh/id_rsa_check_by_ssh -C "/usr/lib64/nagios/plugins/check_web_server.sh" -E -t 30
}

The check_web_server.sh file resides on the remote server where the service is being monitored.
You need to write the check script as per your requirement. It must follow the Nagios plugin convention: print a one-line status message and exit with a specific return code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN), as below...
Let's say I want to monitor whether the web server (Tomcat here) is up and running.

--------------------------------------------------------------

#!/bin/ksh
#
# Nagios plugin for checking that Tomcat is running
#
ARC_NAGIOS_RETURN_OK=0
ARC_NAGIOS_RETURN_WARNING=1
ARC_NAGIOS_RETURN_CRITICAL=2

sysName=`uname -n`

arc_check_for_tomcatRunning(){

   # Assume OK until the check below proves otherwise
   arc_rc=$ARC_NAGIOS_RETURN_OK
   arc_reason="Tomcat is running on $sysName."

   # Count the tomcat processes, excluding the grep itself
   tel=`ps -ef |grep tomcat |grep -v "grep" |wc -l`
   if [ $tel -eq 0 ]
   then
      arc_rc=$ARC_NAGIOS_RETURN_CRITICAL
      arc_reason="Tomcat is not running on $sysName."
   fi

   # Nagios displays this line as the status text of the service
   echo "$arc_reason"
   return $arc_rc
}

arc_check_for_tomcatRunning
exit $?
--------------------------------------------------------------
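
The script above only ever returns OK or CRITICAL. As a rough sketch of how the other return codes can come into play, the following shell function maps a numeric reading to a state using warning and critical thresholds. The function name check_threshold and the message wording are made up for illustration, so treat it as one possible shape for a plugin rather than a standard one.

```shell
#!/bin/sh
# Illustration only: check_threshold is a hypothetical helper, not a Nagios plugin.
# Usage: check_threshold VALUE WARN CRIT   (all non-negative integers)
# Prints one status line and returns the matching plugin return code.
check_threshold() {
    value=$1
    warn=$2
    crit=$3

    # Anything that is not a plain number cannot be judged: report UNKNOWN (3)
    for n in "$value" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*)
                echo "UNKNOWN - non-numeric input"
                return 3
                ;;
        esac
    done

    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

A real plugin would call such a function with a measured value and then exit with its return code, the same way the Tomcat script ends with `exit $?`.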

A few supporting files are below..

--> templates.cfg

This file comes with Nagios by default. It contains the default values that apply when Nagios runs its checks, and it is common to all the hosts and services you configure in Nagios.
You can set customized values in the localhost.cfg file itself; the rest of the values are taken from the templates.cfg file.

define service{
        name                            generic-service
        active_checks_enabled           1
        passive_checks_enabled          1
        parallelize_check               1
        obsess_over_service             1
        check_freshness                 0
        notifications_enabled           1
        event_handler_enabled           1
        flap_detection_enabled          1
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        is_volatile                     0
        check_period                    24x7
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            2
        contact_groups                  admins
        notification_options            w,u,c,r
        notification_interval           60
        notification_period             24x7
        register                        0
}

--> contacts.cfg

This config file provides some example contact and contact group definitions that you can reference in host and service definitions as well as in the templates.cfg file.

===============================
Contact
===============================

define contact{
        contact_name            nagiosadmin
        use                     generic-contact
        alias                   Nagios admin
        email                   nagios_user@localhost
}

===============================
Contact Group
===============================

define contactgroup{
        contactgroup_name       admins
        alias                   Nagios Administrators
        members                 nagiosadmin
}

--> timeperiods.cfg

You can configure the time periods during which a particular check runs or notifications are sent, as per your need. It is loosely like crontab in Linux/Unix or Task Scheduler on a Windows machine, except that a time period defines when something is allowed to happen rather than triggering it.
You can reference time periods in host, service, contact, template and dependency definitions.

# 'workhours' timeperiod definition
define timeperiod{
        timeperiod_name workhours
        alias           Normal Work Hours
        monday          09:00-17:00
        tuesday         09:00-17:00
        wednesday       09:00-17:00
        thursday        09:00-17:00
        friday          09:00-17:00
}
Likewise, you can configure time periods as per your requirement. See the Nagios documentation for more details about this feature.
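
One way to read a time period is as a yes/no question: does this moment fall inside one of the listed ranges? The shell sketch below answers that question for the 'workhours' example. Nagios does this evaluation itself, so in_workhours is purely a hypothetical helper for illustration, and treating the end of the range (17:00) as exclusive is an assumption made here, not something stated by the config.

```shell
#!/bin/sh
# Illustration only: in_workhours is a hypothetical helper, not a Nagios command.
# Usage: in_workhours DAY TIME   (DAY is a lowercase weekday name, TIME is HH:MM)
# Returns 0 when the moment is inside Monday to Friday, 09:00-17:00.
in_workhours() {
    day=$1
    time=$2
    case $day in
        monday|tuesday|wednesday|thursday|friday) ;;
        *) return 1 ;;              # no range is defined for the weekend
    esac
    hh=${time%:*}
    mm=${time#*:}
    hh=${hh#0}                      # strip a leading zero so 09 is not read as octal
    mm=${mm#0}
    minutes=$((${hh} * 60 + ${mm}))
    # Assumption: the end of the range (17:00) is treated as exclusive here.
    [ "$minutes" -ge 540 ] && [ "$minutes" -lt 1020 ]
}

if in_workhours wednesday 10:30; then
    echo "inside workhours"
else
    echo "outside workhours"
fi
```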

After making all the above configuration changes, restart the Nagios service using the following commands
`service nagios stop`
`service nagios start`

And open the following URL...
http://<IP address of server where Nagios is installed>/nagios
Username: nagiosadmin
Password: <the password you set during installation>

You will see the status of all hosts and services in one place on the page.
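
Since a broken config file stops Nagios from starting, it can help to combine the verification step from the installation section with the restart. The following is a minimal sketch of that idea, not an official Nagios script: safe_restart is a hypothetical function name, the paths are the defaults used earlier in this post, and the restart command is assumed to be systemctl on CentOS 7. All three can be overridden through the variables at the top.

```shell
#!/bin/sh
# Illustration only: safe_restart is a hypothetical wrapper, not shipped with Nagios.
# Assumed defaults, taken from the install steps in this post; override as needed.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
RESTART_CMD=${RESTART_CMD:-systemctl restart nagios.service}

safe_restart() {
    # 'nagios -v' exits non-zero when the configuration has errors
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG" >/dev/null 2>&1; then
        $RESTART_CMD && echo "Nagios restarted"
    else
        echo "Configuration check failed; Nagios not restarted" >&2
        return 1
    fi
}
```

Run `safe_restart` as root after editing any of the .cfg files. If the check fails, run the `nagios -v` command by hand to see the actual error messages.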



*************************************END****************************************





