Nagios Core
Nagios is an open source monitoring system for computer systems. It was designed to run on the Linux operating system and can monitor devices running Linux, Windows and Unix operating systems (OSes). It periodically runs checks against critical services on these different platforms. For example, Nagios can monitor memory usage, disk usage and currently running processes, or check whether any process is consuming an unusually large amount of memory. It can also monitor SMTP, POP, HTTP and other common network protocols.
Nagios comes with both a command-line interface (CLI) and a web-based graphical user interface. Many people prefer the CLI because it is quicker for operating and managing things.
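NRPE (Nagios Remote Plugin Executor) is a useful add-on that lets you remotely execute Nagios plugins on other Linux/Unix machines. NRPE runs its own agent on each client, listening on TCP port 5666 by default; it does not use SSH. An alternative is the check_by_ssh plugin (used in the commands.cfg example in this article), which requires an SSH connection, typically with key-based authentication, between the Nagios server and its clients. With a suitable agent you can also execute scripts and check metrics on remote Windows machines.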
Installation of Nagios Core (4.1.1)
Prerequisites:
1) A 64-bit CentOS Linux release 7.2.1511 (Core) system
2) yum install -y httpd php
3) yum install -y gcc glibc glibc-common make gd gd-devel net-snmp
4) yum install -y perl perl-devel
5) yum install -y unzip
6) yum install -y wget
7) yum install -y openssl-devel
8) Create a user: `useradd nagios`
9) Create a group: `groupadd nagcmd`
10) Run command `usermod -a -G nagcmd nagios` (-a appends the group instead of replacing the user's other supplementary groups)
11) Run command `usermod -a -G nagcmd apache`
12) Create directory /root/nagios
13) Download the Nagios Core 4.1.1 and Nagios Plugins 2.0.3 source tarballs into /root/nagios and extract them
14) cd /root/nagios/nagios-4.1.1
15) Run command: ./configure --with-command-group=nagcmd
16) Run command: make all
17) Run command: make install
Now install the init script, the external command directory (with the permissions the web interface needs to submit commands) and the sample configuration files with the following commands:
18) Run commands: make install-init
make install-commandmode
make install-config
After installation, the sample object configuration files are located under
"/usr/local/nagios/etc/objects"
That completes the command-line part of the installation. Next, install the web component by running the following command from the same source directory:
`make install-webconf`
Then set a password for the user "nagiosadmin". This username is used to log in to the web interface, so remember the password you type here. Run the following command and enter the password twice:
`htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin`
Restart Apache so that it picks up the new Nagios web configuration:
`systemctl restart httpd`
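The Core build and install commands from the steps above can be collected into one script. This is a minimal sketch, assuming a CentOS 7 machine, root privileges and sources already extracted under /root/nagios; the `run` wrapper and the `DRY_RUN` variable are hypothetical names introduced here. By default it only prints the commands, so you can review them before setting `DRY_RUN=0`. The interactive `htpasswd` step is left out.

```shell
#!/bin/sh
# Sketch only: the Nagios Core 4.1.1 build steps from this article in one script.
# Assumptions: CentOS 7, run as root, sources already extracted under /root/nagios.
# With DRY_RUN=1 (the default here) commands are only printed; DRY_RUN=0 executes them.
set -e

SRC_DIR=/root/nagios/nagios-4.1.1

# Print the command in dry-run mode, otherwise execute it in the current shell
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_nagios_core() {
    run yum install -y httpd php gcc glibc glibc-common make gd gd-devel net-snmp \
        perl perl-devel unzip wget openssl-devel
    run useradd nagios
    run groupadd nagcmd
    # -a appends nagcmd instead of replacing the user's other supplementary groups
    run usermod -a -G nagcmd nagios
    run usermod -a -G nagcmd apache
    run cd "$SRC_DIR"
    run ./configure --with-command-group=nagcmd
    run make all
    run make install
    run make install-init
    run make install-commandmode
    run make install-config
    run make install-webconf
}

install_nagios_core
```

Because `run` is a shell function, `cd` executed through it changes the directory of the script itself, so the later `make` commands run inside the source tree.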
Now install the Nagios plugins by running the following commands:
1) cd /root/nagios/nagios-plugins-2.0.3
2) Run command: `./configure --with-nagios-user=nagios --with-nagios-group=nagios`
3) make
4) make install
5) Next, configure Nagios to start at boot time. First verify that the configuration file has no errors by running the following command:
`/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg`
6) If no errors are reported, add the service to run at boot time with these commands:
`chkconfig --add nagios`
`chkconfig --level 35 nagios on`
Then start the service with the following command:
`systemctl start nagios.service`
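The verify-then-enable sequence above can be wrapped so that Nagios is only registered and started when the configuration check passes. This is a sketch, assuming that `nagios -v` exits with a non-zero status when it finds configuration errors; the function name `start_nagios_if_valid` is hypothetical, and the default paths are the ones used in this article.

```shell
# Sketch: enable and start Nagios only if the configuration check passes.
# Assumes `nagios -v` exits non-zero when it finds configuration errors.
start_nagios_if_valid() {
    nagios_bin=${1:-/usr/local/nagios/bin/nagios}
    nagios_cfg=${2:-/usr/local/nagios/etc/nagios.cfg}

    if ! "$nagios_bin" -v "$nagios_cfg" >/dev/null; then
        echo "Configuration errors found; run: $nagios_bin -v $nagios_cfg" >&2
        return 1
    fi

    # Same commands as in the steps above
    chkconfig --add nagios &&
        chkconfig --level 35 nagios on &&
        systemctl start nagios.service
}
```

Called without arguments (`start_nagios_if_valid`), it uses the default installation paths.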
Installation of Nagios NRPE
You can now access the Nagios Core web portal.
This completes the installation of Nagios.
Now we will go over the various components of Nagios Core.
The main components are:
1) Hosts
2) Services
3) Contacts
4) Timeperiods
5) Templates
6) Commands
7) Nagios Master
The following is a basic outline of the Nagios workflow, in which three important files are used:
--> nagios.cfg
--> localhost.cfg
--> commands.cfg & plugins
Walking through this flow:
nagios.cfg: This is the main configuration file. Through its cfg_file lines it lists the object configuration files, including the host files, that Nagios should load and monitor.
Once Nagios knows which host needs to be monitored, it reads that host's configuration, which for the local machine resides in localhost.cfg. (If you have a remote host, say sip123, its definitions would typically live in a file named sip123.cfg.)
Once the host is found, Nagios looks up the services defined for that host.
To check a service, Nagios runs the command named in the service's check_command; that command is defined in the commands.cfg file.
That command contains the plugin name (e.g. check_by_ssh), the host name or IP address of the target host, the service name (here, the script that resides on the remote host and checks a specific service) and a timeout (in case SSH takes too long to connect to the remote host).
Once the command has run from the Nagios server, it returns the result: a status (given by the plugin's exit code) together with a description of that particular service.
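The last step of the flow, turning a plugin's result into a service status, can be sketched as follows. This is an illustration, not Nagios's own code: the function names `nagios_state` and `run_check` are hypothetical, the sketch reads only the first line of standard output as the status text, and it treats any exit code other than 0, 1 or 2 as UNKNOWN.

```shell
# Map a plugin exit code to the service state it represents
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # this sketch's choice for any other code
    esac
}

# run_check CMD [ARGS...]: run a check and print "STATE: first output line"
run_check() {
    output=$("$@")
    rc=$?                      # exit status of the command substitution
    echo "$(nagios_state "$rc"): $(printf '%s\n' "$output" | head -n 1)"
    return "$rc"
}
```

For example, `run_check /usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /` would print the disk check's state followed by its status line.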
Now let's go over the configuration files.
--> nagios.cfg
# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# Definitions for monitoring the local (Linux) host
cfg_file=/usr/local/nagios/etc/objects/localhost.cfg
# Definitions for monitoring a Windows machine
#cfg_file=/usr/local/nagios/etc/objects/windows.cfg
# Definitions for monitoring a router/switch
#cfg_file=/usr/local/nagios/etc/objects/switch.cfg
# Definitions for monitoring a network printer
#cfg_file=/usr/local/nagios/etc/objects/printer.cfg
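To monitor a remote host such as sip123, you create a host file for it and add a matching cfg_file line to nagios.cfg. The helper below is a hypothetical sketch of that step: the function name `add_nagios_host` and the address 192.0.2.10 (a documentation-only IP) are placeholders, and the `linux-server` template is the one used by localhost.cfg. Services for the new host still have to be defined separately.

```shell
# Hypothetical helper: create <name>.cfg for a Linux host and register it in nagios.cfg
# Usage: add_nagios_host NAME ADDRESS [ETC_DIR]
add_nagios_host() {
    name=$1
    address=$2
    etc_dir=${3:-/usr/local/nagios/etc}
    cfg="$etc_dir/objects/$name.cfg"

    # Write a host definition that inherits from the linux-server template
    cat > "$cfg" <<EOF
define host{
    use        linux-server
    host_name  $name
    alias      $name
    address    $address
}
EOF

    # Append the cfg_file line only if nagios.cfg does not reference the file yet
    line="cfg_file=$cfg"
    grep -qxF "$line" "$etc_dir/nagios.cfg" || echo "$line" >> "$etc_dir/nagios.cfg"
}
```

After running it, verify the configuration with `nagios -v` and restart Nagios as described in this article.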
--> localhost.cfg
define host{
use linux-server
host_name localhost
alias localhost
address 127.0.0.1
}
define service{
use local-service
host_name localhost
service_description Root Partition
check_command check_local_disk!20%!10%!/
}
--> commands.cfg
# 'check_local_disk' command definition
define command{
command_name check_local_disk
command_line $USER1$/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
}
# 'check_web_server' command definition
define command{
command_name check_web_server
command_line $USER1$/check_by_ssh -H $HOSTADDRESS$ -l op5mon -i /opt/monitor/.ssh/id_rsa_check_by_ssh -C "/usr/lib64/nagios/plugins/check_web_server.sh" -E -t 30
}
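The service in localhost.cfg uses `check_local_disk!20%!10%!/`: the part before the first `!` names the command, and the remaining `!`-separated fields fill `$ARG1$`, `$ARG2$` and so on in its command_line. The sketch below imitates that substitution for `$USER1$` and `$ARGn$` only; real Nagios supports many more macros (such as `$HOSTADDRESS$`), and the function names are hypothetical.

```shell
# Simplified sketch of Nagios macro expansion: handles only $USER1$ and $ARGn$.

# replace_all STRING SEARCH REPLACEMENT: replace every literal occurrence of SEARCH
replace_all() {
    s=$1 out=
    [ -n "$2" ] || { printf '%s' "$s"; return; }
    while :; do
        case $s in
            *"$2"*)
                out=$out${s%%"$2"*}$3   # text before the match, then the replacement
                s=${s#*"$2"}            # continue after the match
                ;;
            *) break ;;
        esac
    done
    printf '%s' "$out$s"
}

# expand_check_command COMMAND_LINE CHECK_COMMAND USER1
expand_check_command() {
    line=$(replace_all "$1" '$USER1$' "$3")
    check=$2
    # Everything after the first '!' holds the arguments, separated by '!'
    case $check in
        *!*) rest=${check#*!} ;;
        *)   rest= ;;
    esac
    n=1
    while [ -n "$rest" ]; do
        case $rest in
            *!*) arg=${rest%%!*}; rest=${rest#*!} ;;
            *)   arg=$rest; rest= ;;
        esac
        line=$(replace_all "$line" "\$ARG$n\$" "$arg")
        n=$((n + 1))
    done
    printf '%s\n' "$line"
}
```

With `$USER1$` set to /usr/local/nagios/libexec (the usual value in resource.cfg for a source install), the Root Partition check expands to `/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /`.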
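The check_web_server.sh script resides on the remote server whose service is being monitored; check_by_ssh logs in to that server as the op5mon user with the given private key and runs the script there.
You write such a script to suit your requirements, but it must follow the Nagios plugin format: print a one-line status message and exit with one of the standard return codes 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).
For example, say you want to monitor whether a web server (here, Tomcat) is up and running: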
--------------------------------------------------------------
#!/bin/ksh
#
# Nagios plugin for checking whether Tomcat is running
#
# Standard Nagios plugin return codes
ARC_NAGIOS_RETURN_OK=0
ARC_NAGIOS_RETURN_WARNING=1
ARC_NAGIOS_RETURN_CRITICAL=2
ARC_NAGIOS_RETURN_UNKNOWN=3
sysName=`uname -n`
arc_check_for_tomcatRunning(){
    arc_rc=$ARC_NAGIOS_RETURN_OK
    arc_reason="Tomcat is running on $sysName."
    # Count tomcat processes, excluding the grep command itself
    tel=`ps -ef | grep tomcat | grep -v "grep" | wc -l`
    if [ "$tel" -eq 0 ]
    then
        arc_rc=$ARC_NAGIOS_RETURN_CRITICAL
        arc_reason="Tomcat is not running on $sysName."
    fi
    # Nagios displays this line as the service status text
    echo "$arc_reason"
    return $arc_rc
    #End checking for tomcat running
}
# Pass the function's return code to Nagios as the plugin exit code
arc_check_for_tomcatRunning
exit $?
-------------------------------
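The Tomcat script only ever reports OK or CRITICAL. Checks on numeric metrics usually also use WARNING, so here is a hypothetical threshold sketch (the function name and the disk_pct label are illustrative, not part of the original setup). It also appends performance data after a `|`, in the `label=value;warn;crit` form described in the Nagios plugin guidelines.

```shell
# Hypothetical threshold plugin sketch (higher values are worse).
# Usage: check_threshold LABEL VALUE WARN CRIT   (non-negative integers)
check_threshold() {
    label=$1; value=$2; warn=$3; crit=$4
    case $value in
        ''|*[!0-9]*)
            echo "UNKNOWN - $label value '$value' is not a number"
            return 3 ;;
    esac
    # Text before '|' is the status message; text after it is performance data
    perf="$label=$value;$warn;$crit"
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - $label is $value | $perf"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - $label is $value | $perf"
        return 1
    fi
    echo "OK - $label is $value | $perf"
    return 0
}
```

In a real plugin you would feed it a measured value, for example the root filesystem usage from `df -P /` (its fifth column, with the `%` removed), and finish the script with `exit $?`.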
A few supporting files follow.
--> templates.cfg
This file ships with Nagios by default. It contains template values that are applied when Nagios runs its checks. Hosts and services inherit these values through the `use` directive, so the templates are common to all the hosts and services you configure in Nagios.
You can customize individual settings in localhost.cfg itself; the remaining values are taken from templates.cfg.
define service{
name generic-service
active_checks_enabled 1
passive_checks_enabled 1
parallelize_check 1
obsess_over_service 1
check_freshness 0
notifications_enabled 1
event_handler_enabled 1
flap_detection_enabled 1
process_perf_data 1
retain_status_information 1
retain_nonstatus_information 1
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 10
retry_check_interval 2
contact_groups admins
notification_options w,u,c,r
notification_interval 60
notification_period 24x7
register 0
}
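As a worked example of how max_check_attempts, normal_check_interval and retry_check_interval interact in generic-service: the first failed check puts the service in a SOFT problem state, Nagios then retries every retry_check_interval, and the state becomes HARD (which normally triggers notifications) after max_check_attempts consecutive failures. The sketch below estimates that delay in minutes. It assumes interval_length=60 (the nagios.cfg default, so intervals are in minutes) and checks running exactly on schedule; the function name is hypothetical.

```shell
# Rough window (minutes) between a service failing and Nagios marking it HARD.
# Assumes interval_length=60 and no check latency.
hard_state_delay() {
    normal=$1    # normal_check_interval
    attempts=$2  # max_check_attempts
    retry=$3     # retry_check_interval
    best=$(( (attempts - 1) * retry ))   # failure begins just before a scheduled check
    worst=$(( normal + best ))           # failure begins just after a scheduled check
    echo "$best $worst"
}

hard_state_delay 10 3 2
```

With the generic-service values (10, 3, 2) this prints `4 14`, meaning a failure is confirmed as HARD roughly 4 to 14 minutes after it starts.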
--> contacts.cfg
This config file provides example contact and contact group definitions that you can reference in host and service definitions as well as in templates.cfg.
===============================
Contact
===============================
define contact{
contact_name nagiosadmin
use generic-contact
alias Nagios admin
email nagios_user@localhost
}
===============================
Contact Group
===============================
define contactgroup{
contactgroup_name admins
alias Nagios Administrators
members nagiosadmin
}
--> timeperiods.cfg
Time periods control when a service may be checked and when notifications may be sent. They are loosely comparable to crontab in Linux/Unix or Task Scheduler in Windows, except that they define windows of time rather than scheduled jobs.
You can reference them in host, service, contact, template and dependency definitions.
# 'workhours' timeperiod definition
define timeperiod{
timeperiod_name workhours
alias Normal Work Hours
monday 09:00-17:00
tuesday 09:00-17:00
wednesday 09:00-17:00
thursday 09:00-17:00
friday 09:00-17:00
}
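To make the meaning of the 'workhours' period concrete, the sketch below tests whether a given weekday and time fall inside it (Monday to Friday, 09:00 to 17:00). It is an illustration only, not how Nagios evaluates time periods internally; the function name is hypothetical, and this sketch treats 17:00 itself as outside the period.

```shell
# Sketch: is the given weekday/time inside 'workhours' (Mon-Fri, 09:00-17:00)?
# This sketch treats 17:00 itself as outside the period.
in_workhours() {
    day=$1    # ISO weekday as printed by `date +%u`: 1 = Monday ... 7 = Sunday
    hhmm=$2   # time as HHMM as printed by `date +%H%M`, e.g. 0930
    # Strip leading zeros so the value is always read as a decimal number
    while :; do
        case $hhmm in
            0?*) hhmm=${hhmm#0} ;;
            *) break ;;
        esac
    done
    [ "$day" -ge 1 ] && [ "$day" -le 5 ] &&
        [ "$hhmm" -ge 900 ] && [ "$hhmm" -lt 1700 ]
}
```

For the current moment: `in_workhours "$(date +%u)" "$(date +%H%M)" && echo "inside workhours"`.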
You can define other time periods in the same way to suit your requirements; see the time period section of the Nagios object definitions documentation for the full syntax.
After completing the configuration, restart the Nagios service using the following commands:
`service nagios stop`
`service nagios start`
Then open the following URL in a browser:
http://<IP address of server where Nagios is installed>/nagios
Username: nagiosadmin
Password: <the password you set during installation>
You will see the status of all hosts and services in one place.
*************************************END****************************************