Monitoring Screens

We all know that it is important to monitor your servers and services, so you can spot issues before they become problems. I personally have spent a lot of time configuring Nagios to email me about issues, and I have recently been configuring various alerts in Azure.

My old boss has this idea that I should have a big monitor screen displaying all the vital stats of my servers and services. I personally disagree with this idea and think that notifications on my phone and email alerts are sufficient. He will no doubt correct my thinking when he reads this, but I believe part of his thinking is to make the monitoring of your infrastructure more visible and make it obvious to anyone who walks past that you have your eye on everything.

For the purpose of this blog post, let’s assume he has convinced me and I have convinced my actual boss to spend money on the required technology to do this (no easy feat). What exactly would I display on this screen?

I have a Google Chromecast that I use for streaming various things to my TV. This is a relatively cheap bit of technology that could allow a TV or monitor to display a web page with the required stats.

The two main sources of information that I want to display are New Relic for monitoring my Azure websites and Nagios for monitoring my internal servers. New Relic allows you to easily export live performance data as iframes, so I quickly threw together a web page full of these graphs. However, if you have a static screen on the wall you don’t want to have to scroll to see different information, so I needed to come up with another way to display it.

My first thought was a slideshow. There are lots of JavaScript scripts that cycle through a series of images like a slideshow, and one of these could be adapted to cycle through a series of iframes and display everything I want.

My script goes something like this and requires jQuery. First of all the script waits for the page to load completely with the ready function. It then defines the URLs which will be put into the iframe (the element with the id frame) one at a time, and counts how many URLs you have. Finally it loops, changing the contents of the src attribute of the iframe every few seconds. In my example it changes every 9 seconds, but once this is used in production you may want to increase this.

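<script type="text/javascript">
// Wait until the page has loaded before touching the iframe.
$(document).ready(function () {
    // URLs to display, one at a time.
    var locations = ["URL1", "URL2", "etc"];
    var len = locations.length;
    var iframe = $('#frame');
    var i = 0;
    // Show the first URL straight away rather than waiting for the first tick.
    iframe.attr('src', locations[i]);
    // Every 9 seconds, move on to the next URL, wrapping back to the start.
    setInterval(function () {
        i = (i + 1) % len;
        iframe.attr('src', locations[i]);
    }, 9000);
});
</script>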
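
If you want to check the cycling logic without opening a browser, it can be pulled out into a small function of its own. This is only a sketch of one way to do it: createRotator is a made-up helper name, not part of jQuery or of my original script, and the URLs are the same placeholders as above.

```javascript
// Hypothetical helper (not from the original script): cycles through a list
// of URLs, wrapping back to the start after the last one.
function createRotator(urls) {
  if (!Array.isArray(urls) || urls.length === 0) {
    throw new Error("createRotator needs at least one URL");
  }
  let index = -1; // so the first call to next() returns urls[0]
  return {
    next() {
      index = (index + 1) % urls.length;
      return urls[index];
    },
  };
}

// Placeholder URLs, as in the script above.
const rotator = createRotator(["URL1", "URL2", "URL3"]);
const firstPass = [rotator.next(), rotator.next(), rotator.next(), rotator.next()];
console.log(firstPass.join(" -> "));
```

On the page itself you would call iframe.attr('src', rotator.next()) once when the page is ready and then again inside the setInterval callback, so the modulo arithmetic lives in one place.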

Now what information should be included on a screen like this? Showing too much performance data can be almost as bad as showing none at all, as problems get drowned out in the noise. For me it is the performance of my websites, followed by Nagios problems, the Azure status page, the memory usage of all my servers and lastly the number of connections to my databases. Another question to consider is what time scale you want to graph over: too long and you don’t see what is happening now, but too short and you may end up worrying about what is only an intermittent issue.

I love Nagios

You may not have heard of Nagios but it has saved my bacon quite a few times.

Nagios is an open source server monitoring application that runs on many Linux flavours.

I can’t remember exactly when I first installed Nagios, but I am guessing it was sometime in 2007/8. My boss gave me a book about it (which I never read) and told me to create a system to monitor the company’s servers.

Nagios is not simple to set up. It relies on defining various hosts and services. Hosts are usually the physical servers that you want to monitor, and services are the individual things you want to check on each host. As this is a Linux program, all of these are configured by editing the right config file.
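
To give a rough idea of what that looks like, here is a minimal sketch of one host and one service. The host name, alias and address are made up for illustration, and I am assuming the linux-server and generic-service templates and the check_ping command from the sample configuration that ships with Nagios are still in place; your own setup may name things differently.

```
# Hypothetical host: the name, alias and address are placeholders.
define host {
    use         linux-server      ; template from the sample config (assumed present)
    host_name   web01
    alias       Example web server
    address     192.0.2.10
}

# One check attached to that host.
define service {
    use                  generic-service   ; template from the sample config (assumed present)
    host_name            web01
    service_description  PING
    check_command        check_ping!100.0,20%!500.0,60%
}
```

The two arguments after check_ping are the warning and critical thresholds (round trip time in milliseconds and packet loss), so this example warns at 100 ms or 20% loss and goes critical at 500 ms or 60% loss.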

Nagios is very flexible and can be expanded easily with the use of plugins. If you want to monitor something, there is usually a plugin available. If you have a Dell server running OpenManage software, there is even a plugin that allows the temperature of your server to be monitored.

If you want to monitor Windows servers, NSClient++ is a real advantage. This is a simple client that runs as a service on your Windows server. It allows Nagios to track memory, CPU, disk space, performance and services, in fact almost everything that you would want to monitor.

Over the years I have kept a close eye on Nagios and added extra checks as new services were added or problems encountered. A few years ago I dabbled with sending alerts out via SMS, and once I got a smartphone I found an app to keep track of Nagios 24/7.

But recently I have started wondering if Nagios is the best way to monitor modern servers like Windows Server 2012 or remote services like Azure. I want something that is easy to expand as your IT infrastructure expands. Something that runs on a Linux OS requires your IT staff to have a knowledge of Linux, and requires you to keep that server maintained and updated.

My question is: Is Nagios still the best way to monitor my servers?