10 Ways to Survive as an IT Manager

So, after five and a bit years of being an IT Manager, here is some advice I have learned along the way, in no particular order. On the whole I have enjoyed myself, but it has been a real challenge at times.

1. Figure out what plates are still spinning

Being an IT Manager is all about keeping everything running all of the time. A bit like spinning 5 or 6 plates. You have plates for your servers and network infrastructure, plates for bespoke databases that you maintain, plates for your staff (including any external contractors) and plates for any websites or apps that you develop. That is a lot of plates to keep spinning, and that is before you start thinking about what your boss wants you to deliver. Make sure you know what is happening with all these plates: which ones are happy, which ones are on the way to the floor and which ones need you to get the glue out and repair them.

2. Make it someone else’s problem

If you can blame someone else, do so. If your internet goes down, it is your ISP’s fault. If your website dies, it’s your hosting company’s fault. Take responsibility for problems, but if you can pick up the phone and ask for help when something goes wrong, it will make your life easier.

3. Hire good staff

Hiring poor staff wastes time and money and makes you look bad to others. Demand the highest salary band for new staff that you can afford and don’t agree to hire anyone that you have doubts about. It is easy to bow to the pressure to get someone quickly, but this will always result in worse problems in the long run. Once you have a good team, do your best to keep them, and warn upper management of the problems that will follow if staff leave (basically make it their problem, not yours!).

4. Learn, Learn, Learn

You may or may not have the opportunity to go on training courses. Whatever your situation spend time learning new stuff that will benefit the company and yourself. You can learn a lot by reading online, you can petition for training from your managers, you can fund training yourself, you can ask for help from your different suppliers. The more you learn, the more you can do and the more useful you can be to the company, plus the more interesting you will find the job.

5. Say No!

Don’t be afraid to say no. You will always be asked to do the impossible, and if something is impossible, say so at the start. It wastes everyone’s time if you spend a lot of time trying to do the impossible. Always give your reasons for saying no, because if you just say no every time people will think you are unhelpful. A better way to say no is to offer an alternative: no, I can’t do it your way, but here is a better solution.

6. Don’t give estimates

If you are asked how long something will take, don’t answer straight away or give an exaggerated estimate. Go away and spend some time thinking of everything that is involved before replying. There will always be something that you forgot to consider when first asked about it, and looking at the different components will help you plan out the work needed as well as provide an estimate.

7. Know what to tell your boss, and what not to

This is a hard one to get the balance right for. You need to tell your boss enough so that they appreciate all that you do, but tell them too much and they will stop listening and accuse you of talking in technobabble. I have never got the balance right with this one. I have always erred on the side of not telling my boss enough, and hence they don’t realize that I saved the day on Sunday night, as everything is working again on Monday. Do repeat yourself. If your server is running low on resources, start asking for replacement hardware early, and increase the frequency and the panic in line with the problems it is causing.

8. Understand the problems of the business

Businesses need to make money. If the one you work for isn’t making enough money, you will soon be looking for another. If you work in IT you will quickly start to see the problems of the business. Think about what simple changes IT could make to improve things that would benefit the whole company. Some of your suggestions won’t go anywhere, but some may have a massive impact. I can think of a few changes that IT have spearheaded that I am very proud of: upgrading our internet connection, simplifying or automating processes and delivering new versions of software.

9. Ask for help

Don’t be afraid of asking for help. There are lots of places to look for help. Other departments could take more on, you could recruit extra help, or you could hire external contractors. You can ask questions on support forums like ServerFault or StackOverflow. Many software resellers and other suppliers are a good point of contact for questions about the things they supply. Microsoft Support was also invaluable for a server issue.

10. Think about disasters

Write a disaster recovery plan or backup policy. Yes, there will always be something more important that needs doing, but just stop for a moment to think how you would feel if everything died on your watch. The one thing you can rely on with technology is that it will fail at some point. A back of the envelope plan of action is better than no plan at all. Even better is a detailed plan of what to do when each and every service you rely on fails. Plan additional services with an idea of adding extra redundancy. Always have multiple Domain Controllers, and think about what data you could run from the Cloud. VMs could be replicated to the Cloud, and servers could be run from there.

Monitoring Screens

We all know that it is important to monitor your servers and services, so you can spot issues before they become problems. I personally have spent a lot of time configuring Nagios to email me about issues, and I have recently been configuring various alerts in Azure.

My old boss has this idea that I should have a big monitor screen displaying all the vital stats of my servers and services. I personally disagree with this idea and think that notifications on my phone and email alerts are sufficient. He will no doubt correct my thinking when he reads this, but I believe part of his thinking is to make the monitoring of your infrastructure more visible and make it obvious to anyone that walks past that you have your eye on everything.

For the purpose of this blog post, let’s assume he has convinced me and I have convinced my actual boss to spend money on the required technology to do this (no easy feat). What exactly would I display on this screen?

I have a Google Chromecast that I use for streaming various things to my TV. This is a relatively cheap bit of technology that could allow a TV or monitor to display a web page with the required stats.

The two main sources of information that I want to display are New Relic for monitoring my Azure websites and Nagios for monitoring my internal servers. New Relic allows you to easily export live performance data as iframes, so I quickly threw together a web page full of these graphs. However, if you have a static screen on the wall you don’t want to have to scroll to see different information, so I needed to come up with another way to display it.

My first thought was a slideshow. There are lots of JavaScript scripts that cycle through a series of images like a slideshow, and one of these could be adapted to cycle through a series of iframes and display everything I want.

My script goes something like this and requires jQuery. First of all the script waits for the page to load completely with the ready function. It then defines the URLs which will be put into the iframe one at a time, and counts how many URLs you have. It then loops through, changing the contents of the src attribute in the iframe every few seconds. In my example it changes every 9 seconds, but once this is used in production you may want to increase this.

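<script type="text/javascript">
// Wait for the page to load before looking up the iframe.
$(document).ready(function () {
    // URLs to show in the iframe, one at a time.
    var locations = ["URL1", "URL2", "etc"];
    var len = locations.length;
    var iframe = $('#frame');
    var i = 0;
    // Every 9 seconds, move on to the next URL, wrapping round at the end.
    setInterval(function () {
        iframe.attr('src', locations[++i % len]);
    }, 9000);
});
</script>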

Now what information should be included on a screen like this? Showing too much performance data can be almost as bad as showing none at all, as problems get drowned out in the noise. For me, I have performance of my websites, followed by Nagios problems, followed by the Azure status page, followed by memory usage of all my servers and lastly the number of connections to my databases. Another question to consider is what time scale you want to graph over: too long and you don’t see what is happening now, but too short and you may worry about what is only an intermittent issue.
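
A running order like that could be held as a list of pages, each with its own time on screen, rather than one fixed interval for everything. What follows is only a minimal sketch under assumptions: it uses plain JavaScript rather than jQuery, the panel names mirror the list above, and the URLs, the dwell times and the names panels, nextIndex and showPanel are all placeholders made up for illustration.

```javascript
// Hypothetical running order: the names follow the list above, while the
// URLs and dwell times are placeholders to replace with your own.
const panels = [
  { name: "Website performance", url: "URL1", seconds: 30 },
  { name: "Nagios problems", url: "URL2", seconds: 20 },
  { name: "Azure status", url: "URL3", seconds: 10 },
];

// Index of the panel that follows `current`, wrapping back to 0 at the end.
function nextIndex(current, count) {
  return (current + 1) % count;
}

// Load panel `index` into the iframe, then schedule the following panel once
// this one's dwell time has passed. `schedule` defaults to setTimeout and is
// only a parameter so the rotation can be exercised without a real timer.
function showPanel(iframe, list, index, schedule = setTimeout) {
  const panel = list[index];
  iframe.src = panel.url;
  schedule(
    () => showPanel(iframe, list, nextIndex(index, list.length), schedule),
    panel.seconds * 1000
  );
}

// In a browser, start the rotation on the iframe with id "frame".
if (typeof document !== "undefined") {
  document.addEventListener("DOMContentLoaded", () => {
    showPanel(document.getElementById("frame"), panels, 0);
  });
}
```

Giving each page its own dwell time means a busy graph can stay up longer than a simple status page, which is one way of dealing with the question of how much to show and for how long.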


I love Nagios


You may not have heard of Nagios but it has saved my bacon quite a few times.

Nagios is an open source server monitoring application that runs on many Linux flavours.

I can’t remember exactly when I first installed Nagios, but I am guessing it was sometime in 2007/8. My boss gave me a book about it (which I never read) and told me to create a system to monitor the company’s servers.

Nagios is not simple to set up. It relies on setting up various hosts and services. Hosts are usually physical servers that you want to monitor, and services are the individual things you want to check on each host. As this is a Linux program, all of these can be configured by editing the right config file.
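
To give a flavour of what those config files look like, here is a minimal sketch of one host and one service. The host name, alias and address are made up, and the linux-server and generic-service templates and the check_ping command are assumed to come from the sample configuration that ships with Nagios Core, so adjust them to whatever your install defines.

```
# Hypothetical host: the name, alias and address are placeholders.
define host {
    use                   linux-server      ; template assumed from the sample config
    host_name             fileserver01
    alias                 File Server
    address               192.0.2.10
}

# One service checked on that host: a ping with warning and critical
# thresholds for round trip time (ms) and packet loss.
define service {
    use                   generic-service   ; template assumed from the sample config
    host_name             fileserver01
    service_description   PING
    check_command         check_ping!100.0,20%!500.0,60%
}
```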

Nagios is very flexible and can be expanded easily with the use of plugins. If you want to monitor something, there is usually a plugin available. If you have a Dell server running OpenManage software, there is even a plugin that allows the temperature of your server to be monitored.

If you want to monitor Windows servers, the use of NSClient++ is a real advantage. This is a simple client that runs as a service on your Windows server. It allows Nagios to track memory, CPU, disk space, performance and services; in fact, almost everything that you would want to monitor.
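
On the Nagios side, checks against a Windows server can look something like the sketch below. It assumes the check_nt plugin and a check_nt command defined as in the Nagios sample configuration, and a made-up host called winserver01 that already has its own host definition. The thresholds are examples only.

```
# Hypothetical Windows host name; warn at 80% and go critical at 90%.
define service {
    use                   generic-service
    host_name             winserver01
    service_description   Memory Usage
    check_command         check_nt!MEMUSE!-w 80 -c 90
}

define service {
    use                   generic-service
    host_name             winserver01
    service_description   C: Drive Space
    check_command         check_nt!USEDDISKSPACE!-l c -w 80 -c 90
}
```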

Over the years I have kept a close eye on Nagios and added extra checks as new services were added or problems encountered. A few years ago I dabbled with sending alerts out via SMS message, and once I got a smartphone I found an app to keep track of Nagios 24/7.

But recently I have started wondering if Nagios is the best way to monitor modern servers like Windows Server 2012 or remote services like Azure. I want something that is easy to expand as your IT infrastructure expands. Something that relies on running on a Linux OS requires your IT staff to have a knowledge of Linux, and requires you to keep that server maintained and updated.

My question is: Is Nagios still the best way to monitor my servers?