10 Ways to Survive as an IT Manager

So after five and a bit years of being an IT Manager, here is some advice I have learned along the way, in no particular order. On the whole I have enjoyed myself, but it has been a real challenge at times.

1. Figure out what plates are still spinning

Being an IT Manager is all about keeping everything running all of the time. It is a bit like spinning 5 or 6 plates. You have plates for your servers and network infrastructure, plates for bespoke databases that you maintain, plates for your staff (including any external contractors), and plates for any websites or apps that you develop. That is a lot of plates to keep spinning, and that is before you start thinking about what your boss wants you to deliver. Make sure you know what is happening with all these plates: which ones are happy, which ones are on the way to the floor, and which ones you need to get the glue out for and repair.

2. Make it someone else’s problem

If you can blame someone else, do so. If your internet goes down, it is your ISP’s fault. If your website dies, it’s your hosting company’s fault. Take responsibility for problems, but if you can pick up the phone and ask for help when something goes wrong, it will make your life easier.

3. Hire good staff

Hiring poor staff wastes time and money and makes you look bad to others. Demand the highest salary band for new staff that you can afford, and don’t agree to hire anyone you have doubts about. It is easy to bow to the pressure to get someone quickly, but this will always result in worse problems in the long run. Once you have a good team, do your best to keep them, and warn upper management of the problems that will follow if staff leave (basically make it their problem, not yours!).

4. Learn, Learn, Learn

You may or may not have the opportunity to go on training courses. Whatever your situation spend time learning new stuff that will benefit the company and yourself. You can learn a lot by reading online, you can petition for training from your managers, you can fund training yourself, you can ask for help from your different suppliers. The more you learn, the more you can do and the more useful you can be to the company, plus the more interesting you will find the job.

5. Say No!

Don’t be afraid to say no. You will always be asked to do the impossible, and if something is impossible, say so at the start. It wastes everyone’s time if you spend a lot of time trying to do the impossible. Always give your reasons for saying no, because if you always say no, people will think you are unhelpful. A better way to say no is to come up with an alternative: no, I can’t do it your way, but here is a better solution.

6. Don’t give estimates

If you are asked how long something will take, don’t answer straight away or give an exaggerated estimate. Go away and spend some time thinking of everything that is involved before replying. There will always be something that you forgot to consider when first asked about it, and looking at the different components will help you plan out the work needed as well as provide an estimate.

7. Know what to tell your boss, and what not to

This is a hard one to get the balance right for. You need to tell your boss enough so that they appreciate all that you do, but tell them too much and they will stop listening and accuse you of talking in technobabble. I have never got the balance right with this one. I have always erred on the side of not telling my boss enough, and hence they don’t realize that I saved the day on Sunday night, as everything is working again on Monday. Do repeat yourself. If your server is running low on resources, start asking for replacement hardware early, and increase the frequency and the panic in line with the problems it is causing.

8. Understand the problems of the business

Businesses need to make money. If the one you work for isn’t making enough money, you will soon be looking for another. If you work in IT you will quickly start to see the problems of the business, so think about what simple changes IT could make that would benefit the whole company. Some of your suggestions won’t go anywhere, but some may have a massive impact. I can think of a few changes that IT have spearheaded that I am very proud of: upgrading our internet connection, simplifying or automating processes, and delivering new versions of software.

9. Ask for help

Don’t be afraid of asking for help. There are lots of places to look for help. Other departments could take more on, you could recruit extra help, or you could hire external contractors. You can ask questions on support forums like ServerFault or StackOverflow. Many software re-sellers or other suppliers are a good point of contact for questions about things they supply. Microsoft Support was also invaluable for a server issue.

10. Think about disasters

Write a disaster recovery plan or backup policy. Yes there will always be something more important that needs doing, but just stop for a moment to think how you would feel if everything died on your watch. The one thing you can rely on with technology is that it will fail at some point. A back of the envelope plan of action is better than no plan at all, even better is a detailed plan of what to do when each and every service you rely on fails. Plan additional services with an idea of adding extra redundancy. Always have multiple Domain Controllers, think about what data you could run from the Cloud. VMs could be replicated to the Cloud, and servers could be run from there.

SQL Transaction Log Backups

Like many DBAs I spend a lot of time maintaining my SQL Server backups.

From SQL Server I maintain both full backups and transaction log backups. I have often restored my full backups, but until recently I had never restored a transaction log backup. Any backup strategy is only as good as the last time you tested the restore process.

So what is a transaction log backup?

A transaction log backup contains all the transaction log records generated since the previous transaction log backup (or, for the first one, since the full backup that started the chain) and is used to allow the database to be recovered to a specific point in time (usually the time right before a disaster strikes). Since these are incremental, if you want to restore the database to a particular point in time, you need the full backup plus every transaction log backup taken after it, in order, up to that point in time.

How to do the restore.

First, right-click Databases in SQL Server Management Studio and select Restore Database. This opens the Restore Database dialog.

In source click the … to allow you to select your backup files.

Now normally I have only ever selected one file here, the *.bak file. Instead select the *.bak and all the *.trn files as well. After SQL Server has chugged for a few minutes (time will depend on number of transaction files and server/disk speed etc) the restore plan section should fill up with files.

In the destination database box, type in the name of the database you want to restore. I recommend using a different name to avoid overwriting the original database, appending Test or a datetime to the name is what I usually do.

On my test server I need to untick the take tail-log backup option on the Options screen before I can execute the restore.

Now you can either check the tick boxes in the restore plan section or (more fun) click the timeline button to select at what point in time you want to restore to.

You can either select the point in time with your mouse or specify the exact point in the time textbox. Alternatively you can just select the most recent point, probably the most likely option when disaster strikes.

Now that I have tried doing this on my test server I feel much more confident that when disaster does strike I can get things restored quickly and painlessly.
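
The same restore can also be scripted rather than clicked through. The following is only a sketch of the general T-SQL shape: the database name, file paths, logical file names and the STOPAT time are all made-up placeholders, and the real logical file names would need to be read from the backup first (for example with RESTORE FILELISTONLY).

```sql
-- Hypothetical names and paths throughout; adjust to your own backups.

-- 1. Restore the full backup under a new name and leave the database
--    in the RESTORING state so log backups can still be applied.
RESTORE DATABASE [MyDatabase_Test]
FROM DISK = N'D:\Backups\MyDatabase.bak'
WITH MOVE N'MyDatabase'     TO N'D:\Data\MyDatabase_Test.mdf',
     MOVE N'MyDatabase_Log' TO N'D:\Data\MyDatabase_Test_log.ldf',
     NORECOVERY;

-- 2. Apply each earlier log backup in order, still without recovering.
RESTORE LOG [MyDatabase_Test]
FROM DISK = N'D:\Backups\MyDatabase_0900.trn'
WITH NORECOVERY;

-- 3. Apply the log backup that covers the target time, stop at that
--    point, and bring the database online.
RESTORE LOG [MyDatabase_Test]
FROM DISK = N'D:\Backups\MyDatabase_0915.trn'
WITH STOPAT = N'2015-06-01T09:10:00',
     RECOVERY;
```

Restoring under a different name, as recommended above, means the script can be run against a test server without touching the original database.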

How often should you run transaction backups?

The answer to this question depends on how critical your data is. Until very recently I ran mine every 15 minutes. I have since changed this to every 5 minutes, but I have seen recommendations of running them every minute. The more critical your data, the more often you should run them.
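
For reference, a scheduled transaction log backup boils down to a single BACKUP LOG statement. This is a minimal sketch, assuming a hypothetical database called MyDatabase and a hypothetical D:\Backups folder; it builds a timestamped file name so each run writes a new .trn file.

```sql
-- Hypothetical database name and backup folder.
DECLARE @file nvarchar(260);

-- CONVERT style 120 gives 'yyyy-mm-dd hh:mi:ss'; strip the punctuation
-- so the timestamp is safe to use in a file name.
SET @file = N'D:\Backups\MyDatabase_'
          + REPLACE(REPLACE(REPLACE(
                CONVERT(nvarchar(19), GETDATE(), 120),
                N'-', N''), N':', N''), N' ', N'_')
          + N'.trn';

-- Back up the transaction log; CHECKSUM verifies pages as they are read.
BACKUP LOG [MyDatabase]
TO DISK = @file
WITH CHECKSUM;
```

Run from a SQL Server Agent job on a 5 or 15 minute schedule, this produces the chain of .trn files that the restore steps above rely on.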

Backing up SQL databases to Azure

I recently read a blog post by Pinal Dave about how you can back up straight to Azure Storage. The procedure he described is only available for SQL Server 2014 or later.

I won’t go into detail of this method as Pinal describes it better than I can, but the basics of it are setting up credentials and then running a backup command that includes the URL of the storage container on Azure.
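
As a rough illustration of those two steps (not Pinal’s actual script), the credential-based form of backup to URL looks something like the sketch below. The credential name, storage account, container and key are all placeholders, and newer SQL Server versions also support a different, Shared Access Signature based form.

```sql
-- Placeholders throughout: credential name, storage account,
-- container name and access key are all hypothetical.

-- 1. Store the storage account name and access key as a credential.
CREATE CREDENTIAL [AzureBackupCredential]
WITH IDENTITY = 'mystorageaccount',
     SECRET   = '<storage account access key>';

-- 2. Back up directly to a blob in the storage container.
BACKUP DATABASE [MyDatabase]
TO URL = N'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
WITH CREDENTIAL = N'AzureBackupCredential',
     STATS = 10;
```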

Unfortunately I am running SQL Server 2005, so this process will not work for me, but it did start me thinking about what other options I could use.

The next thing I tried was Microsoft SQL Server Backup to Microsoft Azure Tool. Unfortunately I did not get this tool to work correctly on my setup. However it sounds like a flexible tool that allows compression and encryption of your backup files. This tool redirects your backup files to your Azure Storage so even if I had got it to work correctly it would not have been an ideal solution as I want local copies of my backup files as well.

After this I started to look at PowerShell again. Following on from my recent success with PowerShell I know how to connect to my Azure account, so all I needed to script was copying a file from my server to Azure.

Get-ChildItem *.bak -File -Recurse | Set-AzureStorageBlobContent -Container $DestContainer -Force

This command gets all the backup files in a directory (the recurse switch looks in sub directories as well) and then pipes them to the Set-AzureStorageBlobContent command. This command uploads them to the storage container defined in $DestContainer. I have added the Force switch so that it will replace any files on Azure which have the same name.

I have only been using this script for the last few days, but so far it has been working well. Now if I completely lose all data from the office I can restore from any other location using the data saved on Azure. A great improvement to my disaster recovery policy.

Things to know before working on your database

I recently saw this blog post by Brent Ozar that I thought I might discuss.

Brent listed 13 questions to ask about a database before you start working with it. I am going to go through these 13 questions and expand on them based on my experiences.

  1. Is this database in production now?
    I think it should go without saying that the first thing you should find out is whether your database is in production. If it’s not in production you can do what you like and no one will notice.
    I know what databases are in production and which aren’t where I work so I can answer this one.
  2. If this goes down, what apps go down with it?
    Which apps are running on which databases is a good second question. I have at least one database which has multiple front end apps. At first glance you would think that these apps are not connected, but they are, so I need to be careful with both of them to make sure they don’t break each other.
    I know what apps run off what databases.
  3. When those apps go down, is there potential for loss of life or money?
    This is a difficult question so I will split it into two. Loss of life: my databases don’t control life support machines or nuclear weapons, so my first instinct would be to say no. However, it is not that straightforward. What if your app allowed contractors to know the location of dangers inside a property they were working in? If your app goes down, they could have an accident due to lack of information.
    Loss of money: this one is more straightforward. Time is money in the business world, so any time that your app is down and your employees are not able to work is a loss of money. If your database is linked to an eCommerce site, the loss of money could be extremely high.
    I know what effect downtime will have on my users and business.
  4. How sensitive are these apps to temporary slowdowns?
    Similar to the previous question, a slowdown can be as serious as downtime for some applications.
    Luckily most of my apps are internal only so are not seriously affected by slowness.
  5. When was the last successful backup?
    I manage the backup schedule for all my databases so I know exactly when each one was last backed up. Whenever I do anything to a production database I will run a backup first so I can roll back in case of problems. As part of developing changes I run my changes on a backup of the data. I can script all my changes and repeatedly run them against a backup until I am sure no problems will occur.
  6. When was the last successful restore test?
    More important than a backup is testing restoring your databases. If you can’t restore data then your backup is useless. I try to test restoring my backups at least weekly so I know that I can rely on my backups.
  7. Is everyone okay losing data back to the last successful backup?
    If disaster strikes you could lose all data between now and the time of your last backup. But all is not lost: transactional backups can be scheduled throughout the day, so in my case the most data we could lose is 15 minutes. This could be configured to be more or less frequent depending on your data. But remember the previous question and make sure you test a restore of your transactional backups. If you can’t restore from them you will be forced to restore from the last successful backup.
  8. When was the last successful clean corruption test?
    Corruption can be a killer if it is not found quickly. If you need to restore to the last backup before corruption occurred, this could result in a significant amount of data loss. To check for corruption you need to run DBCC CHECKDB regularly.
  9. Do we have a development or staging environment where I can test my changes first?
    If the answer to this is no, then your next job is to set up a development or staging area. Having a development environment makes development a lot easier, and I don’t think I could have managed all the changes I have done recently without one.
  10. Is there any documentation for why the server was configured this way?
    I really wish we had more documentation about configurations, as it would make it much easier to find out why things were set up the way they are. So unfortunately the answer to this question is no.
  11. What changes am I not allowed to make?
    What your app does, where it is hosted, how quickly the changes are needed and many other factors will all restrict what changes you can make. Historical decisions on the database can also affect what changes can be made: if the database has been structured in a certain way, it may be very difficult to restructure it in a more efficient way.
  12. Who can test that my changes fixed the problem?
    This is an interesting question. From experience, the best people to speak to about problems with the database are the users. If they can show you how to reproduce a problem, you should be able to fix it, and after that you can probably get them to verify that it has been fixed.
  13. Who can test that the apps still work as designed, and that my changes didn’t have unintended side effects?
    This is an extension of the previous question. The main users of your app should be your first port of call to find out if the app works as expected. However, exploring side effects and undesired features is something that I would test as part of the development process. It has taken a while, but I have constructed a detailed check list that can be used for testing, so I know that most bugs can be found before release.
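
Questions 5 and 8 can be answered straight from SQL Server. The sketch below is one way to do it, not something taken from Brent’s post: the first query reads the backup history that SQL Server records in msdb, and the second runs a corruption check against a database with the placeholder name MyDatabase.

```sql
-- Most recent full ('D') and transaction log ('L') backup per database,
-- taken from the backup history SQL Server keeps in msdb.
SELECT d.name AS database_name,
       MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full_backup,
       MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log_backup
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
GROUP BY d.name
ORDER BY d.name;

-- Corruption check for one database (hypothetical name).
-- NO_INFOMSGS hides informational output so only problems are reported.
DBCC CHECKDB (N'MyDatabase') WITH NO_INFOMSGS, ALL_ERRORMSGS;
```

A NULL in either backup column means msdb has no record of that kind of backup for the database. This only shows that a backup was taken, not that it can be restored, which is why question 6 still matters.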

Runaway SQL Log growth

Today is my day off, but I wake up and have a quick look at Nagios to see if there is anything I need to worry about. Yes there is: SQL Server has run out of disk space on its data disk.

I race downstairs and VPN onto the server to find out what has happened. One of my monitoring databases has had runaway log growth and its log is over 80 GB in size.

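-- Discard the inactive log records without backing them up (this breaks the log backup chain)
BACKUP LOG [DBName] WITH TRUNCATE_ONLY
-- Shrink the log file; 'DBName_Log' is the logical name of the log file
DBCC SHRINKFILE('DBName_Log')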

Free disk space is back to normal, all users will be unaware of the problem and everything is fine again. I create a daily job that runs the above code, that way it should stay a manageable size.

Next I need to find out why it happened and to prevent it happening again in the future (Next time I have a day off I want to lie in!)

I check the SQL logs and notice:

BACKUP LOG WITH TRUNCATE_ONLY or WITH NO_LOG is deprecated. The simple recovery model should be used to automatically truncate the transaction log.

Then I remember what I have done to cause this issue. I have a separate disk for my backup files, and earlier in the week I noticed this disk was filling up. A large amount of space was taken up by transactional backup files. I thought: I don’t need to back up the transactions for this non-critical database, I will just do a full backup at the start of every day.

However, what I forgot is that a transactional backup keeps the log file under control; once this backup was stopped, the log file grew uncontrollably. The answer: change the database from FULL mode to SIMPLE.

This is my understanding of how backups work in FULL mode. Every change in the database is stored in the log file, and that space is only marked for reuse once it has been backed up by a transactional backup; a full backup on its own does not reset the log file. If you have regular transactional backups throughout the day the log file doesn’t grow too much. With no transactional backups, however, the log file keeps every change made since the last transactional backup, so for a monitoring database this could be quite large.

In SIMPLE mode you can’t do transactional backups, and the log is truncated automatically so it doesn’t grow uncontrollably. This shouldn’t be used for production databases, because if there is a problem you could lose all changes made since the last backup.
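
Put together, the diagnosis and the fix look roughly like the sketch below. MonitoringDB and MonitoringDB_Log are placeholder names for the database and its logical log file, and the 1024 MB shrink target is an arbitrary figure chosen for illustration.

```sql
-- Hypothetical database and logical log file names throughout.

-- How big is each log file, and how much of it is in use?
DBCC SQLPERF(LOGSPACE);

-- Which recovery model is the database in, and what is stopping
-- the log from being reused? (LOG_BACKUP means it is waiting for
-- a transaction log backup.)
SELECT name, recovery_model_desc, log_reuse_wait_desc
FROM sys.databases
WHERE name = N'MonitoringDB';

-- Switch the non-critical database to the SIMPLE recovery model so
-- the log is truncated automatically at checkpoints.
ALTER DATABASE [MonitoringDB] SET RECOVERY SIMPLE;

-- Shrink the oversized log file back to a sensible size (target in MB).
USE [MonitoringDB];
DBCC SHRINKFILE (N'MonitoringDB_Log', 1024);
```

Once the database is in SIMPLE mode the log should stop growing without limit, so the shrink is a one-off clean-up and the daily truncate job is no longer needed.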