Learning R

Today I spent some time learning the R language.

The problem I was trying to solve was converting local prices of some items into Euros. I had been using a fixed exchange rate for all data, but since exchange rates fluctuate so much this was inaccurate.

My first thought was to find a free API that I could query to get the values I wanted. The first API I found didn't cover all the currencies, and with the next one I burnt through the free allowance in a single pass.

A colleague of mine mentioned using R to solve this; he sent me some links and I set out to write my first piece of R code.

My finished code can be found on GitHub, and I will attempt to explain some of it.

R defines functions fairly simply:

nameoffunction <- function(arg1, arg2)
{
  arg1 * arg2  # the value of the last expression is what the function returns
}

I have created a function that takes two parameters, date and currency. I know I have about 10 different currencies that I want exchange rates for, and I want to loop through each day, so I will need to pass in a date.

The source of my exchange rate information is the www.xe.com website; its historical exchange rate page takes the currency and date in the query string, so I should be able to build up a string containing all the different elements.

All programming languages can concatenate strings and R is no different: it uses paste().

var <- paste("Hello", "World")

However, this function has an annoying quirk. I would expect var in the above example to contain "HelloWorld"; it doesn't, it contains "Hello World". Why it automatically adds a space I don't know. The fix is to set the sep argument to an empty string:

var <- paste("Hello", "World", sep="")

I am not entirely sure what all of the code does but I can take a good guess.

read_html() I would guess loads an HTML page, html_nodes() finds all the HTML tags of a certain type on the page (in my case <table>), and html_table() reads the first table it finds.

table1[2] selects the second column, and head() selects a specific number of rows. I want the first row of the second column, so I combine these two as head(table1[2], 1).
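Putting these guesses together, here is a minimal sketch of the sort of function I ended up with. The rvest package, the exact URL format and the table position are my assumptions; xe.com's page layout may well differ.

library(rvest)

getRate <- function(date, currency)
{
  # build the query URL from its parts (the format here is an assumption)
  url <- paste("http://www.xe.com/currencytables/?from=", currency,
               "&date=", date, sep="")
  page <- read_html(url)              # load the HTML page
  tables <- html_nodes(page, "table") # find all the <table> tags
  table1 <- html_table(tables[[1]])   # turn the first table into a data frame
  head(table1[2], 1)                  # first row of the second column
}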

Now that I have found my exchange rate, what do I do with it? R can read and write to SQL Server, so why not store this information in a SQL lookup table? I can then use this data in a stored procedure when I process my data.

To query SQL you can use sqlQuery(); it takes two parameters, a SQL connection and a T-SQL command (e.g. a SELECT, INSERT or UPDATE statement).
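sqlQuery() comes from the RODBC package, so a minimal sketch looks something like this (the connection string and the ExchangeRate table are placeholders I have made up):

library(RODBC)

# open a connection to SQL Server (the connection string is a placeholder)
conn <- odbcDriverConnect("driver={SQL Server};server=MyServer;database=MyDB;trusted_connection=yes")

# store one exchange rate in a hypothetical lookup table
sqlQuery(conn, "INSERT INTO ExchangeRate (RateDate, Currency, Rate)
                VALUES ('2016-10-01', 'USD', 0.89)")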

I use a while loop to step through every day between 1st October 2016 and today, looking up the exchange rate for each currency.
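A rough sketch of that loop, reusing the getRate() function above (the currencies vector is made up for illustration):

currencies <- c("USD", "GBP", "JPY")  # about 10 of these in reality

date <- as.Date("2016-10-01")
while (date <= Sys.Date())
{
  for (currency in currencies)
  {
    rate <- getRate(date, currency)
    # ...store the rate in SQL Server via sqlQuery() as above...
  }
  date <- date + 1  # adding 1 to a Date moves it on one day
}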

For now I am manually running this R script, but there are ways to run R directly from SQL Server which I may well investigate. I could then have a SQL job to run it on a schedule, maybe once a day, to get the latest exchange rates. I would also like to do something a bit cleverer, like only getting exchange rates for the days I actually need by querying existing database tables.

SQL Transaction Log Backups

Like many DBAs I spend a lot of time maintaining my SQL Server backups.

On SQL Server I maintain both full backups and transaction log backups. I have often restored full backups, but until recently I had never restored a transaction log backup. Any backup strategy is only as good as the last time you tested the restore process.

So what is a transaction log backup?

A transaction log backup contains all the transaction log records generated since the last log backup (or since the last full backup, if no log backup has been taken yet) and is used to allow the database to be recovered to a specific point in time (usually the moment right before disaster strikes). Since these backups are incremental, to restore the database to a particular point in time you need all the transaction log records necessary to replay the changes up to that point.
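Taking one is a single T-SQL command, for example (the database name and path are placeholders):

BACKUP LOG [MyDatabase]
TO DISK = 'E:\Backups\MyDatabase.trn'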

How to do the restore

First, right click on Databases in SQL Server Management Studio and select Restore Database. You should then get a screen similar to this.

[Screenshot: the Restore Database dialog]

In the Source section, click the … button to select your backup files.

Normally I have only ever selected one file here, the *.bak file. This time, select the *.bak and all the *.trn files as well. After SQL Server has chugged away for a few minutes (the time will depend on the number of transaction log files, server/disk speed, etc.) the restore plan section should fill up with files.

In the destination database box, type the name of the database you want to restore to. I recommend using a different name to avoid overwriting the original database; appending Test or a datetime to the name is what I usually do.

On my test server I need to untick the 'Take tail-log backup' option on the Options screen before I can execute the restore.

Now you can either check the tick boxes in the restore plan section or (more fun) click the Timeline button to select the point in time you want to restore to.

[Screenshot: the backup timeline dialog]

You can either select the point in time with your mouse or specify the exact point in the Time textbox. Alternatively you can just select the most recent point, probably the most likely option when disaster strikes.
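For reference, the same point-in-time restore can be scripted in T-SQL using STOPAT. A rough sketch with placeholder names and paths (the MOVE clauses for relocating the data and log files are omitted for brevity):

-- restore the full backup, leaving the database able to accept log restores
RESTORE DATABASE [MyDatabase_Test]
FROM DISK = 'E:\Backups\MyDatabase.bak'
WITH NORECOVERY

-- replay the log, stopping at the chosen point in time
RESTORE LOG [MyDatabase_Test]
FROM DISK = 'E:\Backups\MyDatabase_1.trn'
WITH RECOVERY, STOPAT = '2017-01-01 09:30:00'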

Now that I have tried doing this on my test server I feel much more confident that when disaster does strike I can get things restored quickly and painlessly.

How often should you run transaction log backups?

The answer to this question depends on how critical your data is. Until very recently I ran mine every 15 minutes; I have now increased this to every 5 minutes, and I have seen recommendations to run them every minute. The more critical your data, the more often you should run them.

SQL Server Management Studio

I use SQL Server Management Studio all the time for writing queries, getting data and general SQL development.


I have enjoyed seeing the improvements that each new version of SQL Server Management Studio (SSMS) has introduced. One great improvement was IntelliSense.

This feature saves typing and reduces errors by automatically suggesting tables, column names or other database objects.

A common request I get is to provide a spreadsheet containing the information that satisfies certain criteria. This is easy to do in SSMS: you write the query, click Execute, and the rows that satisfy the criteria are displayed. These rows can then easily be copied and pasted into Excel or another spreadsheet.

A common data item stored in databases is addresses, and addresses often contain line breaks to make them display better. In earlier versions of SSMS these line breaks were ignored when you copied and pasted, so the data displayed the same in Excel as it did in SSMS. In more recent versions, the line breaks get copied across, breaking your spreadsheet and making it hard to see which data corresponds to which row.

Now I don't know whether this should be described as introducing a bug or fixing one; I can easily argue both sides. If your data contains a line break and you copy that data, the destination should include the line break, but if it does, it displays badly in Excel.

The fix I have been using until recently is the following T-SQL command in my queries.

SELECT REPLACE(REPLACE(@str, CHAR(13), ''), CHAR(10), '')

This command replaces any line breaks with an empty string. Both CHAR(10) and CHAR(13) are needed because there are different types of line break. This is great if you are writing the script from scratch, but not so great if you are running a stored procedure or your query has a lot of columns.
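In a full query it ends up looking something like this (the table and column names are made up for illustration):

SELECT CustomerName,
       REPLACE(REPLACE(Address, CHAR(13), ''), CHAR(10), '') AS Address
FROM Customer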

The answer to that problem is to use Visual Studio to run your SQL query. In Visual Studio you can write and run queries via Server Explorer, and the results don't contain line breaks. I have only just discovered this solution, but so far it has worked and is very easy to do; plus, as I do most of my development in Visual Studio anyway, it saves me having to open SSMS to test my queries.

Backing up SQL databases to Azure

I recently read a blog post by Pinal Dave about how you can back up straight to Azure Storage. The procedure he describes is only available in SQL Server 2014 or later.

I won't go into detail of this method as Pinal describes it better than I can, but the basics of it are setting up credentials and then running a backup command that includes the URL of the storage container on Azure.
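In outline it looks something like this; a sketch based on the SQL Server 2014 BACKUP TO URL syntax, with placeholder credential, storage account and container names:

-- store the Azure Storage account name and access key as a credential
CREATE CREDENTIAL [AzureBackupCredential]
WITH IDENTITY = 'mystorageaccount',
SECRET = '<storage access key>'
GO

-- back up straight to a blob in the storage container
BACKUP DATABASE [MyDatabase]
TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/MyDatabase.bak'
WITH CREDENTIAL = 'AzureBackupCredential'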

Unfortunately I am running SQL Server 2005, so this process will not work for me, but it did get me thinking about what other options there might be.

The next thing I tried was the Microsoft SQL Server Backup to Microsoft Azure Tool. Unfortunately I did not get this tool to work correctly on my setup. It sounds like a flexible tool that allows compression and encryption of your backup files, but it redirects your backup files to Azure Storage, so even if I had got it working it would not have been an ideal solution: I want local copies of my backup files as well.

After this I started looking at PowerShell again. Following on from my recent success with PowerShell, I already knew how to connect to my Azure account, so all I needed to script was copying a file from my server to Azure.

Get-ChildItem *.bak -File -Recurse | Set-AzureStorageBlobContent -Container $DestContainer -Force

This command gets all the backup files in a directory (the -Recurse switch looks in subdirectories as well) and pipes them to the Set-AzureStorageBlobContent command, which uploads them to the storage container defined in $DestContainer. I have added the -Force switch so that it will replace any files on Azure that have the same name.
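For completeness, the surrounding script looks roughly like this. It assumes the classic Azure PowerShell module (where Set-AzureStorageBlobContent lives); the storage account, key, container and backup path are all placeholders:

# build a storage context from the account name and access key
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" -StorageAccountKey "<key>"
$DestContainer = "backups"

# upload every backup file found under the backup directory
Get-ChildItem "E:\Backups\*.bak" -File -Recurse |
    Set-AzureStorageBlobContent -Container $DestContainer -Context $ctx -Force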

I have only been using this script for the last few days, but so far it has been working well. Now if I completely lose all the data in the office I can restore from any other location using the copies saved on Azure. A great improvement to my disaster recovery policy.

Things to know before working on your database

I recently saw this blog post by Brent Ozar that I thought I might discuss.

Brent listed 13 questions to ask about a database before you start working with it. I am going to go through these 13 questions and expand on them based on my experiences.

  1. Is this database in production now?
    I think it should go without saying that the first thing you should find out is whether your database is in production. If it's not, you can do what you like and no one will notice.
    I know which databases are in production where I work, so I can answer this one.
  2. If this goes down, what apps go down with it?
    Knowing which apps run on which databases is a good second question. I have at least one database with multiple front-end apps. At first glance you would think these two apps are not connected, but they are; I need to be careful with both of them to make sure they don't break each other.
    I know which apps run off which databases.
  3. When those apps go down, is there potential for loss of life or money?
    This is a difficult question, so I will split it in two. Loss of life: my databases don't control life support machines or nuclear weapons, so my first instinct is to say no. However, it is not that straightforward. What if your app lets contractors know the location of dangers inside a property they are working in? If your app goes down, they could have an accident due to the lack of information.
    Loss of money: this one is more straightforward. Time is money in the business world, so any time your app is down and your employees cannot work is a loss of money. If your database is linked to an eCommerce site, the loss could be extremely high.
    I know what effect downtime will have on my users and the business.
  4. How sensitive are these apps to temporary slowdowns?
    Similar to the previous question: a slowdown can be as serious as downtime for some applications.
    Luckily most of my apps are internal only, so they are not seriously affected by slowness.
  5. When was the last successful backup?
    I manage the backup schedule for all my databases, so I know exactly when each one was last backed up. Whenever I do anything to a production database I run a backup first so I can roll back in case of problems. As part of developing changes I run them against a backup of the data; I can script all my changes and repeatedly run them against the backup until I am sure no problems will occur.
  6. When was the last successful restore test?
    More important than taking a backup is testing that you can restore it. If you can't restore the data then your backup is useless. I try to test restoring my backups at least weekly so I know that I can rely on them.
  7. Is everyone okay losing data back to the last successful backup?
    If disaster strikes you could lose all data between now and the time of your last backup. But all is not lost: transactional backups can be scheduled throughout the day, and in my case the most data we could lose is 15 minutes' worth. This can be configured to be more or less frequent depending on your data. But remember the previous question and make sure you test restoring from your transactional backups; if you can't restore from them you will be forced to restore from the last successful full backup.
  8. When was the last successful clean corruption test?
    Corruption can be a killer if it is not found quickly. If you need to restore to the last backup taken before the corruption occurred, that could mean a significant amount of data loss. To check for corruption you need to run DBCC CHECKDB regularly; see the sketch after this list.
  9. Do we have a development or staging environment where I can test my changes first?
    If the answer to this is no, then your next job is to set up a development or staging environment. Having one makes development a lot easier, and I don't think I could have managed all the changes I have made recently without it.
  10. Is there any documentation for why the server was configured this way?
    I really wish we had more documentation about configurations, as it would make finding out why things were set up the way they are much easier. So unfortunately the answer to this question is no.
  11. What changes am I not allowed to make?
    What your app does, where it is hosted, how quickly the changes are needed and many other factors will all restrict what changes you can make. Historical decisions on the database also matter: if it has been structured in a certain way, it may be very difficult to restructure it in a more efficient way.
  12. Who can test that my changes fixed the problem?
    This is an interesting question. From experience, the best people to speak to about problems with the database are the users. If they can show you how to reproduce a problem, you should be able to fix it, and afterwards you can probably get them to verify that it has been fixed.
  13. Who can test that the apps still work as designed, and that my changes didn’t have unintended side effects?
    This is an extension of the previous question. The main users of your app should be your first port of call to find out if the app works as expected. However, exploring side effects and undesired behaviour is something I test as part of the development process. It has taken a while, but I have constructed a detailed checklist for testing, so I know that most bugs can be found before release.
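For question 8, the corruption check itself is a one-liner that can be scheduled as a regular job (the database name is a placeholder):

-- check the whole database for corruption, reporting only errors
DBCC CHECKDB ('MyDatabase') WITH NO_INFOMSGS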

Runaway SQL Log growth

Today is my day off, but I wake up and have a quick look at Nagios to see if there is anything I need to worry about. There is: SQL Server has run out of disk space on its data disk.

I race downstairs and VPN onto the server to find out what has happened. One of my monitoring databases has had runaway log growth and is over 80GB in size.

BACKUP LOG [DBName] WITH TRUNCATE_ONLY
DBCC SHRINKFILE('DBName_Log')

Free disk space is back to normal, users are unaware of the problem, and everything is fine again. I create a daily job that runs the above code, so the log should stay a manageable size.

Next I need to find out why it happened and prevent it happening again in the future (next time I have a day off I want a lie-in!)

I check the SQL logs and notice:

BACKUP LOG WITH TRUNCATE_ONLY or WITH NO_LOG is deprecated. The simple recovery model should be used to automatically truncate the transaction log.

Then I remember what I did to cause this. I have a separate disk for my backup files, and earlier in the week I noticed it was filling up, with a large amount of space taken by transactional backup files. I thought: I don't need to back up the transactions for this non-critical database, I will just do a full backup at the start of every day.

However, what I forgot is that the transactional backups were keeping the log file under control; once they stopped, the log file grew uncontrollably. The answer: change the database from FULL recovery mode to SIMPLE.
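That change is a single command (using the same DBName placeholder as above):

ALTER DATABASE [DBName] SET RECOVERY SIMPLE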

This is my understanding of how backups work in FULL mode. Every change in the database is written to the log file, and the log is only truncated when a transaction log backup runs. With regular transactional backups throughout the day the log file doesn't grow too much, but with no transactional backups it just keeps accumulating changes, and for a busy monitoring database that can be quite large.

In SIMPLE mode you can't take transactional backups, but the log doesn't grow uncontrollably either. This shouldn't be used for critical production databases, as if there is a problem you could lose data.

I might actually like SQL Server

How did this happen? I think I may actually like SQL Server now.

I remember a few years back when I used to grumble about writing a SQL query that included a JOIN. For some reason back then I could never get my head around JOINs.

Today is a totally different story. I wrote a deployment script that did the following:

1) dropped a database,
2) restored a database from a backup file,
3) created a database table,
4) copied the contents of a table into this table,
5) disabled and re-enabled some triggers,
6) dropped some constraints and columns from a table,
7) and renamed some columns.

It's not like I woke up this morning with SQL Server knowledge. For many years I have been adding extra features and changing functionality in our line-of-business database, and over time my SQL confidence has grown.

For every single one of the steps above I googled the SQL syntax (every time I write an INSERT or UPDATE statement I look up the syntax; one day it will stick in my brain!). I think the main reason is that once you have used SQL for a while you can see how it works and split a task down into small steps.

As I blogged the other day, I am currently working on improving a bad database, and today I wanted to test the deployment process. All my changes are in an SSDT project, so I took a backup of my database and tried to publish.

Error! Your changes will result in data loss. No surprise there, I was expecting that error. The main culprit was a trigger I wrote, but I didn't find that out until the end of the day.

As I was working on a backup I could do what I liked, including destructive changes. First I tried creating a pre-deployment script; I thought that if it runs first, I could persuade SSDT that the schema matches SQL Server and therefore no data is being lost.

This didn't work, but I had created the start of the script mentioned above: I had steps 3 and 4. Let's try running my script first and then running the deployment of my SSDT project. To get this to work I had to rebuild my database a few times. I did this manually with SQL Server Management Studio a couple of times until I realised I could add it to my script; that's steps 1 and 2 done.

Then I got some errors about triggers, so I tried disabling them and re-enabling them afterwards. That's step 5.

Finally I needed to drop some columns, which resulted in an error about a constraint. A few trial-and-error run-throughs to find out which constraints I needed to drop, and then I could drop the columns.

The last step was to rename some columns. I thought after this I could run Publish from Visual Studio. No, it still complained about data loss. I dropped another trigger and then I could run everything without any errors.

Woohoo! Aren't I clever? A simplified version of my code is below.

USE master --can't drop the database if it's open!
--Step 1
DROP DATABASE DBName

--Step 2
RESTORE DATABASE DBName FROM DISK='E:\DBName.bak'
WITH
MOVE 'DBName_dat' TO 'C:\Program Files (x86)\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\DBName.mdf',
MOVE 'DBName_log' TO 'C:\Program Files (x86)\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\DBName.ldf',
REPLACE,
STATS=10
USE DBName --swap back to the database you just restored

--Step 3
CREATE TABLE TName2 (
ID INT NOT NULL,
Name NVARCHAR(50) NOT NULL,
Address NVARCHAR(50) NOT NULL,
City NVARCHAR(50) NOT NULL,
Postcode NVARCHAR(50) NOT NULL)

--Step 5: disable the trigger so the copy doesn't fire it
DISABLE TRIGGER TR_Trigger ON TName1
GO

--Step 4: copy every column, as they are all declared NOT NULL
INSERT INTO TName2 (ID, Name, Address, City, Postcode)
SELECT Id, Name, Address, City, Postcode FROM TName1
GO

--Step 6: the default constraints must go before the columns they sit on
ALTER TABLE TName1
DROP CONSTRAINT [DF_Id], [DF_Name]
GO
ALTER TABLE TName1
DROP COLUMN Id, Name
GO

--Step 7
EXEC sp_rename 'TName1.City', 'Area', 'COLUMN'
GO

--drop that last trigger
DROP TRIGGER TR_Trigger2
GO

--Step 5: re-enable the trigger now the data changes are done
ENABLE TRIGGER TR_Trigger ON TName1