Weakest Database Design

I have been recently working on a database that hasn’t been designed but has been hacked together by lots of people over the years.

It started life as an Microsoft Access database created by my director to keep track of all the projects that our company does. It was not much more than one table listing everything. At some point the data and frontend was split with the data being stored in SQL Server and the front end being an Microsoft Access ADP.

Over the years more and more columns have been added to this table that it long ago became pretty much unmanageable. A few normalized tables exist but for the most part everything is stored in this one table which makes querying it very difficult. At the last count there were 81 columns in this one database table.

Attempts to improve this database have been tried before but due to the fact that the whole company basically runs off this database and no one person knows everything that the database needs to do causes any changes to be very difficult. I suspect it may be beyond my abilities but I am going to give it a go.

One advantage I have is version control. All the tables and the structure is in version control, so when I break things I can roll them back. Any changes to the front end are also in source control, these are harder to roll back in case of problems but not impossible.

I have met with department heads to try and establish which of the 81 columns are being used and which could be dropped. Database design shouldn’t be a democracy, your databases should be designed in a way to store your data most efficiently. In this case it felt a bit like the database was playing “the weakest link” with everyone voting off unpopular columns. “You are the weakest database column. Goodbye!” After much discussion I have over 30 columns that can be dropped and 10 or so that require further investigation.

Using the Visual Studio project of the database, I removed the columns. This generated thousands of errors as most columns are included in loads of views and stored procedures. All these references need removing, so not a small job. This is made worse by the fact that the views and stored procedures are in a mess, and it is very hard to read the T-SQL code and see what is doing what. Part of my next job is going to be tidying up all these queries so I can see what’s what.

The advantage of using the Visual Studio project to do these changes is that I can generate a SQL script of the changes and I can rerun a few times and test restoring different columns as it is unlikely to be a smooth deployment.

To boldly go where no SQL has gone before Part 2

Let’s continue looking at a database schema for storing details of every Star Trek Episode. If you are new to databases, a schema is just the design of the structure of the database.

We have three tables, Episode, EpisodeWriter and Writer. See my last post for more details of these. It has been suggested that a slight change to this structure would enable storage of more of the creative staff.

Lets rename Writer and call it Credit, and rename EpisodeWriter and call it EpisodeCredit. Now any creative staff member involved in an episode can be stored in the Credit table. Lets alter EpisodeCredit and add an extra column called CreditType. CreditType is just a text field that stores the role that creative person had on that episode, it can be anything from Director, Writer, Actor, Science Consultant etc.

In case anyone wants to recreate the databases I have described here, I have saved the SQL on a separate page which can be found here.

We now have the ability to store information relating to the episode in the episode table and any creative people in the credit table. What else can we add to the database? How about a table that can be used to record when an episode was last watched. I am probably weird but sometimes I want to watch a Star Trek episode that I like but I haven’t watched in ages.

The last watched table is really simple and just have a datetime field and episodeId. This can be further expanded to have a userId field if you wanted to keep track of what episodes your friends had been watching.

Another idea could be to tag episodes with certain themes or topics like Klingon episodes or meaning of life episodes or Kirk talks a computer to death episodes. Again this is fairly simple table containing TagName and EpisodeId.

 

To boldly go where no SQL has gone before

My last post proved quite popular so I am wondering if I can combine a post about IT and Star Trek.

Years ago I used to have lists of Star Trek episodes, which included such information like original air date, production number, episode title and brief description.

One thing that was hard to keep track of was how many episodes were written by a specific person. This is because episodes are written by multiple people. A column called writer would then need to contain multiple people, another option would be to have columns called writer1, writer2 etc. This wouldn’t help either as you wouldn’t know which column a specific writer had been saved in.

manytomanyThe relationship between writer and episode is known as a many to many relationship. An episode can have many writers and a writer can have written many episodes. To achieve this structure in a SQL database you will need three database tables as it is not possible to create a many to many join between two tables. The first table will contain all the episodes, the second table will contain all the writers, the third table known as a junction table, will contain the relationship between the two.

Let’s do an example so we can see how this would work. Gene Roddenberry creator of Star Trek wrote the pilot episode ‘The Cage’. So Gene would be added to the writers table with an id of 1 and The Cage would be added to the episode table with an id of 1. In the junction table, it has two columns episode and writer, so we would enter 1 and 1 into these columns.

Select * from Episode e
Join EpisodeWriter ew on e.id = ew.EpisodeId
Join Writer w on w.id = ew.WriterId

But if Gene Coon and Gene Roddenberry had writing credits on The Cage we would need to add Gene Coon to the writing table and add an extra entry to the episodewriter table.

I am going to do more posts based on this as I expand the database structure to include other information, I may go on to create stored procedures for bringing back certain information or I may use this as an example to talk about coding a user interface.