GitHub Vs Bitbucket Vs Visual Studio Team Services

As a developer, using source control and git is the bread and butter of what we do. GitHub is probably the most popular and widely known hosting service for source control, but I have also used Bitbucket and Visual Studio Team Services. Let’s have a look at each one and what they offer. Note that while I have included prices, I have only tried out the free versions.

What should be in Source Control?

I am currently working on source code that is over 5 GB in size. This is mostly due to a poorly thought out folder structure: code files, images and Excel files are all jumbled together. I think a clear distinction should be made between source code and data.

Source Code

I will define source code as anything that is written in order to compile and run the project. If it is a webpage, that means all the HTML, CSS and JavaScript, or any file used to produce these. I would also include any configuration files and files used to build or deploy the website or project. Anything that is compiled from your source files can safely be ignored.


Data

I would define data as anything that is added to the project during its life. So if you have an upload option, anything that is uploaded I would describe as data. The site should still function with little or no data.

Images

Images can fit into both groups. Any icons or images attached to the functionality of the project I would class as source code. However, anything that is uploaded should be classed as data.

Database

The database should also be classed as both. Anything that is stored inside a database table should normally be classed as data. Stored procedures, functions and views are all source code and would benefit from version control.
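As a rough illustration of the split above, here is a minimal sketch of a .gitignore that keeps compiled output and uploaded data out of git. The folder names bin/, obj/ and uploads/ are assumptions made for the example, not something taken from a real project, and everything runs in a throwaway repository.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

# Hypothetical layout: bin/ and obj/ hold compiled output, uploads/ holds user data.
cat > .gitignore <<'EOF'
bin/
obj/
uploads/
EOF

mkdir -p src bin uploads
echo "body { margin: 0; }" > src/site.css
echo "compiled" > bin/site.dll
echo "user file" > uploads/report.xlsx

# git check-ignore exits 0 when a path is ignored and 1 when it is not.
ignored_build=$(git check-ignore -q bin/site.dll && echo yes || echo no)
ignored_upload=$(git check-ignore -q uploads/report.xlsx && echo yes || echo no)
ignored_source=$(git check-ignore -q src/site.css && echo yes || echo no)

echo "bin/site.dll ignored: $ignored_build"
echo "uploads/report.xlsx ignored: $ignored_upload"
echo "src/site.css ignored: $ignored_source"
```

Only the file under src/ stays visible to git, which matches the idea that source code is committed while build output and uploaded data live elsewhere.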

Source Control != Backup

Source control is not an excuse to skip backups. Don’t commit files to source control just so you know you can restore them if you need to. In general, files are in source control so you can see how they changed over time as the code base changed. Files in your backup are a snapshot of what the application was at a point in time and will include ALL the data.

One last point before I end. If you are hosting on a cloud computing platform like Azure, it gives you an easy way to distinguish between data and code, depending on where something lives:

Web App = Code
Blob Storage = Data
SQL = Data/Code

Each project is unique and there will always be exceptions to these suggestions but I think this is a good goal to have. What do you think?

Automatic Git Tagging

One of the features of git is the ability to mark a point in my change history with a tag. For a while now I have been manually tagging my code whenever I do a release, so I can easily work out what has changed by doing a diff between two tags.
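Here is a minimal sketch of that workflow in a throwaway repository. The tag names release-1 and release-2 and the file names are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "First release"
git tag release-1          # hypothetical tag name

echo "v2" > app.txt
echo "new" > extra.txt
git add app.txt extra.txt
git commit -q -m "Second release"
git tag release-2          # hypothetical tag name

# Files that changed between the two releases
changed=$(git diff --name-only release-1 release-2)
echo "$changed"

# Commits made between the two releases
git log --oneline release-1..release-2
```

Dropping --name-only from the diff command shows the full line-by-line changes between the two tags.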

Now that I am automating my release process with TeamCity I am thinking about how to manage my tags better.

TeamCity has a setting called VCS Labeling which comes in very handy.

VCS Labeling

Configuring it is fairly simple as it only has three settings.

VCS root to label: This is the URL of your git repository.
Labeling pattern: This is the text of the label to be added.
Label successful builds only: Do you really want to add a tag if the build failed?

A tag needs to have a unique name, so adding a tag just called deployed won’t work. When I used to add tags manually I used the naming convention of deployedyyyymmdd. While this naming convention is possible with TeamCity, I use something a bit more complex to provide more information about what has been deployed.
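For reference, the manual convention can be sketched like this. It runs in a throwaway repository, and it also shows git refusing a second tag with the same name, which is why a plain "deployed" tag cannot be reused.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "release" > app.txt
git add app.txt
git commit -q -m "Release build"

# Build the tag name from today's date, e.g. deployed followed by yyyymmdd.
tag="deployed$(date +%Y%m%d)"
git tag "$tag"

# A second attempt with the same name fails, because tag names must be unique.
if git tag "$tag" 2>/dev/null; then
    second_attempt="created"
else
    second_attempt="rejected"
fi
echo "$tag: second attempt $second_attempt"
```

One consequence of the date-only name is that a second deployment on the same day needs a different suffix.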

TeamCity provides lots of parameters that can be used in your build steps and also in the Labeling pattern box. I started off using just the build number parameter as my tag, which marks git with the TeamCity build number.

When I run a TeamCity deployment I don’t always use the same configuration options. I deploy locally, to a test server or to production, and sometimes I deploy just the frontend or just the backend. How cool would it be to include this information in the tag text?

The next step was to extend my Labeling pattern so that it also adds the backend database config settings and the path the frontend was deployed to. Now when looking at git you can see commits marked with multiple tags, one for each deployment that succeeded, and each tag indicates the settings used during that deployment.

Now I will never forget to add the tag after a release, as adding the tag is part of the deployment process. If the deployment fails, the tag won’t be added. I can try out my deployment on the test server and git will show whether it succeeded, and when I deploy live that will show up too.

How do you use tags? Do you mark successful builds with a tag? Why not let me know or leave a comment below.

My git repository is too large!

Today I did a clone of one of my git repositories and it took ages to download. Looking into what got downloaded, it was easy to see why. The .git folder was over 500 MB in size.
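A quick way to check this on any clone is sketched below. The function name show_repo_size is made up for the example; the two commands inside it are standard git and du.

```shell
# show_repo_size is a hypothetical helper name; pass the path to a working copy.
show_repo_size() {
    repo_path=${1:-.}
    # size-pack is the space used by pack files, where most of the history lives.
    git -C "$repo_path" count-objects -vH
    # Total size on disk of the .git folder itself.
    du -sh "$repo_path/.git"
}

# Usage (from inside a clone): show_repo_size .
```

If size-pack is far larger than the checked-out files, the bulk is probably sitting in history rather than in the current version of the code.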

I know how this has happened. This repository was created in 2013 and has been used by me as a dumping ground for lots of things related to the code which never should have been committed.

Since 2013 I have learnt a lot more about coding and git so the current version of the files in git isn’t too bad. But git keeps the history of changes for every file so bad practices like this are kept.

What can I do about this? Well, what does Google suggest? I found this blog post.

It suggests ways of listing all the large files that are stored in git and a way to remove them. As I am the only person that regularly commits to this repository I see no problem with giving it a go.

I will summarise the steps here.

git clone url

Now you need all the remote branches so there is a bash script to run

for branch in `git branch -a | grep remotes | grep -v HEAD | grep -v master`; do
    git branch --track ${branch##*/} $branch
done

Another bash script then lists the top 10 large files

#!/bin/bash
#set -x
# Shows you the largest objects in your repo's pack file.
# Written for osx.
# @see
# @author Antony Stubbs
# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n'
# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`
echo "All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file."
# header row for the report; one line per object is appended in the loop
output="size,pack,SHA,location"
for y in $objects; do
    # extract the size in bytes and convert it to kB
    size=$((`echo $y | cut -f 5 -d ' '`/1024))
    # extract the compressed size in bytes and convert it to kB
    compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
    # extract the SHA
    sha=`echo $y | cut -f 1 -d ' '`
    # find the objects location in the repository tree
    other=`git rev-list --all --objects | grep $sha`
    #lineBreak=`echo -e "\n"`
    output="${output}\n${size},${compressedSize},${other}"
done
# print the collected rows as an aligned table
echo -e "$output" | column -t -s ', '
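If the script above does not suit your platform, a shorter sketch that produces a similar list is shown here. It is an alternative approach, not the one from the blog post: it asks git cat-file for the uncompressed size of every blob reachable from any ref. The function name largest_blobs is made up for the example.

```shell
# largest_blobs is a hypothetical helper name.
# Arguments: path to the repository (default .) and how many rows to show (default 10).
largest_blobs() {
    repo_path=${1:-.}
    count=${2:-10}
    # rev-list prints "<sha> <path>" for every object reachable from any ref;
    # cat-file then reports the type and size and passes the path through as %(rest).
    git -C "$repo_path" rev-list --objects --all |
        git -C "$repo_path" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        awk '$1 == "blob"' |
        cut -d' ' -f2- |
        sort -k1,1nr |
        head -n "$count"
}

# Usage (from inside a clone): largest_blobs . 10
```

Each output row is the size in bytes followed by the path, largest first. Unlike the pack-based script, the sizes here are uncompressed.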

After running the script you will see details of your largest files. I had *.msi and *.exe files in the mix. To remove them run the following, where filename is the path to the file that needs removing.

git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all

To reclaim the disk space run the following commands

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now
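Before trying this on a real repository, it may help to rehearse the steps on a throwaway one. The sketch below commits a made-up 1 MB file called setup.msi, deletes it, and then uses the same filter-branch and clean-up commands to confirm the file is gone from history. The file names are assumptions for the example. FILTER_BRANCH_SQUELCH_WARNING only suppresses the warning that newer git versions print before filter-branch starts.

```shell
export FILTER_BRANCH_SQUELCH_WARNING=1

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "code" > app.txt
head -c 1048576 /dev/urandom > setup.msi   # hypothetical 1 MB installer
git add app.txt setup.msi
git commit -q -m "Add app and installer"
git rm -q setup.msi
git commit -q -m "Remove installer"

# The file is deleted from the working copy but still reachable through history.
before=$(git rev-list --all --objects | grep -c 'setup.msi' || true)

git filter-branch --tag-name-filter cat \
    --index-filter 'git rm -r --cached --ignore-unmatch setup.msi' \
    --prune-empty -f -- --all >/dev/null 2>&1

# Drop the backup refs and unreachable objects so the space is actually reclaimed.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc -q --prune=now

after=$(git rev-list --all --objects | grep -c 'setup.msi' || true)
echo "references to setup.msi before: $before, after: $after"
```

Note that the git filter-branch manual page now recommends alternatives such as git filter-repo, so treat this as a rehearsal of the steps in this post and not as the only way to do it.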

Now push your changes back to the remote server.

git push origin --force --all
git push origin --force --tags

Now if you do a clone it will be much smaller than before and you can get back to coding much quicker, without having to wait.

Common git commands

I use git as my source control system. Here are some of the most common git commands in no particular order.

Check out a git repository: git clone /path/to/repository
Add files to commit: git add <filename>
Commit files to git: git commit -m "Commit message"
Push changes to server: git push origin master
Show status: git status
Create new branch: git checkout -b <branchname>
Switch to a branch: git checkout <branchname>
Get remote changes: git pull
Merge a different branch into the current one: git merge <branchname>
View merge conflicts: git diff
Temporarily stash uncommitted changes: git stash
Undo a commit: git reset --hard <commit>
Show details about a commit: git show <commit>
Show version history of current branch: git log
Get all remote branches: git fetch origin
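To show how a few of these fit together, here is a minimal sketch that runs in a throwaway repository. The branch name feature and the file names are made up for the example.

```shell
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

# Add and commit a first file on the default branch.
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Add readme"
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Create a branch, do some work on it and commit.
git checkout -q -b feature          # hypothetical branch name
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"

# Switch back and merge the branch into the default branch.
git checkout -q "$main_branch"
git merge -q feature

git status --short
git log --oneline
```

After the merge the log lists both commits on the default branch, and git status --short prints nothing because the working copy is clean.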

Most of the time I use the Git Extensions tools to do my git work, but there are times when only the command line will do.

Writing better Git commit messages

I always use source control for my coding changes; however, some of my commit messages leave something to be desired.

I always try to write a commit message, but I often think that the changes themselves should be enough to indicate what I did. I also don’t need to include who made the change or the time and date, as that gets included automatically.

Here are some tips I have found that may help me in the future.

1) Ask yourself why you are making this change. The Who, When and What are already being covered so it is only the why that needs including in the commit message.

2) If your commit breaks something, causes side effects or adds a new dependency, this should be included in the commit message. If your commit breaks functionality, consider whether you really need to commit it yet. Maybe only commit once it is fixed?

3) If your commit includes a long list of changes, consider whether it needs splitting into several commits. It is easy to commit only one or two files, write a specific commit message, and then commit the rest of the changes separately.

4) Consider including a subject for larger commits. The following comes from the git manual.

Though not required, it’s a good idea to begin the commit message with a single short (less than 50 character) line summarizing the change, followed by a blank line and then a more thorough description. The text up to the first blank line in a commit message is treated as the commit title, and that title is used throughout Git.

5) If you use a subject, follow these conventions:

  1. Limit of 50 characters
  2. Start with a capital letter
  3. Do not end with a full stop
  4. Use the imperative mood i.e. write as if issuing a command
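The first three conventions can be checked mechanically. Below is a minimal sketch of such a check; the function name check_subject is made up for the example, and the imperative mood is left to the author because a simple script cannot judge it reliably.

```shell
# check_subject is a hypothetical helper; it returns 0 when the subject passes all checks.
check_subject() {
    subject=$1
    # 1. Limit of 50 characters
    if [ "${#subject}" -gt 50 ]; then
        echo "too long: ${#subject} characters"
        return 1
    fi
    # 2. Start with a capital letter
    case $subject in
        [[:upper:]]*) ;;
        *) echo "should start with a capital letter"; return 1 ;;
    esac
    # 3. Do not end with a full stop
    case $subject in
        *.) echo "should not end with a full stop"; return 1 ;;
    esac
    return 0
}

check_subject "Add retry logic to upload handler" && echo "ok"
check_subject "fixed stuff." || echo "rejected"
```

A function like this could be called from a commit-msg hook, but that wiring is outside the scope of this sketch.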