GitHub Vs Bitbucket Vs Visual Studio Team Services

For developers, using source control and git is the bread and butter of what we do. GitHub is probably the most popular and widely known hosting service for source control, but I have also used Bitbucket and Visual Studio Team Services. Let’s have a look at each one and what they offer. Note that while I have included prices, I have only tried out the free versions.

What should be in Source Control?

I am currently working on source code that is over 5 GB in size. This is mostly due to a poorly thought-out folder structure: code files, images and Excel files are all jumbled together. I think a clear distinction should be made between source code and data.

Source Code

I will define source code as anything that is written in order to compile and run the project. For a website that means all the HTML, CSS and JavaScript, or any file used to produce these. I would also include any configuration files and files used to build or deploy the website or project. Anything that is compiled from your source files can safely be ignored.
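
As a rough illustration of ignoring compiled output, the sketch below writes a small .gitignore in a throwaway repository and asks git which paths it would ignore. The folder and file names (bin/, obj/, dist/, *.min.js) are assumptions made up for the example, so adjust them to whatever your own build produces.

```shell
#!/bin/sh
# Sketch: keep build output out of source control with a .gitignore.
# The patterns and file names below are assumed examples, not a standard.

demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init -q .

cat > .gitignore <<'EOF'
# Compiled output can always be rebuilt from the source files
bin/
obj/
dist/
*.min.js
EOF

# Create one compiled file and one source file to check against
mkdir -p bin src
touch bin/app.dll src/app.js site.min.js

# is_ignored PATH: succeeds when git would ignore PATH
is_ignored() {
    git check-ignore -q "$1"
}

is_ignored "bin/app.dll" && echo "bin/app.dll is ignored"
is_ignored "src/app.js"  || echo "src/app.js is not ignored"
```

Running git status in that folder afterwards should list only .gitignore and src/ as untracked.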


Data

I would define data as anything that is added to the project during its life. So if you have an upload option, anything that is uploaded I would describe as data. The site should still function with little or no data.


Images

Images can fit into both groups. Any icons or images attached to the functionality of the project I would class as source code. However, anything that is uploaded should be classed as data.


Database

The database also falls into both groups. Anything stored inside a database table should normally be classed as data. Stored procedures, functions and views are all source code and would benefit from version control.

Source Control != Backup

Source control is not an excuse not to back things up. Don’t just commit files to source control so you know you can restore them if you need to. In general, files are in source control so you can see how they changed over time as the code base changed. Files in your backup are a snapshot of what the application was at a point in time and will include ALL the data.

One last point before I end. If you are hosting on a cloud computing platform like Azure, it gives you an easy way to distinguish between data and code.

Anything in your:

Web App = Code
Blob Storage = Data
SQL = Data/Code

Each project is unique and there will always be exceptions to these suggestions but I think this is a good goal to have. What do you think?

Automatic Git Tagging

One of the features of git is the ability to mark a point in the change history with a tag. For a while now I have been manually tagging my code whenever I do a release, so I can easily work out what has changed by doing a diff between two tags.
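
A minimal sketch of that manual workflow, run in a throwaway repository. The tag names (release-1, release-2) and the file contents are made up for illustration.

```shell
#!/bin/sh
# Sketch: tag two releases by hand and compare them.
# Tag names and files are hypothetical examples.

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "version 1" > app.txt
git add app.txt
git commit -q -m "First release"
git tag -a release-1 -m "First release"

echo "version 2" > app.txt
echo "notes" > notes.txt
git add app.txt notes.txt
git commit -q -m "Second release"
git tag -a release-2 -m "Second release"

# Files that changed between the two releases
changed_files=$(git diff --name-only release-1 release-2)
echo "$changed_files"

# Commits that went into the second release
git log --oneline release-1..release-2
```

Dropping --name-only from the diff shows the full line-by-line changes between the two tags.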

Now that I am automating my release process with TeamCity I am thinking about how to manage my tags better.

TeamCity has a setting called VCS Labeling which comes in very handy.

VCS Labeling

Configuring it is fairly simple as it only has three settings.

VCS root to label: This is the VCS root for your git repository
Labeling pattern: This is the text of the label to be added.
Label successful builds only: Do you really want to add a tag if the build failed?

A tag needs to have a unique name, so adding a tag just called deployed won’t work. When I used to add tags manually I used the naming convention deployedyyyymmdd. While this naming convention is possible with TeamCity, I use something a bit more complex to provide more information about what has been deployed.
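
For reference, the manual date-based convention can be sketched in shell as below. The helper name deploy_tag_name is hypothetical, and the git tag call is left commented out so the snippet is safe to run anywhere.

```shell
#!/bin/sh
# Sketch: build a tag name of the form deployedyyyymmdd.
# deploy_tag_name is a made-up helper, not part of git or TeamCity.

# deploy_tag_name [PREFIX]: prints PREFIX followed by today's date as yyyymmdd
deploy_tag_name() {
    prefix=${1:-deployed}
    printf '%s%s\n' "$prefix" "$(date +%Y%m%d)"
}

tag_name=$(deploy_tag_name)
echo "$tag_name"

# Inside a repository you would then run:
#   git tag "$tag_name"
# git refuses to create a tag that already exists, so a second
# release on the same day needs a different name.
```

That same-day limitation is one reason to put more than the date in the tag.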

TeamCity provides lots of parameters that can be used in your build steps and also in the Labeling pattern box. I started off using just the build number parameter as my tag, which marks the commit in git with the TeamCity build number.

When I run a TeamCity deployment I don’t always use the same configuration options. I deploy locally, to a test server or to production, and sometimes I just deploy the frontend or the backend. How cool would it be to include this information in the tag text?

Well, the next step was to change my Labeling pattern so that it also adds the backend database config settings and the path the frontend was deployed to. Now when looking at git you can see commits marked with multiple tags, one for each deployment that succeeded, and each tag indicates the settings used during that deployment.
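
To see every tag attached to a commit from the command line, something like the following should work. The tag names (build-12-test, build-13-live) are invented stand-ins and do not reflect the actual labeling pattern, and the repository is a throwaway one.

```shell
#!/bin/sh
# Sketch: one commit carrying several deployment tags.
# The tag names are hypothetical, not a real TeamCity pattern.

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "hello" > app.txt
git add app.txt
git commit -q -m "Release candidate"

# Two deployments of the same commit, each leaving its own tag
git tag build-12-test
git tag build-13-live

# List every tag that points at the current commit
tags_here=$(git tag --points-at HEAD)
echo "$tags_here"

# The same tags also appear next to the commit in the log
git log --oneline --decorate -1
```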

Now I will never forget to add the tag after a release, as adding the tag is part of the deployment process. If the deployment fails, the tag won’t be added. I can try out my deployment on the test server and git will show whether it was successful, and when I deploy live this will also show up.

How do you use tags? Do you mark successful builds with a tag? Why not let me know or leave a comment below.

Revisiting Team City

Last year I blogged about Team City, and I have been looking at it again recently. In that time they have even changed their logo!

Let’s start by thinking about what I want my Continuous Integration server to do.

  1. Check out my code from source control (usually master but all feature branches would be even better)
  2. Configure specific settings for the build
  3. Build my code
  4. Build my databases
  5. Run any unit tests
  6. (Optional) Run deployment to Azure Test/Live site

There are probably other things I want to achieve but I will start with these six.

  1. Checking out code from source control is something Team City does out of the box, so I can safely say I have done this now. It even monitors a branch for changes and initiates a new check out.
  2. Team City allows you to create specific build steps so in theory you can have multiple builds for every variation of settings that you want for your code. I have not tried this yet apart from building with the default config, but I don’t expect it will be too difficult.
  3. I have managed to get my code to build with Team City. It took a bit of tweaking of the different build steps but wasn’t too difficult. Team City has a Visual Studio build runner which takes your solution file and does what it needs to. The one problem I have found with this step is that I get errors with my tests if I select a Debug config instead of Release.
  4. Databases are always the problem part of the deployment. So far I have manually deployed my databases but I intend to revisit this step. A Stack Overflow post suggests that I can run SQL code via Team City by creating a command line build step like this:

    Command executable: c:\Program Files\Microsoft SQL Server\100\Tools\Binn\sqlcmd.exe
    Command parameters: -S [ServerName] -i [PathToSQLScript]

    I have yet to try this but I am hopeful that it will just work. Dropping a database, restoring a backup and then running different SQL scripts is all possible from T-SQL, so I should be OK. Watch this space for more details.

  5. Running the unit tests got me stuck for a while. I tried setting them up using VSTest and MSTest, but neither worked, mainly because a config file wasn’t being copied with the test binaries. When I tried using NUnit it just worked. The tests that failed gave me a few config settings to change.
  6. I have PowerShell scripts that deploy to Azure websites, and I think these could form the basis of a deployment to Azure. Again, the difficult step here may end up being deploying all the different databases to Azure. This is also the riskiest step as I need to connect to live servers, which is why I will leave it until last. At the very least I could generate scripts that do a full deployment.

That’s it for now. Once I have this all working I will revisit it with details of the database steps, as I am expecting a few challenges to overcome. What have you used a CI server for? Are there other things I should want to achieve from a project like this? Why not contact me or leave a comment below.

My git repository is too large!

Today I cloned one of my git repositories and it took ages to download. Looking into what got downloaded, it was easy to see why. The .git folder was over 500 MB in size.

I know how this has happened. This repository was created in 2013 and has been used by me as a dumping ground for lots of things related to the code which never should have been committed.

Since 2013 I have learnt a lot more about coding and git so the current version of the files in git isn’t too bad. But git keeps the history of changes for every file so bad practices like this are kept.

What can I do about this? Well, what does Google suggest? I found this blog post.

It suggests ways of listing all the large files that are stored in git and a way to remove them. As I am the only person that regularly commits to this repository I see no problem with giving it a go.

I will summarise the steps here.

git clone url

Now you need all the remote branches, so there is a bash script to run:

# Create a local tracking branch for every remote branch except HEAD and master
for branch in `git branch -a | grep remotes | grep -v HEAD | grep -v master`; do
    git branch --track "${branch##*/}" "$branch"
done

Another bash script then lists the top 10 large files

#!/bin/bash
#set -x
# Shows you the largest objects in your repo's pack file.
# Written for osx.
# @see
# @author Antony Stubbs
# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n'
# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`
echo "All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file."
output="size,pack,SHA,location"
for y in $objects
do
    # extract the size in bytes and convert it to kB
    size=$((`echo $y | cut -f 5 -d ' '`/1024))
    # extract the compressed size in bytes and convert it to kB
    compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
    # extract the SHA
    sha=`echo $y | cut -f 1 -d ' '`
    # find the object's location in the repository tree (prints "SHA path")
    other=`git rev-list --all --objects | grep "$sha"`
    #lineBreak=`echo -e "\n"`
    output="${output}\n${size},${compressedSize},${other}"
done
# print the collected rows as an aligned table
echo -e "$output" | column -t -s ', '

After running the script you will see details of your largest files. I had *.msi and *.exe files in the mix. To remove them run the following, where filename is the path to the file that needs removing.

git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all

To reclaim the disk space run the following commands

rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now
git gc --aggressive --prune=now

Now push your changes back to the remote server.

git push origin --force --all
git push origin --force --tags

Now if you do a clone it will be much smaller than before and you can get back to coding much quicker, without having to wait.
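
One way to confirm the saving is to compare the packed size of the repository before and after the clean-up. This is only a sketch: repo_pack_size is a made-up helper name, and it simply reads the size-pack line that git count-objects reports.

```shell
#!/bin/sh
# Sketch: report how much space the packed objects of a repository take.
# repo_pack_size is a hypothetical helper, not a git command.

# repo_pack_size [DIR]: prints the size-pack value for the repository in DIR
# (defaults to the current directory)
repo_pack_size() {
    git -C "${1:-.}" count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}

# Only report when run from inside a git repository
if git rev-parse --git-dir >/dev/null 2>&1; then
    echo "Packed size: $(repo_pack_size)"
fi
```

Run it once before the filter-branch step and once after the git gc commands to see the difference.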

Common git commands

I use git as my source control system. Here are some of the most common git commands in no particular order.

Task                                            Command
Check out a git repository                      git clone /path/to/repository
Add files to commit                             git add <filename>
Commit files to git                             git commit -m "Commit message"
Push changes to server                          git push origin master
Show status                                     git status
Create new branch                               git checkout -b <branchname>
Switch to a branch                              git checkout <branchname>
Get remote changes                              git pull
Merge a different branch into the current one   git merge <branchname>
View merge conflicts                            git diff
Temporarily stash uncommitted changes           git stash
Undo a commit                                   git reset --hard <commit>
Show details about a commit                     git show <commit>
Show version history of current branch          git log
Get all remote branches                         git fetch origin
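
To show how a few of these fit together, here is a small walk-through in a throwaway local repository. The branch name (feature) and the file are invented for the example, and nothing is pushed to a server.

```shell
#!/bin/sh
# Sketch: a short branch-and-merge session using the commands listed above.
# Repository, branch and file names are made-up examples.

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Example"
git config user.email "example@example.com"

echo "first" > readme.txt
git add readme.txt
git commit -q -m "Initial commit"

# The default branch may be master or main, depending on git's configuration
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Create a branch, change a file and commit the change
git checkout -q -b feature
echo "second" >> readme.txt
git add readme.txt
git commit -q -m "Add a line on the feature branch"

# Switch back and merge the feature branch in
git checkout -q "$main_branch"
git merge -q feature

git status --short
git log --oneline
```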

Most of the time I use the Git Extensions tools to do my git work, but there are times when only the command line will do.