Git in the enterprise

Wed, Jun 20, 2012

Is it ready yet?

Over the course of the last seven years, I’ve lived and breathed software development in everything from a small web shop right up to large scale international enterprise organisations.
Underpinning that has been version control in all of its various forms. Those of you who have been in software development over a similar time period will remember tools like CVS – something that many found to be a confusing and somewhat challenging way to manage revisions.
Latecomers are undoubtedly more familiar with modern version control systems, such as SVN – billed as CVS done right. For those that made the transition – and adopted user friendly tools such as TortoiseSVN or various IDE plugins – the feeling of relief that SVN brought about was palpable.
Namely – everything had a single revision. Branching or tagging? It’s trivial. Dealing with conflicts? TortoiseMerge and others of its ilk made it a snap. Want to migrate an existing codebase? There’s an app for that! Pre-defining a simple set of trunk, branches and tags was a subtle but important improvement as well – it meant that 75%+ of the time you could hop into someone’s SVN server and know exactly where things were.
Very quickly, an ecosystem of tools grew up around SVN. Most importantly, freed from the previous paradigms, these tools were network oriented and re-used building blocks of the web. Removing this complexity from the applications and aligning to a common architecture allowed harder problems to be solved – and the results spoke for themselves.
Trac (with SVN integration) was to Bugzilla as Rifles were to Swords – a massive leap forward in a completely new direction.
Open source projects saw the opportunity and migrated quickly, and enterprise followed shortly thereafter.
Don’t just take my word for it – shown below are statistics produced by CollabNet regarding SVN adoption/download rates.
[Figure: CollabNet SVN adoption/download statistics, 2009]

A 2009 Forrester Research study showed the results of this rapid growth against other version control systems:

Clearly, SVN had established itself as a mature and capable solution by 2009, and has remained so to this day.
Or has it?
The rise of continuous integration, increasing distribution of development teams and ever expanding and aging code bases introduced new pressures.
Anyone who ever tried to set up an SVN pre/post commit hook will probably agree that it’s hard – you need admin access to the server, and to muddle through areas of the SVN handbook that as a developer, you just haven’t had to work with. Further, try convincing a sysadmin focused on stability that you want to add post commit hooks – likely managed by the version control system itself – onto the server they are responsible for, that execute at random and aren’t controlled by the sysadmins, but developers.
Given the common separation of concerns between infrastructure and development teams, pushing the limits of server side SVN capabilities quickly becomes an unfeasible task in an enterprise environment, requiring higher levels of cooperation and communication between the groups.
Looking at the expansion of codebases, checking out a working copy very quickly starts to take a long time as the number of revisions increases. Admittedly, Subversion 1.7 addresses this with noticeably enhanced performance, but many enterprises are reluctant to forge ahead and adopt it – still stuck on Subversion 1.4 – 1.6.
Committing to a single server that is being hammered by continuous integration and by multiple other users quickly shows that performance can suffer, and suffer dramatically. SVN’s ‘hot copy’ backups are aptly named for the effect on your infrastructure – they are load intensive with large repositories, to say the least.
These are tolerable pains for developers, but in a similar fashion to CVS’ tolerable pains – we can put up with them, but should we?
Open source developers are saying no in a resounding fashion.
How resounding is resounding? If you thought a growth rate of 250% in 2 years for SVN was an impressive leap, GitHub reports “1.5 million people working on over 2.5 million repositories” per day (May 2012) – these are open source projects alone. It was only in 2009 that GitHub reported 100,000 users.
Major projects – Rails, PHP, even Linux itself – have adopted git as the best fit for their large codebases and geographically dispersed developer communities.
Why are they doing this? What is Git offering over SVN, and how can that work for the enterprise?
Git is fast. Git takes compression of content seriously, and squeezes as much as possible into every pull/push (think svn update and svn commit respectively).
Further than that, git works around the idea of a local repository. This translates very effectively into improved productivity for developers. Local commits begin to be used like the save button – it’s so fast, there’s no incentive not to do it.
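As a rough illustration of the "commit like you save" habit, here is a minimal sketch. The repository name, file name and identity are hypothetical, and everything happens in a throwaway temporary directory with no server involved.

```shell
set -e

# Hypothetical scratch repository; all names here are illustrative only.
work=$(mktemp -d)
cd "$work"
git init -q demo
cd demo
git config user.name "Example Dev"
git config user.email "dev@example.com"

# First small step: write something and commit it locally.
echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add first draft of notes"

# Second small step: -a stages the already tracked file, -m supplies the message.
echo "second draft" >> notes.txt
git commit -q -am "Extend notes"

# Both commits live only in the local repository; no remote is configured.
commit_count=$(git rev-list --count HEAD)
remotes=$(git remote)
echo "local commits: $commit_count"
```

Neither commit touches the network, which is why committing this often costs so little.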
If you have developers sitting on the bus to work with no internet, they can still hack on code – and contribute back suites of changes on arrival. When changes are ready, they are pushed (more like a traditional SVN commit) – this means that your CI solution can test an entire feature of work, rather than incremental changes.
Branching code is incredibly fast and simple – swapping from one set of code to another is milliseconds, not the minutes of an SVN switch + remote repository. It becomes extremely fast to apply 1 line bug fixes, or work on an isolated feature branch – swapping from one to the other is about as complex/time intensive as utilising alt tab to swap windows.
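The branch hopping described above might look something like the following sketch. The branch names (feature/login, hotfix/typo) and file names are made up for illustration, and the default branch name is read from the repository rather than assumed.

```shell
set -e

# Hypothetical scratch repository; branch and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "v1" > app.txt
git add app.txt
git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)   # whatever the default branch is called

# Create and switch to a feature branch in one step.
git checkout -q -b feature/login
echo "login form" > login.txt
git add login.txt
git commit -q -m "Start login feature"

# Hop back to the base branch for a one line fix on its own branch.
git checkout -q "$base"
git checkout -q -b hotfix/typo
echo "v1 fixed" > app.txt
git commit -q -am "Fix typo"

# Return to the feature; the working copy is swapped locally, with no server round trip.
git checkout -q feature/login
current=$(git rev-parse --abbrev-ref HEAD)
echo "now on: $current"
```

Each checkout only rewrites the files that differ between the two branches, which is where the speed comes from.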
Git is distributed by nature. Git can work via HTTP, SSH or the git protocol, but more importantly no central repository is required. This is of massive importance when looking at modern, responsive workforces based on and off shore.
As a practical example of that, imagine:
The on shore team works on features and functionality through the day.
The off shore team works on maintenance, QA and bug fixing through the night.
When the on shore team wants to integrate the bug fixes, it only takes one command to “pull” or “merge”.
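To make the on shore / off shore hand-off concrete, here is a minimal sketch using two local directories standing in for the two teams. The directory names, file names and identity are hypothetical. The --no-rebase and --no-edit flags are my own additions so that the pull behaves the same on recent git versions and never opens an editor.

```shell
set -e

# Hypothetical identity, supplied via environment so it applies to both repositories.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

root=$(mktemp -d)
cd "$root"

# The on shore team's repository.
git init -q onshore
cd onshore
echo "feature code" > feature.txt
git add feature.txt
git commit -q -m "Onshore: initial feature work"
base=$(git rev-parse --abbrev-ref HEAD)
cd "$root"

# The off shore team clones it and fixes a bug overnight.
git clone -q onshore offshore
cd offshore
echo "bug fix" > fix.txt
git add fix.txt
git commit -q -m "Offshore: fix overnight bug"
cd "$root"

# Meanwhile the on shore team keeps going, then pulls the fix with one command.
cd onshore
echo "more feature code" >> feature.txt
git commit -q -am "Onshore: continue feature"
git pull -q --no-rebase --no-edit ../offshore "$base"

pulled_fix=$(cat fix.txt)
total_commits=$(git rev-list --count HEAD)
echo "commits after pull: $total_commits"
```

The pull here goes directly from one team's repository to the other's; a shared server is a convenience rather than a requirement.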
This horizontal scaling of change is non trivial to execute with other version control systems. Imagine trying to coordinate between two teams authoring changes to trunk vs a CI solution, or trying to integrate a ‘feature branch’ via SVN repeatedly, and cleanly.
In SVN it can be made workable with two teams, but is it workable with 3? 5? 10?
The best previous answers to this have been the defect report and unified diff / patch – it works, but introduces code debt almost immediately. A red tape review process exists, and it takes multiple steps to download, apply, test, accept and close off a code change.
Git makes this trivial with the notion of a pull – and GitHub takes this further with pull requests. In the latter case, inline code review and applying of a change is a one click operation. All of the friction from managing code debt is eliminated.
The final advantage of git’s distributed nature is the reduced complexity with backups. Instead of exporting your SVN repository in a time intensive, performance harming periodic exercise, you can simply treat every clone of a repository as a backup. A more formal approach simply requires the same periodic task setup, but the execution of it is git clone.
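A sketch of what that periodic backup task could run is shown below. Note that I have swapped in git clone --mirror rather than a plain git clone, since a mirror copies every branch and tag into a bare repository; the paths and names are hypothetical, and a real job would point at your actual repository and be scheduled by cron or similar.

```shell
set -e

# Hypothetical identity and paths; a real job would target an existing repository.
export GIT_AUTHOR_NAME="Example Dev" GIT_AUTHOR_EMAIL="dev@example.com"
export GIT_COMMITTER_NAME="Example Dev" GIT_COMMITTER_EMAIL="dev@example.com"

root=$(mktemp -d)
cd "$root"

# Stand-in for the repository being backed up.
git init -q project
cd project
echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial commit"
git tag v1.0
cd "$root"

# First run of the backup task: a bare mirror containing all branches and tags.
git clone -q --mirror project project-backup.git

# Work continues in the source repository.
cd project
echo "v2" >> file.txt
git commit -q -am "Second commit"
cd "$root"

# Later runs only need to fetch what changed since the last backup.
git --git-dir=project-backup.git fetch -q --prune origin

source_head=$(git --git-dir=project/.git rev-parse HEAD)
backup_head=$(git --git-dir=project-backup.git rev-parse HEAD)
echo "source: $source_head"
echo "backup: $backup_head"
```

After the first run, each subsequent backup transfers only new objects, so the load on the source stays small.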

What are the drawbacks? As ever, nothing is perfect.

Git clients are springing up left and right, but it is not trivial to throw an SVN trained developer into a git environment and walk away.
Merging with git can be alarming – many of the clients do not have equivalent niceties that developers may be used to with tools like TortoiseMerge. Git often works best as a console application – a Windows-based developer will not immediately feel comfortable unless an appropriate IDE plugin can be found.
The biggest drawback is arguably GitHub itself. GitHub is a wonderful tool that adds the user friendliness and functionality that many git users need. The dangers with it (it’s not open source, there’s only one of it, and an outage or security breach could represent a massive loss to those same 1.5 million+ users) are the inherent dangers of a monoculture.
How can git move away from this emergent single point of failure in a manner suitable for enterprise?
There are solutions – for example, Gitorious, or the excellent GitStack for Windows-oriented enterprises, or even GitHub’s own enterprise offerings “in a box”. It should be noted that enterprises may find themselves writing extensions to current open source offerings to fill functionality gaps if that is the path selected.
Simply put, the strongest advice that can be given to an enterprise assessing git is to ask three questions:
Do I have a distributed development team problem?
Am I willing to embrace and extend offerings such as GitStack or Gitorious to fill potential capability gaps within my organisation?
Are there communication problems between my developers and infrastructure that impose the same advanced SVN capability tax?
While each organisation will have its own answers to each of these, git as a distributed source management solution is an attractive option for many of these issues.
With the rapidly increasing adoption and tools available to the git eco-system, enterprises should be encouraged to strongly consider git for their needs – or risk being left behind by a rapidly growing movement in software development.

About the Author

Daniel O’Connor is an Architect, Software Developer and enthusiastic Open Source contributor.
His experience stems from over 7 years of web oriented and enterprise development, working collaboratively with teams located in Australia, New Zealand, Manila, the US & Europe for financial, mortgage and real estate industry clients, as well as deep involvement with a number of Open Source projects.