Baselining the builds

There has been quite a bit of progress in our department at work recently with regard to unit testing and continuous integration, so I thought I would write down where we were, where we are now, and where we go from here.
Where we were
About two years ago our department started to move towards unit testing and continuous integration, but with 10 years’ worth of code the move was always going to be hampered by the existing legacy code base.
Our setup was a build box for the Java applications running CruiseControl, and a build box for the PHP applications running phpUnderControl. This led to no overall view of the codebase and no common statistics. Since then we have moved the Java applications to Hudson, with an open question of “can we move the PHP builds to this tool?” The main reasons behind the need to move were a common tool set and common reporting. The diagram below shows the visibility with our initial setup.
The downsides to this process were:
  • Lack of overall reporting
  • Information was hard to push upwards to management. They don’t want to look at a webpage with red/green dots on it; they want more in-depth information that means something to the business, such as code coverage (or the lack of it)
  • Developers who were less than receptive to unit testing constantly asked what the build box URL was, which acted as a barrier. You would be amazed at how many times you can be asked for the URL of a build box that should be checked constantly, all day.
The code base was not the only issue we would encounter along the way, and to be fair, looking back now, it was actually the easiest to overcome. The obstacles included:
  • Developer buy in
  • Resource allocation
  • Training
Obviously the list above is not exhaustive, but I would say those were certainly the main factors that impacted the growth of unit testing within the department.
Over the last two years, we have managed to isolate about 100 ‘builds’, all of which were deployed to the PHP build box. The sad truth about some of these builds was that the tests were of very poor quality; some weren’t even tests, they just executed the code without asserting anything. If a build was broken, very few people would take ownership to make it pass again; it continually fell to a select few developers within the department. As you can imagine, it is hard to keep morale high when you see other developers breaking builds and it is left to you to make them pass again.
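To make the “not even a test” problem concrete: our tests were PHPUnit tests, but the same anti-pattern is easy to sketch in Python’s unittest. Everything here is made up for illustration (calculate_discount is a hypothetical stand-in for production code), but the contrast is exactly what we saw in those builds.

```python
import unittest


def calculate_discount(price, percent):
    # Hypothetical production function, used only for illustration.
    return price - price * percent / 100


class BadDiscountTest(unittest.TestCase):
    def test_discount(self):
        # A "test" that merely executes the code. It passes even if
        # calculate_discount returns nonsense, because nothing is asserted.
        calculate_discount(100, 20)


class GoodDiscountTest(unittest.TestCase):
    def test_discount_reduces_price(self):
        # A real test: it states the expected behaviour and fails if
        # that behaviour changes.
        self.assertEqual(calculate_discount(100, 20), 80)

    def test_zero_percent_keeps_price(self):
        self.assertEqual(calculate_discount(100, 0), 100)
```

The first class keeps the coverage numbers looking healthy while guaranteeing nothing; the second actually documents and protects the behaviour.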
So, what do you do when you have roughly a 60:40 split of passing to failing builds? The various dashboard and radiator views provided by CruiseControl and phpUnderControl still did not easily show, at a glance, which builds had failed overnight. I personally used to check git commits to the trunk and then look at the commit diff to see if tests had been written. If they had, I would check the build; if they hadn’t, I would try to understand why the tests had been left out. If the build was broken or tests were missing, I used to post in our internal forums about what had happened.
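That manual triage could be sketched as a small script. This is an illustration of the idea rather than anything we actually ran: the git invocation and the “path contains test” heuristic are assumptions you would adapt to your own repository layout.

```python
import subprocess


def changed_paths(commit):
    # List the file paths touched by a commit, via `git show`.
    out = subprocess.run(
        ["git", "show", "--name-only", "--pretty=format:", commit],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def touches_tests(paths):
    # Crude heuristic: did the commit change anything whose path
    # mentions 'test'? Adjust for your project's test layout.
    return any("test" in p.lower() for p in paths)


def triage(commit):
    # Flag commits that change files without touching any tests,
    # mirroring the manual "check the diff, then ask why" routine.
    paths = changed_paths(commit)
    if paths and not touches_tests(paths):
        return f"{commit}: no test changes – worth asking why"
    return f"{commit}: tests included"
```

Run against each new trunk commit, this turns the daily “did they write tests?” check into a one-line report instead of a manual diff read.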
Where we are now

The setup within the department has vastly changed since we first started (See diagram below). We now have:
  • Hudson running the Java builds
  • phpUnderControl running the PHP builds
  • Sonar running to report information on all builds (Java and PHP)
  • 4 TVs in the office rotating the information from Hudson, phpUnderControl, Sonar and some in-house scripts that create radiator views. This all runs in Google Chrome in kiosk mode with a plugin called Rotator.


The benefits of the setup above are clear:
  • High visibility to all parties of the software development life cycle
  • Bespoke continuous integration tools used by the different knowledge bases (Java and PHP developers can keep using the tools they are used to)
  • Knowledge sharing of the reporting tool
  • An overall reporting tool, which disseminates the information via the TVs to anyone who needs it
  • Information is clear and precise. The views available are detailed enough for developers to understand what is happening, and high-level enough for managers to catch a glimpse of the information they need.
With Sonar in place, we no longer really need to answer the question of whether the PHP builds should move to Hudson. It doesn’t really matter which continuous integration tool we use, as long as all the information is pumped into Sonar to do the reporting magic. And I hold Sonar in very high regard right now; the amount of information it provides is fantastic.

So within the last two weeks, we have held two test fests in order to baseline the builds – make them all at least build. I have personally been pushing for this for quite some time now, but not everyone saw it as a benefit. I was of the opinion that all the builds needed passing tests, and therefore to be ‘green’, so you could at least see progress being made, and also hold people to account if a build was broken. I held the belief that deleting failing tests, in the first instance as a one-off, was the way to go. There is no benefit in a failing build staying failed: what happens if some other valuable tests break in the background? There is no visibility of that.
Other people in the department complained that this was the easy way out, and that we should fix the code and tests to make the builds work. Now, this is an interesting point. I can understand their view, but when you have been one of the few working in your own time to achieve exactly that, only to see the build break again, you tend to think, “Well, get on and do it then”. It is a difficult conversation to have with management to explain that we, as developers, have made a mess of our own code and therefore want time out of the resource pool to correct it. And depending on the changes being made, we may also need involvement from other teams such as QA, again impacting the resource pool.
Removing failing tests, or fixing them where possible without needing to change production code, gives the best of both worlds. Provided we find time in the future to address the quality of the tests, removing poor tests now is just a means to an end.
Where do we go from here
I’m personally hoping for regular test fests in the department, which can then focus on unit test quality and overall coverage, rather than on fixing broken builds. Fixing broken builds should become a thing of the past: once all developers have bought into this way of working, we should never be in a position to break a live build.
I would also like to see the builds allocated to owners within the department, who keep track of statistics such as code coverage. By doing this you also keep oversight of the quality of the production code. You don’t want to find out in six months that the code you worked hard to cover with tests has since been left behind and now has only 20% coverage!
I think the future should look like this:
  • Better quality of unit tests
  • Better understanding of what makes a good/valid unit test
  • Better understanding of why we are unit testing and the principles thereof
  • Increasing code coverage and, hopefully by the same token, code quality
  • Smaller QA testing cycles with fewer bugs
  • Documented behaviour via the unit tests
  • A uniform reporting tool for all builds, be it Java, PHP or x
  • Input into open source software which we use daily so we can make a difference in the community
Here’s to the next two years.