Tuesday, July 01, 2008

Legacy Projects: Get Statistics from your Build Server

As I mentioned in my post, Working with Legacy .NET Projects, my latest project is a legacy application with no tests. We're migrating from .NET 1.1 to .NET 2.0, and this is the first entry in the series on dealing with legacy projects. See that post for the starting point.

On the majority of legacy projects that I've worked on, there is a common thread within the development team: the belief that the entire code base is outdated, filled with bugs and should be thrown away and rewritten from scratch. Such a proposal is a very tough sell for management, who will no doubt see zero value in spending a staggering amount only to receive exactly what they currently have, plus a handful of fresh bugs. Rewrites might make sense when accompanied by new features or platform shifts, but by and large they are a very long and costly endeavour. Refactoring the code in small steps to get out of Design Debt is a much more suitable approach, but it cannot be done without a plan that management can get behind. Typically, management will support projects that can quantify results, such as improving server performance or page load times. However, in a sprawling application without separation of concerns, estimating effort for these types of projects can be extremely difficult, and it is further compounded when there is no automated testing in place. The result is a difficult stalemate between simple requirements and a full rewrite.

Assuming that your legacy project at least has source control, the next logical step to improve your landscape is to introduce a continuous integration server or build server. And as there are countless other posts out there describing how to set up a continuous integration server, I'm not going to repeat those good fellows.

While the benefits of a build server are immediately visible to developers, who are all too familiar with dumb-dumb errors like compilation issues due to files missing from source control, the build server can also be an important reporting tool for selling management on the state of the application. As a technology consultant who has played the part between the development team and management, I think it's fair to say that most management teams would love to claim that they understand what their development teams do, but they'd rather be spared the finer details. So if you could provide management with a summary of all your application's problems graphed against a timeline, you'd be able to demonstrate the effectiveness of their investment over time. That's a pretty easy sell.

The great news is that very little is required on your part to produce the graphs: CruiseControl.NET 1.3 has a built-in Statistics feature that uses XPath statements to extract values from your build log. Statistics are written to an XML file and a CSV file for easy exporting, and third-party graphing tools can be plugged into the CruiseControl.NET dashboard to produce slick-looking graphs. The challenge lies in mapping the key pain points in your application to a set of quantifiable metrics and then establishing a plan that will help you improve those metrics.

Here's a common set of pain points and metrics that I want to improve/measure for my legacy project:

Pain                                 | Metrics                                                             | Toolset
Tight Coupling (Poor Testability)    | Code coverage, number of tests                                      | NCover, NUnit
Complexity / Duplication (Code Size) | Cyclomatic complexity, number of lines of code, classes and members | NCover, NDepend, SourceMonitor or VIL
Standards Compliance                 | FxCop warnings and violations, compilation warnings                 | FxCop, MSBuild

Ideally, before I start any refactoring or code clean-up, I want my reports to reflect the current state of the application (flawed, tightly coupled and un-testable). To do this, I need to start capturing this data as soon as possible by adding the appropriate tools to my build script. While it's possible to add new metrics to your build configuration at any time, there is no way to go back and generate log data for previous builds. (You could manually check out previous builds and run the tools directly, but that would take an insane amount of time.) The CruiseControl.NET extension CCStatistics also has a tool that can reprocess your log files, which is handy if you add new metrics for data sources that are already present in your build output.

Since adding all these tools to the build script requires some tinkering, I'll be adding them gradually. To minimize changes to my CruiseControl.NET configuration, I can use a wildcard filter to match all files that follow a set naming convention. I'm using a "*-Results.xml" naming convention.

<!-- from ccnet.config -->
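
For illustration, a merge block that picks up every report following that convention could look like the sketch below. The reports folder and the example file names in the comment are placeholders made up for this sketch, not values from my actual configuration; point the path at wherever your build script writes its output.

```xml
<!-- Sketch only: the reports folder below is a hypothetical path -->
<publishers>
  <merge>
    <files>
      <!-- the wildcard would match e.g. NUnit-Results.xml or FxCop-Results.xml -->
      <file>C:\build\reports\*-Results.xml</file>
    </files>
  </merge>
  <xmllogger />
</publishers>
```

With a pattern like this, a newly added tool only needs to write a file that matches the convention; the server configuration itself stays untouched.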

Configuring the Statistics Publisher is really quite easy, and the great news is that the default configuration captures most of the metrics above. The out-of-the-box configuration captures the following:

  • CCNET: Build Label
  • CCNET: Error Type
  • CCNET: Error Message
  • CCNET: Build Status
  • CCNET: Build Start Time
  • CCNET: Build Duration
  • CCNET: Project Name
  • NUNIT: Test Count
  • NUNIT: Test Failures
  • NUNIT: Tests Ignored
  • FXCOP: FxCop Warnings
  • FXCOP: FxCop Errors
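
As far as I can tell, picking up those defaults shouldn't need any statistic definitions at all. A minimal sketch, assuming the statistics publisher is placed after the merge step so that the merged tool output is already available to it:

```xml
<publishers>
  <!-- merge entries go first so their output is available to the statistics -->
  <statistics />
</publishers>
```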

Here's a snippet from my ccnet.config file that captures the NCover counts for lines of code, files, classes and members. Note that I'm also using Grant Drake's NCoverExplorer extras to generate an XML summary instead of the full coverage XML output, for performance reasons.


<publishers>
  <statistics>
    <statisticList>
      <firstMatch name='NCLOC' xpath='//coverageReport/project/@nonCommentLines' include='true' />
      <firstMatch name='files' xpath='//coverageReport/project/@files' include='true' />
      <firstMatch name='classes' xpath='//coverageReport/project/@classes' include='true' />
      <firstMatch name='members' xpath='//coverageReport/project/@members' include='true' />
    </statisticList>
  </statistics>
  <!-- email, etc -->
</publishers>

I've omitted the metrics for NDepend/SourceMonitor/VIL, as I haven't fully integrated these tools into my build reports. I may revisit this later.

If you've found this useful or have other cool tools or metrics you want to share, please leave a note.

Happy Canada Day!



martha said...

I recently came across your blog and have been reading along. I thought I would leave my first comment. I don't know what to say except that I have enjoyed reading. Nice blog. I will keep visiting this blog very often.



bryan said...

Thanks martha/susan