
Sunday, March 11, 2018

On Code Reviews

Recently, a colleague reached out looking for advice on documentation for code reviews. It was a simple question, like so many others that arrive in my inbox phrased as a “quick question” yet don’t seem to have a “quick answer”. It struck me as odd that we didn’t have a template for this sort of thing. Why would we need one, and under what circumstances would a template help?

After some careful contemplation, I landed on two scenarios for code review. One definitely needs a template, the other does not.

Detailed Code Analysis

If you’ve been tasked with writing up a detailed analysis of a code-base, I can see the benefit of a structured document template. Strangely, I don’t have one, even though I’ve done this task many times. The interesting part of this task is that the document often exists to support a business case for change, or to provide evidence to squash or validate business concerns. Understanding the driving need for the document will shape how you approach the task. For example, you may be asked to review the code to identify performance improvements. Or perhaps the business has lost confidence in their team’s ability to estimate, and they want an outside party to validate that the code follows sound development practices (and isn’t a hot mess).

In general, a detailed analysis is usually painted in broad strokes with high-level findings: classes with too many dependencies, insufficient error handling, lack of unit tests, insecure coding practices. As some of these can be perceived as the opinion of the author, it’s imperative that you have hard evidence to support your findings. This is where tools like SonarQube shine, as they can highlight design and security flaws, flag potential defects, and even estimate how many hours of technical debt a solution carries. Some tools like NDepend or JArchitect let you write SQL-like queries to find the areas that need the most attention. For example, a query to “find highly used methods that have high cyclomatic complexity and low test coverage” can identify high-yield pain points.
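
As a rough sketch, such a query in NDepend’s CQLinq might look like the following (I’m writing this from memory, so treat the exact property names as approximations):

// find highly used methods that have high cyclomatic complexity
// and low test coverage
from m in JustMyCode.Methods
where m.NbMethodsCallingMe > 10 &&
      m.CyclomaticComplexity > 15 &&
      m.PercentageCoverage < 50
orderby m.NbMethodsCallingMe descending
select new { m, m.CyclomaticComplexity, m.PercentageCoverage }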

If I were to build a template for this sort of analysis, it would have:

  • An executive summary that provides an overview of the analysis and how it was achieved
  • A list of the top 3-5 key concerns, where each concern gets a short, concise paragraph focused on the business impact
  • A breakdown of findings in key areas:
    • Security
    • Operational Support and Diagnostics
    • Performance
    • Maintainability Concerns (code-quality, consistency, test automation and continuous delivery).

Peer Code Review

If we’re talking about a code review for a pull request, my approach is very different. A template might be useful here, but it’s far less necessary.

First, a number of linting tools such as FxCop, JSLint, etc. can be included in the build pipeline so that warnings and potential issues with the code are identified during the CI build and can be measured over time. Members of the team who are aware of these rules will call them out where appropriate in code reviews, or they’ll set targets for reducing these warnings over time.

Second, it’s best to let the team establish stylistic and formatting rules rather than trying to enforce a standard from above. The reasoning for this should be obvious: code style can be a religious war where there is no right answer. If you set a standard from above, you’ll spend your dying breath policing a codebase where developers don’t agree with you. In the end, consistency throughout the codebase should trump personal preference, so if the team can decide like adults which rules they feel are important, they’re less likely to act like children when reviewing each other’s work.

With linting and stylistic concerns out of the way, what remains is a process that requires all changes to be reviewed by one or more peers, who read the code to understand what it does and to suggest alternatives the author hadn’t considered. I’ve always seen code review as a discussion rather than policing, which makes it far more enjoyable.

Friday, October 09, 2015

Init Methods are Crap

The title of this post pretty much sums up how I feel about Init methods. Let’s take a look at an example of why.

I’ve been on a soap-box yelling at anyone who’ll listen that “work in the constructor” is perhaps the worst thing you can do as a developer. You’ve drunk the Kool-Aid: you have some fairly important initialization logic that needs to run, so you move it out of the constructor and into a separate method. Your code now looks like this:

public class MenuProvider : IMenuProvider
{
    private MenuItem[] _menuItems;

    public void Initialize()
    {
        // do work to populate menu items, deserialize from file, etc...
        _menuItems = // some value;
    }

    public IEnumerable<MenuItem> MenuItems
    {
        get
        {
            return _menuItems;
        }
    }

    public IEnumerable<MenuItem> InactiveMenuItems
    {
        get
        {
            return _menuItems.Where(i => i.Active == false);
        }
    }
}

It's a contrived example, but in the above code MenuItems will be null and InactiveMenuItems will throw a NullReferenceException unless you call Initialize() first. Surely there must be a better design than this? (If you answered “put the work in the constructor”, please come see me after class. I… uh… have something for you.)

Putting the onus on the developer to call methods in a certain order is a very backward approach. You, the class designer, should make it difficult for people to do things the wrong way. We can absorb that effort inside the class; something like this should work nicely:

public class MenuProvider : IMenuProvider
{
    private bool _isInitialized;
    private object _lock = new object();
    private MenuItem[] _menuItems;

    public IEnumerable<MenuItem> GetMenuItems()
    {
        EnsureInitialized();

        return _menuItems;
    }

    public IEnumerable<MenuItem> GetInactiveMenuItems()
    {
        EnsureInitialized();

        return _menuItems.Where(i => i.Active == false);
    }

    private void EnsureInitialized()
    {
        if (!_isInitialized)
        {
            Initialize();
        }
    }

    private void Initialize()
    {
        lock (_lock)
        {
            if (!_isInitialized)
            {
                // do initialization work
                _isInitialized = true;
            }
        }
    }
}

In this design, I've made a few minor adjustments. For stylistic purposes, I converted the properties into methods because properties look awkward on interfaces. Properties also aren't a good choice if there's work involved in calculating them, so methods feel better here.

Outside of the cosmetic changes, one of the most notable changes is that my Initialize method is now private. I've also added some plumbing logic to ensure that the Initialize method only does work the first time it's called, and since this is good defensive programming, this might have already existed and shouldn't be a surprise.

What should stand out to you is that each method is responsible for ensuring the initialization logic runs on the caller’s behalf, via EnsureInitialized, before doing any other work. Some developers may grumble that this is a simple thing to forget when adding new methods, and thus has potential for silly bugs. That’s fair, but because the API for the class has been simplified, the tests that would catch this are much easier to write (a quick sketch follows).
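
As a rough sketch, such a test might look like this (NUnit-style, assuming the provider can be constructed directly in a test):

[Test]
public void FreshInstance_DoesNotRequireManualInitialization()
{
    var subject = new MenuProvider();

    // if a new method forgets to call EnsureInitialized,
    // one of these calls will throw or return null
    Assert.IsNotNull(subject.GetMenuItems());
    Assert.IsNotNull(subject.GetInactiveMenuItems());
}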

Alternatively, we can take this approach further by splitting the concerns into smaller parts. One part could read the file contents and the other part could be responsible for serving the data to consumers. This is effectively just moving the Initialize method to another class. Something like this:

public class MenuProvider : IMenuProvider
{
    private MenuReader _reader;

    public MenuProvider(MenuReader reader)
    {
        _reader = reader;
    }

    public IEnumerable<MenuItem> GetMenuItems()
    {
        return _reader.GetAll();
    }

    public IEnumerable<MenuItem> GetInactiveMenuItems()
    {
        return _reader.GetAll().Where(i => i.Active == false);
    }
}

public abstract class MenuReader
{
    public abstract IEnumerable<MenuItem> GetAll();
}

public class StaticFileMenuReader : MenuReader
{
    private MenuItem[] _items;

    public override IEnumerable<MenuItem> GetAll()
    {
        if (_items == null)
        {
            // do deserialization work
        }
        return _items;
    }
}

This small change greatly simplifies the MenuProvider, and the extra step of moving the serialization logic into another class may look familiar: it’s effectively the Repository pattern. The repository in this example also affords us additional test flexibility, as the file-system can be abstracted away with a test-double.
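
For example, a hand-rolled test-double might look like this (a sketch; FakeMenuReader is a name I’ve made up for illustration):

public class FakeMenuReader : MenuReader
{
    private readonly MenuItem[] _items;

    public FakeMenuReader(params MenuItem[] items)
    {
        _items = items;
    }

    public override IEnumerable<MenuItem> GetAll()
    {
        return _items;
    }
}

// usage in a test:
// var subject = new MenuProvider(new FakeMenuReader(activeItem, inactiveItem));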

To sum up, it's a bit odd to design a class that requires methods to be called in a specific order. Initialization in general is something the class designer is well equipped to handle, so don't push this responsibility onto others – a few small changes to your class make your code more usable, responsive and easier to test.


Tuesday, August 13, 2013

Fix your code with an “On Notice” board

[Image: my “On Notice” board]

The above comes with thanks to the On Notice Generator, and my board reiterates a lot of the guidance found on Miško Hevery’s blog. These are code patterns and anti-patterns that I’ve encountered on many projects and have strong feelings against. Some of these are actually on my Dead to Me board, but there wasn’t an online generator for that.

I think it’s a good habit to start an On Notice board for your project – a list of offending code that should be cleaned up at some point. Often, these unsightly offenders are large and tightly woven into the fabric of our code, so they’re not something that can be fixed in a single refactoring. But by placing these offences on a visible On Notice board, they become goals that can fuel future refactorings.

You might not be able to fix a problem in a single session, but you can add a two-hour research task to your backlog to understand why it exists. The output of such a task might be further research tasks, or changes you could introduce to shrink the offender’s influence and eventually remove it altogether. Sometimes I bundle a bunch of these fixing tasks into a refactoring user story, or slip a few into a new feature if they’re related. Over time, the board clears up.

Don’t forget, while you’re making these changes, write a few tests while you’re at it.

Oh, regarding Gluten-free cookies… they look like cookies, but they are most definitely not.

Tuesday, February 19, 2013

Turtles all the way down

A while back I had a heated, passionate debate about dependency injection with a colleague. He was concerned about the lifetime of objects and which object had the responsibility of creating the others. While some of his points were valid, my position was that much of this doesn’t matter if you allow the Inversion of Control container to manage it for you – simply push dependencies through the constructor. The argument, comically, sounded a bit like this:

Colleague: “But who creates these dependencies?”

Me: “That doesn’t matter. Put them in the constructor, let the container take care of that.”

Colleague: “Ok, so who creates this object?”

Me: “Again, doesn’t matter. It’s placed in the constructor of its parent, along with any other dependencies it needs.”

Colleague: “Arrrg! Who creates that object??”

Me: “Dude! It’s turtles all the way down.”

The reference “turtles all the way down” is to what I believed to be an Iroquois cosmological myth: the earth is flat and rests on the back of a great tortoise, which in turn rests on top of a larger tortoise, which is on top of a larger tortoise, etc., etc… all the way down. I recently discovered that the source of this expression is unknown and there are many variants on the theme. The best variant is the Hindu belief that the earth sits on top of a tiger or elephant, which is on top of a tortoise, which is then, well, …uh, suppose we change the subject?

Over the years, I’ve adopted this expression to represent a fundamental lesson I was taught in object oriented programming: “objects should not have a top”. That is, objects should be open ended, allowing them to be extensible or mixed-in to other different applications and implementations. Our objects should be aware of their dependencies but not in control of their implementation or lifetime, and the framework that glues them together should be abstracted away. This idea is echoed by the Open/Closed principle and the Dependency Inversion principle, and it is demonstrated every time we write automated tests – the first consumers of our designs.

In my analogy, turtles all the way down accurately describes how I feel we should design our objects, but to my colleague’s credit it doesn’t reflect that at some point there must be a reckoning: someone must know how to assemble the parts to make a whole. I suppose Aristotle would point to an unmoved mover, responsible for the order of the universe. Douglas Adams would point to the man behind the man, who probably has another man behind him. I’d point to some top-level application service whose configuration magically wires everything together.
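
As a minimal sketch of what that top-level wiring might look like with Unity (the type names here are hypothetical):

public static class ApplicationBootstrapper
{
    // the single place that knows how the turtles stack up
    public static IUnityContainer Configure()
    {
        var container = new UnityContainer();
        container.RegisterType<IOrderService, OrderService>();
        container.RegisterType<IOrderRepository, SqlOrderRepository>();
        return container;
    }
}

// in Main(): one act of creation; everything below is wired automatically
var shell = ApplicationBootstrapper.Configure().Resolve<ApplicationShell>();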

It feels good to know there’s some sort of finality to the infinite regress problem: our application glues it all together.  So maybe now we can catch up on our sleep, knowing that we don’t have to realize infinity before we can close our eyes. At least until we see our application as part of a batch process in a much larger service, hosted in the cloud, resting on the back of a very large turtle.

Happy coding.

Tuesday, November 22, 2011

Static is Dead to Me

The more software I write with a test-first methodology, the more I struggle with the use of singletons and static classes. They’ve become a design smell, and I’ve grown to realize that, given enough care and thought in a design, most static dependencies aren’t needed. My current position is that most singletons are misplaced artefacts without a proper home, and static methods seem like an amateurish gateway to procedural programming.

Despite my obvious knee-jerk loathing for static, in truth there’s nothing really wrong with it -- it’s hard to build any real-world application without the use of some static methods. I continue to use static in my applications but its use is reserved for fundamental top-level application services. All told, there should only be a handful of classes that are accessible as static members.

From a test-driven development perspective, there are several strong arguments against the use of static:

  • Lack of Isolation. When a class uses a static method in another type, it becomes directly coupled to that implementation. From a testing perspective it becomes impossible to test the consuming class without satisfying the requirements of the static dependency. This increases the complexity and fragility of the tests as the implementation details of the static dependency leak into many tests.
  • Side Effects. Static methods allow us to define objects that maintain state that is global in nature. This global state represents a problem from a testing perspective because any state that must be set up for a test fixture must be reset after the test completes. Failure to clean-up the global state can corrupt the environment and lead to side-effects in other tests that depend on this shared state.
  • Inability to run tests in parallel. A fundamental requirement for reliable tests is a predictable, well-known state before and after each test. If tests depend on global state that can be mutated by multiple tests simultaneously, then it is impossible to run more than one test at a time. Given the raw computing power of a hyper-threaded, multi-core machine, it seems a crime to design code in a way that limits testing to a single core.
  • Hidden dependencies. Classes that pull dependencies from a static singleton or service-locator create an API that lies to you. Tests for these classes become unnecessarily complex and increasingly brittle.

An Alternative to Static

Rather than designing objects to be global services that are accessed through static methods, I design them to be regular objects first. There are no static methods or fields. Classes are designed to be thread-safe, but make no assumptions about their lifetime.

This small change means that I expect all interaction to be with an instance of my object rather than through a Type-specific static member. This raises two problems: how will consumers of my class obtain a reference to my object, and how do I ensure that all consumers use the same object?

Obtaining a Reference

Not surprisingly, the problem related to obtaining a reference to my object is easily solved using my favourite Inversion of Control technique, Constructor Injection. While there are many IoC patterns to choose from, I prefer constructor injection for the following reasons:

  • Consuming classes do not have to depend on a specific framework to obtain a reference.
  • Consuming classes are explicit about their required dependencies. This fosters a consistent and meaningful API where objects are assembled in a predictable and organized manner rather than randomly consumed.
  • Consuming classes don’t have to worry about which instance or the lifetime of the object they have received. This solves many testing problems as concurrent tests can work with different objects.

The difference between accessing my object through a static member versus an object instance is very subtle, but the main distinction is that using the object reference requires some forethought, as the dependency must be explicitly defined as a member of the consuming class.
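
A minimal sketch of that shape (IClock and OrderProcessor are hypothetical names):

public interface IClock
{
    DateTime UtcNow { get; }
}

public class OrderProcessor
{
    // declared up front instead of reaching out to a static member
    private readonly IClock _clock;

    public OrderProcessor(IClock clock)
    {
        _clock = clock;
    }

    public DateTime TimestampOrder()
    {
        return _clock.UtcNow;
    }
}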

Obtaining the Same Reference

By forgoing the use of static, we’ve removed the language feature that would make it simple to ensure only a single instance of our object is consumed. Without static, we need to solve this problem through the structure of our code or through features of our application architecture. Without a doubt it’s harder, but I consider the well-structured and tested code worth it (so don’t give up).

Eliminating Singletons through Code Structure:

My original position is that most singletons are simply misplaced artefacts. By this I mean static is used for its convenience rather than to expose the class as a global application service. In these situations it’s far more likely that the static class provides a service that is used in one area of the application graph. I’d argue that with some analysis the abstraction or related classes could be rearranged to create and host the service as an instance thereby negating the need for a singleton.

This approach typically isn’t easy because the analysis requires you to understand the lifetime and relationships of all objects in the graph. For small applications with emergent design, the effort is attainable and extremely rewarding when all the pieces fit together nicely. Larger applications may require a few attempts to get it right. Regardless of application size, sticking with an inverted-dependencies approach will make the problem obvious when it occurs.

An inversion of control container can lessen the pain.

Eliminating Singletons through Architecture:

Perhaps my favourite mechanism for eliminating singletons is to use an Inversion of Control container and configure it with the responsibility of maintaining the object lifetime.

This example shows programmatic registration of a concrete type as a singleton.

private void InitializeGlobalServices(IUnityContainer container)
{
   // configure the container to cache a single instance of the service
   // the first time it is used.
   container.RegisterType<MyServiceProvider>(new ContainerControlledLifetimeManager());
}

The real advantage here is that any object can be given a singleton lifetime without rewriting its code. As a further refinement, we can also register the type against an interface:

private void InitializeGlobalServices(IUnityContainer container)
{
   container.RegisterType<IServiceProvider, MyServiceProvider>(
       new ContainerControlledLifetimeManager());
}
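
Consumers that resolve the interface (or better, receive it through their constructor) now share the same cached instance:

// both calls return the same cached instance
var first = container.Resolve<IServiceProvider>();
var second = container.Resolve<IServiceProvider>();
Debug.Assert(ReferenceEquals(first, second));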

Conclusion

Somewhere in my career I picked up the design philosophy that “objects should not have a top”, meaning that they should be open-ended so they can be remixed into different applications. Eventually the “top” is the main entry-point into the application, which is responsible for assembling the objects into a running whole.

Dependency Injection fits nicely into this philosophy and in my view is the principal delivery mechanism for loosely coupled and testable implementations. Static, however, works against this in every regard: it couples us to implementations and limits our capability to test.

The clear winner is dependency injection backed by the power of an inversion of control container that can do the heavy lifting for you. As per my previous post, if you limit usage of the container to the top-level components, life is simple.

Happy coding.


Friday, August 12, 2011

On Dependency Injection and Violating Encapsulation Concerns

For me, life is greatly simplified when dependencies are inverted and constructor injection is used to provide a clean mechanism to introduce dependencies to a class. It's explicit and easy to test.

Others however argue that some dependencies are private implementation details to the class and external callers shouldn't know about them. The argument is that exposing these dependencies through the constructor violates encapsulation and introduces coupling. This argument is usually coupled with resistance to using interfaces or classes with virtual methods.

From my experience, there are times when composition in the constructor makes sense, but there's a fine line beyond which constructor injection should be used instead. I want to use this post to elaborate on these arguments and provide my perspective on the debate.

Does Constructor Injection violate Encapsulation?

Does exposing the internal dependencies of a class in the constructor expose its implementation details? If you are allowing callers to construct the class directly, then you are most certainly breaking encapsulation, as the callers must possess the knowledge of how to construct your class. However, regardless of the constructor arguments, if callers know how to construct your class you are also coupled to a specific implementation and will undoubtedly create a test impediment elsewhere. There are some simple fixes for this (including more constructor injection or another inversion of control technique) which I'll outline later.

So if constructor injection violates encapsulation, when is it safe to use composition in the constructor?

It really depends on the complexity and relationship of the subject to its dependencies. If the internal dependencies are simple and have no external dependencies of their own, there is likely no harm in instantiating them in the constructor (the same argument can be made for small static utility methods). For instance, a utility class that performs some translation or other processing activity with limited outcomes may work as a private implementation detail. Problems arise as soon as these dependencies take on further dependencies or produce output that influences the conditional logic of the subject under test. When this happens, the argument for keeping these as internally controlled dependencies becomes flawed, and it may be necessary to upgrade the "private implementation detail" to an inverted dependency.

Before:

This example shows a class that has a private implementation detail that influences logic of the subject under test. The consequence of this design is that the internal details of the validator leak into the test specifications. This leads to very brittle tests as tests must concern themselves with object construction and validation rules -- any change to the object or validation logic will break the tests.

public class MyModelTranslator
{
    private readonly MyModelValidator _validator;

    public MyModelTranslator()
    {
        _validator = new MyModelValidator();
    }

    public MyViewModel CreateResult(MyModelObject model)
    {
        var result = new MyViewModel();

        if (_validator.Validate( model ))
        {
            result.Id = model.Id;
        }
        else
        {
            result.State = MyResultState.Invalid;
        }
        return result;
    }
}

[Test]
public void WhenCreatingAResult_FromAValidModelObject_ShouldHaveSameId()
{
    var subject = new MyModelTranslator();

    // create model object with deep understanding of how Ids
    // and expiry dates are related
    var model = new MyModelObject()
    {
        Id = "S1402011",
        Expiry = DateTime.Parse("12/31/2011")
    };
    var result = subject.CreateResult( model );
    Assert.AreEqual("S1402011", result.Id);
}

After:

This example shows the code above corrected to use a constructor-injected dependency that can be mocked. In doing so, the test is no longer encumbered with object creation or validation details and can easily simulate validation states without introducing brittleness.

public class MyModelTranslator
{
    private readonly MyModelValidator _validator;

    public MyModelTranslator(MyModelValidator validator)
    {
        _validator = validator;
    }

    // ...
}

[Test]
public void WhenCreatingAResult_FromAValidModelObject_ShouldHaveSameId()
{
    // note: Validate must be virtual (or on an interface) for Moq to override it
    var validatorMock = new Mock<MyModelValidator>();

    var subject = new MyModelTranslator(validatorMock.Object);
    var model = new MyModelObject()
    {
        Id = "Dummy"
    };

    // we no longer care how validation is done,
    // but we expect validation to occur and can easily control
    // and test outcomes
    validatorMock.Setup( x => x.Validate( model )).Returns( true );

    var result = subject.CreateResult( model );
    Assert.AreEqual("Dummy", result.Id);
}

Combatting Coupling with Inversion of Control

As the above points out, if we expose the dependencies of a class in the constructor, we move the coupling problems out of the class and into the callers. This is not good, but it can be easily resolved.

Here's an example showing that our controller class now knows about the MyModelValidator:

public class MyController
{
    private readonly MyModelTranslator _translator;

    public MyController()
    {
        _translator = new MyModelTranslator( new MyModelValidator() );
    }
}

More Constructor Injection

Ironically, the solution to both the coupling and construction problems is more constructor injection. This creates a Russian-doll effect where construction logic is deferred and pushed higher and higher up the stack, resulting in a top-level component that is responsible for constructing the entire object graph. While this provides an intuitive API that clearly outlines dependencies, it's tedious to instantiate by hand. This is where dependency injection frameworks like Unity, StructureMap and others come in: they can do the heavy lifting for you. If done right, your application should only have well-defined points where the dependency container is used.

public class MyController : IController
{
    private readonly MyModelTranslator _translator;

    public MyController(MyModelTranslator translator)
    {
        _translator = translator;
    }
}

Note that this assumes the constructor arguments are abstractions -- either interfaces, abstract classes or classes with virtual methods. In effect, by exposing the constructor arguments we trade the tightly sealed, encapsulated black box for a model that is open for extensibility.

Factories / Builders

In some cases, using Constructor Injection everywhere might not be a good fit. For example, if you had a very large and complex object graph, creating everything upfront using constructor injection might represent a performance problem. Likewise, not all parts of the object graph will be used immediately and you may need to defer construction until needed.  In these cases, the good ol' Gang of Four Factory pattern is a handy mechanism to lazy load components.

So rather than constructing the entire object graph upfront and passing resolved dependencies as constructor arguments, pass a factory instead and create the lower-level sub-dependencies when needed.

public class MyControllerFactory : IControllerFactory
{
    private IUnityContainer _container;

    public MyControllerFactory(IUnityContainer container)
    {
        _container = container;
    }

    public IController Create()
    {
        return _container.Resolve<MyController>();
    }
}

Service Location

While Service Location is an effective mechanism for introducing inversion of control, I'm listing it last for a reason. Unlike Constructor Injection, Service Location couples all your code to an intermediate provider. Depending on your application and relative complexity this might be acceptable, but from personal experience it's a very slippery slope -- actually, it would be more appropriate to call it a cliff because removing or replacing the container from an existing codebase can be a massive undertaking.

I see service location as a useful tool when refactoring legacy code towards Constructor Injection. The lower "leaf-nodes" of the object graph can take advantage of constructor injection while the higher nodes temporarily use service location to create the "leaves". As you refactor, the service locator is removed and replaced with constructor injection, which results in an easier-to-test and more discoverable API. This also provides a pragmatic, bottom-up refactoring approach.
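
As a rough sketch of that transitional state (LegacyOrderScreen and ServiceLocator are hypothetical stand-ins for whatever your codebase already has):

public class LegacyOrderScreen
{
    private readonly MyModelTranslator _translator;

    public LegacyOrderScreen()
    {
        // temporary: this higher node still pulls from the container,
        // but the "leaf" it creates already uses constructor injection.
        // replace this with constructor injection in a later pass.
        var validator = ServiceLocator.Resolve<MyModelValidator>();
        _translator = new MyModelTranslator(validator);
    }
}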

Summary

  • Use "new" when dependencies have no external dependencies of their own and don't influence the conditional flow of the subject. Conversely, use constructor injection when dependencies have additional dependencies.
  • Use an IoC container to construct objects that use constructor injection, but only use the IoC container in high level components that are responsible for construction of the object graph. If constructing the entire object graph is a performance concern, consider encapsulating the container in a Factory that can be used to resolve remaining parts of the object graph when needed.
  • To prevent violating encapsulation concerns, use inversion of control to promote using abstractions instead of concrete dependencies.


Thursday, December 24, 2009

Twelve Days of Code - Solution Setup

As part of the Twelve Days of Code Challenge, I’m developing a Pomodoro style application and sharing the progress here on my blog.  This post tackles day one: setting up your project.

The initial stage of a project, where you are figuring out layers and packaging, is a critical part of the process and one that I’ve always found interesting.  From past experience, the small details at this stage can become massive technical debt later if the wrong approach is used, so it’s best to take your time and make sure you’ve dotted the i’s and crossed the t’s.

Creating the Solution

For this project I’ve chosen to use Visual Studio 2010 Beta 2 and so far the experience has been great.  Visual Studio 2010 is going to reset the standard and bring new levels of developer productivity (assuming they solve some of the stability issues): it’s faster and much more responsive, eats less memory and adds subtle UX refinements that improve developer flow.  To get a better sense and to recreate this feeling, I urge you to load up Visual Studio 2003 and look at the Start page – we’ve come a long way.

The New Project window has had a nice overhaul, and we can now specify the target framework in the dialog.  Here I’m creating a WPF Application named Pomodoro.Shell.  Note that I’m specifying to create a directory for the solution and that the Solution Name and Project Name are different.

[Screenshot: New Project dialog with solution and project names]

Normally at this point I would consider renaming the output of the application from “Pomodoro.Shell.exe” to some other simpler name like “pomodoro.exe”.  This is an optional step which I won’t bother with for this application.

Adding Projects

When laying out the solution, the first challenge is determining how many Visual Studio Projects we’ll need, and there are many factors to consider including dependencies, security, versioning, deployment, reuse, etc.

There appears to be a school of thought that believes every component or module should be its own assembly, and I strongly disagree.  Assemblies should be thought of as deployment units – if the components version and deploy together, it’s very likely that they should be a single assembly.  As Visual Studio does not handle large numbers of projects well, it’s always better to start with larger assemblies and separate them later if needed.

For my pomodoro app, I’ve decided to structure the project into two primary pieces, “core” and “shell”, where “core” provides the model of the application and “shell” provides the user-interface specific plumbing.

Add Test Projects

Right from the start of the project, I’m gearing towards how it will be tested.  As such, I’ve created two test projects, one for each assembly.  This allows me to keep the logical division between assemblies.

[Screenshot: solution with Core, Shell and matching test projects]

As soon as the projects are created, the first thing I’ll do is adjust the namespaces of the test libraries to match their counterparts.  By extension, the tests are features of the same namespace but they are packaged in a separate assembly because I do not want to deploy them with the application.  I’ve written about this before.

[Screenshot: fixing the namespaces of the test projects]

Configure common Assembly Properties

Once we’ve settled into a project structure, the next easy win is to configure the projects to share the same assembly details, such as version number, manufacturer, copyright, etc.  This is easily accomplished by creating a single file to represent this data and then linking each project to it.  In a later step, this file can be auto-generated as part of the build process.

using System.Reflection;

[assembly: AssemblyVersion("1.0.0.0")]
[assembly: AssemblyFileVersion("1.0.0.0")]
[assembly: AssemblyCompany("Bryan Cook")]
[assembly: AssemblyCopyright("Copyright © Bryan Cook 2009")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]

Tip: Link the AssemblyVersion.cs file to the root of each project, then drag it into the Properties folder.

Give the Assemblies a Strong-Name

If your code will ultimately end up on an end-user desktop, it is imperative to give the assembly a strong name.  We can take advantage of Visual Studio’s built-in features to create our strong-name key (snk), but we’ll also take a few extra steps to ensure that each project has the same key.

  1. Open the project properties.
  2. Click on the Signing tab
  3. Check the “Sign the assembly” checkbox
  4. Choose “<New…>”
  5. Create a key with no password.
  6. Open Windows Explorer and copy the snk file to the root of the solution.
  7. Then for each project:
    1. Check the “Sign the assembly” checkbox
    2. Choose “<Browse…>”
    3. Navigate to the root of the solution and select the snk key.

Note that Visual Studio will copy the snk file to each project folder, though each project will have the same public key.

Designate Friend Assemblies

In order to aid testing, we can configure our Shell and Core assemblies to implicitly trust our test assemblies.  I’ve written about the benefits before, but the main advantage is that I don’t have to alter type visibility for testing purposes.  Since the assemblies have a strong name, the InternalsVisibleTo attribute requires the full public key.

[Screenshot: extracting the public key for InternalsVisibleTo]

Since all the projects share the same key file, this public token will work for all the projects.  The following shows the InternalsVisibleTo attribute for the Pomodoro.Core project:

[assembly: InternalsVisibleTo("Pomodoro.Core.Tests, PublicKey=" +
"0024000004800000940000000602000000240000525341310004000001000" +
"1003be2b1a7e08d5e14167209fc318c9c16fa5d448fb48fe1f3e22a075787" +
"55b4b1cf4059185d2bd80cc5735142927fbbd3ab6eeebe6ac6af774d5fe65" +
"0a226b87ee9778cb2f6517382102894dc6d62d5a0aaa84e4403828112167a" +
"1012d5b905a37352290e4aa23f987ff2be3ccda3e27a7f7105cf5b05c0baf" +
"3ecbfd2371c0fa0")]

Setup External References

I like to put all the third-party assemblies that are referenced into the project into a “lib” folder at the root of the solution.  At the moment, I’m only referencing Moq for testing purposes.

A note on external references and source control: Team Foundation Server typically only pulls dependencies that are listed directly in the solution file.  While there are a few hacks for this (add each assembly as an existing item in a Solution Folder; or create a class library that contains the assemblies as content), I like to have all my dependencies in a separate folder with no direct association to the Visual Studio solution.  As a result, these references must be manually updated by performing a “Get Latest” from the TFS Source Control Explorer.  If you’ve got a solution for this – spill it, let’s hear your thoughts.

Setup Third-Party Tools

For all third-party tools that are used as part of the build, I like to include these in a “tools” or “etc” folder at the root of the solution.  This approach allows me to bundle all the necessary tools for other developers to allow faster ramp-up.  It adds a bit of overhead when checking things out, but certainly simplifies the build script.

Setup Build Script

There are a few extra steps I had to take to get my .NET 4.0 project to compile using NAnt.

  1. Download the nightly build of NAnt 0.86 beta1.  The nightly build solves the missing SdkInstallRoot build error.
  2. Paige Cook (no relation) has a comprehensive configuration change that needs to be applied to nant.exe.config
  3. Modify Paige’s version numbers from .NET 4.0 beta 1 to beta 2.  (Replace all references of “v4.0.20506” to “v4.0.21006”)

Here are a few points of interest for the build file listed below:

  • I’ve defined a default target “main”.  This allows me to simply execute “nant” in the root folder of the solution and it’ll take care of the rest.
  • The “main” target is intentionally empty because the real work is in the order of its dependencies.  Currently, I’m only specifying “build”, but normally I would specify “clean, build, test”.
<project default="main">

  <!-- VERSION NUMBER (increment before release) -->
  <property name="version" value="1.0.0.0" />
  
  <!-- SOLUTION SETTINGS -->
  <property name="framework.dir" value="${framework::get-framework-directory(framework::get-target-framework())}" />
  <property name="msbuild" value="${framework.dir}\msbuild.exe" />
  <property name="vs.sln" value="TwelveDays.sln" />
  <property name="vs.config" value="Debug" />

  <!-- FOLDERS AND TOOLS -->
  <!-- Add aliases for tools here -->
  
  
  <!-- main -->
  <target name="main" depends="build">
  </target>

  <!-- build solution -->
  <target name="build" depends="version">

    <!-- compile using msbuild -->
    <exec program="${msbuild}"
      commandline="${vs.sln} /m /t:Clean;Rebuild /p:Configuration=${vs.config}"
      workingdir="."
          />

  </target>
    
  <!-- generate version number -->
  <target name="version">
    <attrib file="AssemblyVersion.cs" readonly="false" if="${file::exists('AssemblyVersion.cs')}" />
    <asminfo output="AssemblyVersion.cs" language="CSharp">
      <imports>
        <import namespace="System" />
        <import namespace="System.Reflection" />
      </imports>
      <attributes>
        <attribute type="AssemblyVersionAttribute" value="${version}" />
        <attribute type="AssemblyFileVersionAttribute" value="${version}" />
      </attributes>
    </asminfo>
  </target>

</project>

Next Steps…

In the next post, we’ll look at the object model for our Pomodoro application.


Tuesday, December 15, 2009

Unity, Dependency Injection and Service Locators

For a while now, I’ve been chewing on some thoughts about service locators versus dependency injection, but I’ve struggled to find the right way to say it.  I sent an email to a colleague today that tried to describe some of the lessons learned from my last project.  It comes close to the visceral feelings I have around this subject, and describes it well.  Here it is, unaltered:

When you pull dependencies in from the Container, you’re using a Service Locator feature of Unity; When you push dependencies in via the constructor, you’re using Dependency Injection.  There’s much debate over which pattern to use.  When using dependency injection, the biggest advantage is that the relationship between objects is well known.  This helps us understand the responsibilities of each object better, may have better performance (if we’re resolving dependencies a lot) and it makes it easier to write tests that isolate the behavior of the subject under test.

In a project where we’re resolving dependencies at runtime, the true nature of the dependencies between objects is obscured -- developers must read through the source to understand responsibilities.  Each time a change is made to resolve new dependencies from the container, the tests will fail.  Since locating and understanding the responsibility of the dependencies in the source is a complex task, the easy solution is to simply register the missing dependencies in the container and move on.  In most cases, the object that is registered is the concrete implementation which is also susceptible to the same dependency problem.  Ultimately, this compounds the problem until the tests themselves become obscured, vague, incomplete or overly complex.

In contrast, if the dependencies are declared in the constructor, all dependencies are known at compile time.  Any changes made to the subject won’t compile until the tests are updated to reflect the new functionality.  Note that because the object doesn’t use the container, the tests become just like any other simple POCO test.

This is not a silver bullet solution however.  There are times when resolving from the container makes sense – loaders and savers, for example – but even then, the container can be hidden inside a POCO factory. In contrast to service location, the impact of constructor injection is that it may require more work upfront to realize the dependencies, though this can be mitigated in some cases using a TDD methodology where the tests satisfy the responsibilities of the subject under test as it is written.
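
To illustrate the pull-versus-push distinction in code (a sketch; IReportFormatter and the builder classes are names I’ve invented for illustration):

// pull: the class reaches into the container (service location);
// the dependency is invisible to callers and to tests
public class PullingReportBuilder
{
    private readonly IUnityContainer _container;

    public PullingReportBuilder(IUnityContainer container)
    {
        _container = container;
    }

    public string Build()
    {
        var formatter = _container.Resolve<IReportFormatter>();
        return formatter.Format("report body");
    }
}

// push: the dependency arrives through the constructor;
// the relationship is explicit and the test is a plain POCO test
public class PushingReportBuilder
{
    private readonly IReportFormatter _formatter;

    public PushingReportBuilder(IReportFormatter formatter)
    {
        _formatter = formatter;
    }

    public string Build()
    {
        return _formatter.Format("report body");
    }
}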

Comments welcome.


Thursday, July 23, 2009

Using RuntimeTypeHandles to improve Memory of .NET Apps

It's been a few months since I wrote my article Creating a better wrapper using AOP, and I haven't thought much about it, but a recent comment left by Krzysztof has motivated me to re-open the project.  Krzysztof suggested an interesting optimization that I want to try, and perhaps I will delve into that in another post, but today I want to expand on my intention of improving both performance and memory management, as mentioned in the caveats of that article.  In truth, most of this article is inspired by Jeffrey Richter's CLR via C# and Vance Morrison's blog, and I've been itching to test out the concept of using runtime type handles for a while.

Some Background

Most developers are led to believe that Reflection is a performance hit, but few understand why.  Here's my attempt to explain it in a nutshell.  I’m neither a Microsoft employee nor an expert in this matter; I’m simply paraphrasing, and I encourage you to pick up Jeff’s book.  If I’m paraphrasing incorrectly, please let me know.

When your code is executed for the first time, the CLR finds all references to Types in the MSIL in order to JIT compile them.  As part of this process, Assemblies are loaded and Type information is read from the meta-data and put into the AppDomain’s heap using internal structures – mainly pointers and method tables.  The first time a Type’s method is called, it is JIT compiled into native instructions and a pointer refers to these compiled bits.  Nice, light and fast.

Reflection, however, is different.  By calling the Type’s built-in GetType method, we force the runtime to read the meta-data and potentially construct dozens of managed objects (MethodInfo, FieldInfo, PropertyInfo, etc.), and collectively these can be somewhat large data structures.  In addition, as most reflection methods use strings to identify members, we’re performing a case-insensitive scan of the meta-data.  When we call those methods, our parameters have to be pushed onto the stack as an array of objects.  Compared to compiled native code with pointers, you can see why this is slower.  (Roughly 200x slower.)

Clearly, the .NET runtime doesn’t need these large data structures to execute our code.  Instead, we can leverage the same pointers that the CLR uses through their managed equivalents: RuntimeTypeHandle, RuntimeMethodHandle and RuntimeFieldHandle.  At their most basic level, these are the pointers (IntPtr objects) to our Types, without all the additional reflection overhead.  Since an IntPtr is basically just a numerical value (and a ValueType at that), they use considerably less memory.

An interesting point: an example from Vance’s blog demonstrates that typeof(Foo) is considerably slower than typeof(Foo).TypeHandle because the JIT recognizes you need the pointer and not the Type.

Even more surprising, anObj.GetType() == typeof(string) is faster than (anObj is string) for the exact same JIT optimization.  Amazing.
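
If you want to see this for yourself, here’s a crude micro-benchmark sketch (timings will vary by machine and runtime):

object anObj = "hello";
bool sink = false;

var sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 100000000; i++)
{
    sink |= anObj.GetType() == typeof(string);
}
sw.Stop();
Console.WriteLine("GetType/typeof: {0}ms", sw.ElapsedMilliseconds);

sw = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 100000000; i++)
{
    sink |= anObj is string;
}
sw.Stop();
Console.WriteLine("is: {0}ms ({1})", sw.ElapsedMilliseconds, sink);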

Using Handles

If your application holds onto Type data for extended periods, it may be worth your while to use handles.  Here are a few examples of using them.

Get the handle for a Type
[Test]
public void CanGetHandleFromType()
{
    RuntimeTypeHandle handle = typeof(string).TypeHandle;
    Type type = Type.GetTypeFromHandle(handle);

    Assert.AreEqual("A String".GetType(), type);
}
Get the handle for a Method
[Test]
public void CanGetHandleFromMethod()
{
    MethodInfo m1 = typeof(Foo).GetMethod("Do");
    RuntimeMethodHandle handle = m1.MethodHandle;

    MethodInfo m2 = MethodBase.GetMethodFromHandle(handle) as MethodInfo;

    Assert.IsNotNull(m2);
    Assert.AreEqual(m1, m2);
}

Applying Handles

Getting back to my AOP example, I can now create a really simple mapping table for my AOP wrapper that’s lean(er) on memory and will cut down on needless calls to the Reflection API, thus speeding things up considerably.  Note that I’m doing all the Reflection work upfront and then storing only pointers.

Update: After some initial profiling, I wasn’t surprised to discover that loading the MethodInfo from its handle is as expensive as the original GetMethod call.  As such, I’ve opted to store my mapping table as Dictionary<RuntimeMethodHandle, MethodInfo> instead of Dictionary<RuntimeMethodHandle, RuntimeMethodHandle>.

public class InvocationMapping
{
    public InvocationMapping(object instance)
    {
        _table = new Dictionary<RuntimeMethodHandle, MethodInfo>();
        _instance = instance;
    }

    public static InvocationMapping CreateMapping<TWrapper>(object target)
    {
        InvocationMapping mapping = new InvocationMapping(target);

        Type wrapperType = typeof(TWrapper);
        var wrapperProperties = wrapperType.GetProperties();

        Type targetType = target.GetType();
        foreach (var property in wrapperProperties)
        {
            MappedFieldAttribute attribute =
                                    property.GetCustomAttributes(typeof(MappedFieldAttribute),true)
                                    .OfType<MappedFieldAttribute>()
                                    .SingleOrDefault(mf => mf.TargetType == targetType);

            if (attribute != null)
            {
                PropertyInfo targetProperty = targetType.GetProperty(attribute.FieldName);

                mapping.Add(property, targetProperty);
            }
        }

        return mapping;
    }

    public Dictionary<RuntimeMethodHandle, MethodInfo> MappingTable
    {
        get { return _table; }
    }

    public void Add(PropertyInfo property, PropertyInfo targetProperty)
    {
        if (property.CanRead && targetProperty.CanRead)
        {
            _table.Add(property.GetGetMethod().MethodHandle, targetProperty.GetGetMethod());
        }
        if (property.CanWrite && targetProperty.CanWrite)
        {
            _table.Add(property.GetSetMethod().MethodHandle, targetProperty.GetSetMethod());
        }
    }

    public bool Supports(RuntimeMethodHandle handle)
    {
        return _table.ContainsKey(handle);
    }

    public object Invoke(RuntimeMethodHandle handle, params object[] args)
    {
        MethodInfo method = _table[handle];
        return method.Invoke(_instance, args);
    }

    private Dictionary<RuntimeMethodHandle, MethodInfo> _table;
    private object _instance;
}

My next post will look at combining this approach with an IProxyBuilderHook and/or ISelectorType.


Monday, June 22, 2009

C# Multithreading: Blocking Similar Requests

It's an abuse of Moore's Law: as computers get faster, software gets slower.  And while there are many reasons why software feeds on additional resources – whether it's providing a more intuitive user interface, coding for security, or simple feature bloat – it’s largely the fault of software developers' assumption that disk, memory and CPU will continue to increase in capacity and power while costs decrease.

With the advent of multi-core processors, it's evident that software cannot afford to assume that execution will occur on a single thread.  We need to embrace multi-threaded techniques to allow our software to scale with the platform, and developing multi-threaded applications requires special consideration.

I want to share a simple code snippet that I've frequently used to block individual threads from executing the same code twice.  It's especially handy when dealing with process-intensive or time-sensitive requests where race conditions can occur. For example, a recent project needed to dynamically generate an image and cache it as a file – if requests were allowed to overwrite the file while it was being served by another thread, the results would be disastrous.

The concept uses the "double-checked locking" pattern, which is commonly used for synchronized lazy-initialization, such as creating singletons.  Unlike the singleton pattern, this example assumes there are multiple requests for different objects, so the technique is modified slightly.

A “Double-Checked Lock” Singleton

For those who aren't familiar with the "double-checked locking" pattern, I'll outline it here using a Singleton example.  In short, double-checked locking is "check twice before doing any work, with a lock in the middle."

Note: Some may claim that the thread-safety of the double-checked locking pattern simply cannot be guaranteed.  This is true for environments where the processor or application runtime memory model can vary (Java, C++, etc.).  The CLR, however, does guarantee the synchronization this pattern needs (note the volatile field below).

This example outlines the double-checked lock being used to guarantee that only one singleton is created:

public class MySingleton
{
    private static volatile MySingleton _instance;
    private static readonly object _lock = new object();

    public static MySingleton Instance
    {
        get
        {
            // 1: first check
            if (_instance == null)
            {
                // 2: lock a common object
                //    all other threads must wait until the lock is released
                lock (_lock)
                {
                    // 3: second check
                    if (_instance == null)
                    {
                        // 4: thread-sensitive work
                        _instance = new MySingleton();
                    }
                } // 5: release the lock
            }
            return _instance;
        }
    }

    // other methods
}

If you've never seen this type of pattern before, it may at first seem awkward (“why are you checking for null so much?”).  Here's a quick rundown of how it plays out in a multi-threaded environment:

  1. Thread #1 enters the MySingleton Instance_get method
  2. Thread #2 enters the MySingleton Instance_get method at the exact same time as Thread #1.
  3. Thread #1 determines that the instance is null and proceeds to the next statement which puts a lock on a common object that is available to all threads.  This lock will block all other requests until the lock is removed.
  4. Thread #2 attempts to access the _lock object but cannot because it is currently locked. Execution pauses at this statement.
  5. Thread #1 which is now in a sensitive code region that can only be executed by a single thread, determines that the instance is still null and proceeds to the next statement which performs the thread-sensitive work of creating the singleton.
  6. Thread #1 releases the lock.
  7. Thread #2 is no longer blocked and can access the lock.  It puts a lock on the _lock object to ensure that no other object can access this region.
  8. Thread #2 determines that the instance is not null.  No additional work is performed and it releases the lock.
  9. Thread #1 returns the instance.
  10. Thread #2 returns the instance that thread #1 created.

Using Synchronized-Keys for Similar Requests

As mentioned previously, the pattern above ensures that only a single instance of our object is created.  However, this technique doesn't work well for methods that vary their response based on input, because we're locking a single object to create the sensitive regions.  By locking only a single object, we create a nasty side-effect that serializes all requests through a single thread.

To solve this problem, we need to put our lock on a different object so that we are only blocking similar requests.  We need a key that is synchronized for requests with the same input.

This is where the “SyncKey” comes in:

public sealed class SyncKey : IDisposable 
{ 
    private static Dictionary<string,SyncKey> _inner = new Dictionary<string,SyncKey>(); 

    public static SyncKey Get(string key) 
    { 
        // lock the table to ensure that only one thread can access 
        //    the dictionary at a time 
        lock(_inner) 
        { 
            // if this is the first request for this key, it will not be present 
            //    in the dictionary.  create and store it. 
            if (!_inner.ContainsKey(key)) 
            { 
                SyncKey item = new SyncKey(key); 
                _inner.Add(key, item); 
            } 
            // return the synchronized key 
            return _inner[key]; 
        } 
    }
 
    private static void Remove(SyncKey item) 
    { 
        // lock the table to ensure that only one thread can access 
        //    the dictionary at a time 
        lock(_inner) 
        { 
            // for the request that first instantiated the key, 
            //    the sensitive work is complete and the key can be safely 
            //    removed from the dictionary 
            // for subsequent requests, although the key was used to prevent 
            //    a duplicate request, it no longer exists in the table 
            //    and can be safely ignored. 
            if (_inner.ContainsKey(item.Key)) 
            { 
                _inner.Remove(item.Key); 
            } 
        } 
    } 

    private string _key; 

    SyncKey(string key) 
    { 
        _key = key; 
    } 

    public string Key 
    { 
        get { return _key; } 
    }

    public void Dispose() 
    { 
        Remove(this); 
    } 
}

An example using this strategy:

public class MyObject 
{ 
    public string SensitiveMethod(string inputParameter) 
    { 
        string key = GetKey(inputParameter); 
        // 1: first check 
        string result = GetItemFromCache(key); 
        if (result == null) 
        { 
            // create a sync key for this request 
            using(SyncKey sync = SyncKey.Get(key)) 
            { 
                // 2: lock request-specific object 
                lock(sync.Key) 
                { 
                    // 3: second check 
                    result = GetItemFromCache(key); 
                    if (result == null) 
                    { 
                        // 4: thread sensitive work 
                        result = BuildResult(key); 
                        CacheResult(key,result); 
                    } 
                } // 5: release lock 
            } // dispose sync key 
        } 

        return result; 
    } 
} 

Results

I built a small contrived example that pushes 1000 threads through a fixed set of input using the following three strategies:

  • No locking – all requests are executed without blocking.
  • Lock – Locking on a shared lock object.
  • Sync Lock – Locking occurs on a request specific key.

[Chart: sync-lock-chart – total execution time for the No locking, Lock, and Sync Lock strategies]

Some notes on the findings:

  • No locking – the example doesn't reflect the CPU and resource starvation that would occur from flooding the system with concurrent requests.  If it did, it's likely that the execution time would be several orders of magnitude longer.
  • Lock – duplicate requests are avoided, but the total execution time is longer because every request is serialized through the single shared lock.
  • Sync Lock – duplicate requests are avoided and the execution time is shorter because only requests with the same input are blocked, allowing other requests to proceed in parallel.


Tuesday, February 03, 2009

Creating a better Wrapper using AOP

Recently, I was working on a system that had to use several third-party data sources with different object models.  Since changing their code was out of the question, the obvious workaround was to wrap the third-party objects into a common interface in my codebase.  The resulting code was dead simple, but the monotonous repetition of creating property accessors for several dozen fields left me with a sinking feeling that there ought to be a better way.

public class PersonWrapper : IPerson // REPEAT for each 3rd party object...
{
    private readonly CompanyA.Person _person;

    public PersonWrapper(CompanyA.Person person)
    {
        _person = person;
    }

    public string EmailAddress
    {
        get { return _person.Email; }
        set { _person.Email = value; }
    }

    // REPEAT for each Property we want to expose...
}

The better way snuck into my head when a colleague found an interface in the third-party library that was similar to our needs.  We only needed a small handful of its properties, so he asked, "Can we inherit only half of an interface, somehow?"  This odd question with a seemingly obvious answer (there isn't a CLR keyword to break interface definitions apart) pushed me to think about the problem differently.  We can't choose what parts of an interface we want to inherit, but we can choose a different interface and inherit from it dynamically.  Not a trivial task, but certainly possible, as many frameworks have been dynamically generating classes from interfaces for years.  This technique only solved half of the problem, so I discarded the notion.

Later that night, the solution to the other half seemed obvious: decorate the interface with attributes that describe the relationship to the third-party types, and use an aspect-oriented concept known as call interception to forward calls on the interface to the mapped fields of the third-party type.  If it sounds complicated, it is -- but fortunately, the tools make it easy to do, and I'll try my best to walk you through it.

AOP 101

For starters, it helps to understand that typically AOP is used to augment the behavior of existing classes by creating a special proxy around objects at runtime.  For the existing code that will use those augmented types, the proxy is identical to the original -- the key difference is that each call to your object passes through a special middleman (known as an interceptor) that can perform additional operations before and after each call.  The most commonly used example is logging the parameters that were passed in or the time that the method took to execute.  This diagram attempts to describe what's happening:

[Diagram: DynamicProxy-AOP – calls to the proxy pass through the interceptor before reaching the original object]
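To make that concrete, here's what a minimal timing interceptor might look like with Castle's DynamicProxy (a sketch of the classic example described above):

using System;
using System.Diagnostics;
using Castle.DynamicProxy;

public class TimingInterceptor : IInterceptor
{
    public void Intercept(IInvocation invocation)
    {
        Stopwatch watch = Stopwatch.StartNew();

        // forward the call to the underlying object
        invocation.Proceed();

        Console.WriteLine("{0} took {1} ms",
                          invocation.Method.Name, watch.ElapsedMilliseconds);
    }
}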

However, a dynamic proxy for an interface is different because there's no existing class, so our interceptor is responsible for doing all the work:

[Diagram: DynamicProxy-Interface – the interface proxy has no underlying class, so the interceptor supplies the implementation]

Mapping Interfaces to Third Party Types

So rather than creating a concrete wrapper for each vendor's type, we annotate our interface with the mapping information. The advantage is that we can manage our interface definition and mappings in a single place, making it easy to extend to new fields and other vendors without incurring class explosion.

public interface IPerson
{
    [MappedField("First", typeof(CompanyA.Person))]
    [MappedField("FName", typeof(CompanyB.Contact))]
    string FirstName { get; set; }
    
    [MappedField("Last", typeof(CompanyA.Person))]
    [MappedField("LName", typeof(CompanyB.Contact))]
    string LastName { get; set; }
}

[AttributeUsage(AttributeTargets.Property, AllowMultiple=true, Inherited=true)]
public sealed class MappedFieldAttribute : Attribute
{
    public MappedFieldAttribute(string fieldName, Type targetType)
    {
        FieldName = fieldName;
        TargetType = targetType;
    }
    
    public string FieldName
    {
        get;
        private set;
    }
    
    public Type TargetType
    {
        get;
        private set;
    }
}

Intercepting Calls

Now that our interface contains the information to map to properties in our third-party class, we need to dynamically generate a derived class that implements this interface.  I'm using the Castle Project's DynamicProxy, though there are several other ways to do this.  We'll configure the derived class (proxy) with a custom interceptor that reads information about the method being called and, using some Reflection goodness, redirects the incoming call to the third-party object.

DynamicProxy's interceptor interface is simple. The IInvocation object contains all the information about the incoming call:

public interface IInterceptor 
{ 
    void Intercept(IInvocation invocation); 
} 

If we were creating a proxy for a concrete Type, we'd likely want to pass the call from the proxy onto the target object using the invocation.Proceed() method, but because we're using an interface with no target implementation, we'll have to write the implementation within the interceptor.  We can implement anything we want; the only constraint is that we must set invocation.ReturnValue if the method has a return value.

The details of the method being called are represented by invocation.Method.  When the call is for a property, the Method is the underlying accessor method, i.e., get_<PropertyName> or set_<PropertyName>.  This represents a bit of a gotcha: our attribute definition isn't on the accessor method, it's on the PropertyInfo, so we have a bit of work to do to get our custom attributes.

The FieldMapperInterceptor implementation looks like this:

using System;
using System.Linq;
using System.Reflection;
using Castle.DynamicProxy;

public class FieldMapperInterceptor : IInterceptor
{
    public FieldMapperInterceptor(object target)
    {
        _targetObject = target;
        _targetType = target.GetType();
    }
    
    public void Intercept(IInvocation invocation)
    {
        if (!InterceptProperty(invocation))
        {
            throw new NotSupportedException("This method/property is not mapped.");
        }
    }

    protected bool InterceptProperty(IInvocation invocation)
    {
        MethodInfo method = invocation.Method;
        string methodName = method.Name;

        if (!IsPropertyAccessor(method)) return false;

        string propertyName = methodName.Substring(4);
        bool writeOperation = methodName.StartsWith("set_");

        // get attribute from the Property (not the get_/set_ method)
        PropertyInfo property = method.DeclaringType.GetProperty(propertyName);
        MappedFieldAttribute attribute =
            property.GetCustomAttributes(typeof(MappedFieldAttribute), true)
                    .OfType<MappedFieldAttribute>()
                    .SingleOrDefault(mf => mf.TargetType == _targetType);
        if (attribute == null) return false;

        // locate the property on the target object
        PropertyInfo targetProperty = _targetType.GetProperty(attribute.FieldName);
        if (targetProperty == null)
        {
            throw new NotSupportedException("Field not found on the target Type.");
        }

        if (writeOperation)
        {
            object propertyValue = invocation.Arguments.Last();
            object[] index = invocation.Arguments.Take(invocation.Arguments.Length - 1).ToArray(); 
            targetProperty.SetValue(_targetObject, propertyValue, index);
        }
        else
        {
            invocation.ReturnValue = targetProperty.GetValue(_targetObject, invocation.Arguments);
        }

        return true;
    }

    public bool IsPropertyAccessor(MethodInfo method)
    {
        string methodName = method.Name;
        return methodName.StartsWith("get_") || methodName.StartsWith("set_"); 
    }
    
    private object _targetObject;
    private Type _targetType;
}

Caveats

The FieldMapperInterceptor will work with any interface using the MappedFieldAttribute and any third-party object, but there are a few caveats:

  • The implementation only deals with Properties, though the approach for methods would be similar (and probably easier).  Maybe I'll post a follow-up if there's interest.
  • There's definitely room for performance and memory improvements via caching and RuntimeTypeHandles (see the sketch after this list).
  • It does not support generic parameters.
  • While Castle's DynamicProxy2 is very lightweight, there is a cost to instantiating proxy objects, albeit a very minor one.
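As a sketch of the caching idea, the interceptor could resolve each target property once per accessor method and reuse it on subsequent calls.  ResolveTargetProperty is a hypothetical helper that would replace the lookup code inside InterceptProperty; locking is omitted for brevity:

    // one cache per interceptor instance, since each instance wraps a single target type
    private readonly Dictionary<MethodInfo, PropertyInfo> _propertyCache =
        new Dictionary<MethodInfo, PropertyInfo>();

    private PropertyInfo ResolveTargetProperty(MethodInfo accessor, string propertyName)
    {
        PropertyInfo targetProperty;
        if (!_propertyCache.TryGetValue(accessor, out targetProperty))
        {
            PropertyInfo property = accessor.DeclaringType.GetProperty(propertyName);
            MappedFieldAttribute attribute =
                property.GetCustomAttributes(typeof(MappedFieldAttribute), true)
                        .OfType<MappedFieldAttribute>()
                        .SingleOrDefault(mf => mf.TargetType == _targetType);

            targetProperty = (attribute == null)
                ? null
                : _targetType.GetProperty(attribute.FieldName);

            // cache the result (including misses) for the next call
            _propertyCache[accessor] = targetProperty;
        }
        return targetProperty;
    }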

Putting it All Together

With this in place, dynamically grafting our custom interface onto our third-party types is a snap:

[Test] 
public void Demo() 
{ 
    ProxyGenerator generator = new ProxyGenerator(); 
    CompanyA.Person actualObject = new CompanyA.Person(); 
    FieldMapperInterceptor interceptor = new FieldMapperInterceptor(actualObject); 
    IPerson person = generator.CreateInterfaceProxyWithoutTarget<IPerson>(interceptor); 

    person.FirstName = "Bryan"; 

    // IPerson.FirstName maps to CompanyA.Person.First for this target type 
    Assert.AreEqual(actualObject.First, person.FirstName); 
}

Comments welcome.


Friday, August 08, 2008

Legacy Projects: Planning for Refactoring

Over the last few posts, my legacy monolithic project with no unit tests has gained a build server with statistics reports, empty coverage data, and a set of unit tests for the user interface.  We're now in a really healthy position to introduce some change into the project.  Well... not quite: refactoring an existing project requires a plan, and some creative thinking to integrate change into the daily work cycle.

Have a Plan

I can't stress this enough: without a plan, you're just recklessly introducing chaos into your project.  Though it would help to do a deep technical audit, the trick is to keep the plan at a really high level.  Your main goal should be to provide management with a deliverable, such as a set of diagrams and a list of recommendations.  Each set of recommendations will likely need its own estimation and planning cycle.  Here's an approach you can use to help start your plan:

  • Whiteboard all the components of your solution.  You might want to take several tries to get it right: grouping related components together, etc.  Ask a team member to validate that all parts of the solution are represented.  When you've got a good handle on it, draw it out in Visio.  (I find a whiteboard to be less restrictive at this phase...)
  • Gather feedback on the current design from as many different sources as possible.  Team members may be able to provide pain points about the current architecture and how it has failed in the past; other solution architects may have different approaches or experiences that may lead to a more informed strategy.  Use this feedback to compile a list of faults and code smells that are present in the current code.
  • Set goals for a new architecture.  The pain points outlined by your developers may inspire you, but ideally your new architecture is clean, performant, secure, loosely coupled, easily testable, flexible to change, more maintainable, and requires less code -- piece of cake, right?
  • Redraw the components of your solution under your ideal solution architecture. It can be difficult to look past the limitations of the current design, but don't let that influence your thinking.  When you're done, compare this diagram to the current solution.   Question everything: How are they different?  What are the major obstacles to obtaining this design and how can they be overcome?  What represents the least/most effort?  What are the short versus long term changes?  What must be done together versus independently?  How does your packaging / deployment / build script / configuration / infrastructure need to change?

After this short exercise, you should have a better sense of the amount of change required and the order in which it should be done.  The next step is finding a way to introduce these changes into your release schedule.

Introducing Change

While documenting your findings and producing a deliverable is key, perhaps the best way to introduce change into the release schedule is the direct route: tell the decision makers your plans.  An informed client/management is your best ally, but you need to speak their language. 

For example, in a project where the user-interface code is tied to service objects which are tied directly to web services, it's not enough to state that this is an inflexible design.  However, if you outline the cost savings, reduced bugs and quicker time to market that come from removing a pain point (the direct coupling between UI and web services prevents third parties or remote developers from properly testing their code), the decision makers are much more agreeable to scheduling some time to fix things.

For an existing project, it's very unlikely that the client will agree to a massive refactoring, such as changing all of the underlying architecture for the UI at the same time.  However, if a business request touches a component that suffers a pain point, you might be able to make a case to fix things while introducing the change.  This is the general theme of refactoring: each step in the plan should be small and isolated so that the impact is minimal.  I like to think of it as a sliding puzzle.

Introducing change to a project typically gets easier as you demonstrate results.  However, since the first steps of introducing a new design typically require a lot of plumbing and simultaneous changes, it can be a very difficult sell if those plumbing costs are padded into a simple request.  To ease the transition, it can help to take some of the first steps on your own: either as a proof of concept, or as an isolated example that sets direction for other team members.

Here are a few things you can take on with relatively minor effort that will ease your transition.

Rethink your Packaging

A common problem with legacy projects is the confusion within the code caused by organic growth: classes littered with multiple disjointed responsibilities, namespaces that have lost their meaning, inconsistent or complex relationships between assemblies, and so on.  Before you start any major refactoring, now is a really good time to establish how your application will be composed in terms of namespaces and assemblies (packages).

Packaging is normally a side effect of solution design and isn't something you consider first when building an application from scratch.  However, for a legacy project where the code already exists, we can look at using packaging as the vehicle for delivering our new solution.  Some Types within your code base may move to new locations, or adopt new namespaces.  I highly recommend using assemblies as an organizational practice: instruct team members where new code should reside and guide (enforce) the development of new code within these locations.  (Just don't blindly move things around: have a plan!)

Recently, Jeffrey Palermo coined the term Onion Architecture to describe a layered architecture where the domain model sits at the core, service layers are built upon the core, and physical dependencies (such as databases) are pushed to the outer layers.  I've seen a fair number of designs follow this approach, and a name for it is highly welcomed -- anyone considering a new architecture should take a look at this design.  Following this principle, it's easy to think of the layers or services residing in different packages.
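For example, a hypothetical assembly layout following this principle might look like the following (the names are illustrative, not prescriptive):

// MyApp.Core      -- domain model and service contracts; no outward dependencies
// MyApp.Services  -- business logic built on top of the core
// MyApp.Data      -- persistence and other physical dependencies at the outer edge
// MyApp.Web       -- user interface at the outermost layer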

Introduce a Service Locator

A service locator is an effective approach to breaking down dependencies between implementations, making your code more contract-based and intrinsically more testable.  There are lots of different service locator and dependency injection frameworks out there; a common approach is to write your own locator and have it wrap your framework of choice.  The implementation doesn't need to be complicated -- even just a hashtable of objects will do -- and it can be upgraded later to other technologies, such as Spring.NET, Unity, etc.
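To illustrate, a bare-bones hand-rolled locator might look something like this sketch (thread safety and object lifetimes are omitted for brevity, and the API mimics no particular framework):

using System;
using System.Collections.Generic;

public static class ServiceLocator
{
    // maps a contract Type to its implementation instance
    private static readonly Dictionary<Type, object> _services = new Dictionary<Type, object>();

    public static void Register<TContract>(TContract implementation)
    {
        _services[typeof(TContract)] = implementation;
    }

    public static TContract Resolve<TContract>()
    {
        object service;
        if (!_services.TryGetValue(typeof(TContract), out service))
        {
            throw new InvalidOperationException(
                "No service registered for " + typeof(TContract).FullName);
        }
        return (TContract)service;
    }
}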

Perhaps the greatest advantage that a Service Locator can bring to your legacy project is the ability to unhook the User Interface code from the Business Logic (Service) implementations.  This opens the door to refactoring inline user-interface code and user controls.  The fruits of this labor are clearly demonstrated in your code coverage statistics.

Not all your business objects will fit into your service locator right away, mainly because of strong coupling between the UI and BL layers (static methods, etc).  Compile a list of services that will need to be refactored, provide a high-level estimate for each one, and add them to a backlog of technical debt to be worked on at a later date.

You can move Business Layer objects into the Service Locator by following these steps (a sketch of the property approach appears after the list):

  • Extract an interface for the Service objects.  If your business logic is exposed as static methods, you'll have some work to convert these to instance methods.  I'll likely have a follow-up post that shows how to perform these types of refactoring using TDD as a safety net -- more later...
  • Register the service with the service locator.  This work will depend on how your Service Locator works, either through configuration settings or through some initiation sequence.
  • Replace the references to the Service object with the newly extracted interface.  If your business logic is exposed using static methods, you can convert the references to the Service object in the calling code to a property.
  • Obtain a reference to the Service object from the Service Locator.  You can either make an inline request to the Service Locator or, as in the point above, encapsulate the call in a property.  The latter approach allows you to cache a reference to the service object.
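A sketch of that last step, using the bare-bones locator above (ICustomerService and CustomerPage are hypothetical names):

public partial class CustomerPage
{
    private ICustomerService _customerService;

    // replaces direct references to a static business-logic class;
    // the locator is consulted once and the reference is cached
    protected ICustomerService CustomerService
    {
        get
        {
            if (_customerService == null)
            {
                _customerService = ServiceLocator.Resolve<ICustomerService>();
            }
            return _customerService;
        }
    }
}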

Next steps

Now that you have continuous integration, reports that demonstrate results, unit tests for the presentation layer, the initial groundwork for your new architecture and a plan of attack -- you are well on your way to refactoring your architecture from the inside out.  Remember to keep your backlog and plan current (it will change), write tests for the components you refactor, and don't bite off more than you can chew.

Good luck with the technical debt!


Thursday, December 21, 2006

Software and the Metropolis

Update: the link has changed to http://msdn.microsoft.com/en-us/library/aa480026.aspx

I attended the VS Live conference back in 2004, where Pat Helland gave this really interesting talk about how IT and software are experiencing the same trends as the evolution of major cities.  The parallels are really interesting; according to Pat, we're at about 1880 on that timeline.  I've referred to this talk many times, so I'm posting a link to it now.  I can't believe that was over two years ago.

One of the interesting anecdotes is that after the railroads were invented (the internet), we needed a way to package our stuff up and send it between places.  Hence, the invention of cardboard in 1880.  XML is cardboard.  Ever buy a memory stick from Dell?  The cardboard it shipped in probably weighed more than the RAM stick.  The cardboard had your name on it, where it came from, a shipping number, a customs tag, etc. -- but in the end, the contents of the package were more important than the package.

Monday, December 18, 2006

Geeking with Greg: Talk on eBay architecture

Cool post on eBay architecture.  Vertical scalability is where it's at.  Interesting that XSLT and local file storage are used.