John Azariah's Blog

Pontifications (and pointless ramblings) of a (mostly) functional programmer!

Recent posts






All Posts

Clutter-Free Coding in C# with LINQ - Part 1

An Innocent Beginning

So the other day, I started working on a little feature at work.

It involved creating a whole bunch of types in one library and using them in another.

To give us a sense of security that type instances are created appropriately, I started writing a little test, which looked like this:

1: 
2: 
3: 
4: 
5: 
public void ShouldCreateSimpleCommand()
{
    var simpleCommand = Command.NewSimpleCommand("echo 'Hello, World!'");
    Assert.IsNotNull(simpleCommand);
}

I started by writing a single test per type-instance, which worked out really well until I needed an instance of Type A to pass to Type B's constructor.

Now I had to make a choice:

  1. Follow the tenet of one-test-per-type, and in the test for Type B, sacrifice DRY-ness and repeat the verification for a successful creation of a Type A instance
  2. Sacrifice the tenet of one-test-per-type, write the test for Type B and remove the one for Type A, as the behaviour is tested satisfactorily there, and achieve DRY-ness

Religious wars have been fought over less.

I chose being DRY over the 'one-test-per-type' dictum, and soldiered on!

Now, there are about 30 types we need to test, so choosing to minimize repetition logically landed up giving us a single test, which created and verified the instances in a kind of dependency-order.

And this is what the code looked like at this point

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
public void ObjectCreationTests()
{
    var simpleCommand = Command.NewSimpleCommand("echo 'Hello, World!'");
    Assert.IsNotNull(simpleCommand);

    var tryCatchCommand = new TryCatchCommand(simpleCommand, FSharpOption<Command>.None);
    Assert.IsNotNull(tryCatchCommand);

    var commandSet = new CommandSet(
        new[]
        {
            tryCatchCommand
        }.ToFSharpList(),
        new[]
        {
            simpleCommand
        }.ToFSharpList());
    Assert.IsNotNull(commandSet);

    var localFiles = LocalFiles.NewLocalFiles(
        new[]
        {
            new FileInfo("test_file.txt")
        }.ToFSharpList());
    Assert.IsNotNull(localFiles);
    //...
}

This practice is so ubiquitous that I know it would not raise the suspicions of most developers. In fact, if the creation code were expensive (e.g if the creation were to be done across the wire or required database access for each instance), selecting this approach would've earned the appreciation of my teammates.

Also, don't get distracted by the ToFSharpList() calls - they're proof that we're going across to a different library to construct these objects - and an indication that the construction may not be straightforword or inexpensive as a 'new'.

Playing Catch!

Almost immediately after writing this code, I realized that the instance creation tests weren't going to be so straightforward.

The constructor function might throw, and we would have to have a way of discovering what went wrong.

So now I have another choice to make:

  1. Chuck the problem into the 'too-hard' basket: Wrap the single test function in a top-level try-catch, and simply protect the test from blowing up. Debug individual constructor failures the hard way.
  2. Regret the choice to have a single test function: Move the code into 30 little test functions; Wrap each constructor invocation with a try-catch; Hope that the dependency hierarchy isn't too deep and unwieldy.
  3. Bite the bullet: Keep the single test function; Accept the nested hierarchy and carefully construct the try-catch wrappers so that objects are in scope where they need to be; Refactor as much as possible.

In reality, many development teams would pick the first approach - no one wants to spend all their time engineering their test code when there's Real Work™ to be done!

A second glance removes the second option, as the most dependent object is still going to have a hierarchy with the same depth as the other cases. Also, the repetition would've really started hurting at this point, so we actually pat ourselves on the back that we dodged a bullet by choosing DRY-ness!

So, being somewhat bull-headed, I actually put in the hard work and start putting try-catch blocks around the constructor functions, very deliberately being very careful to ensure all the constructors are protected.

A couple hours later, the code looks like

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
46: 
47: 
48: 
49: 
50: 
51: 
52: 
53: 
try
{
    var simpleCommand = Command.NewSimpleCommand("echo 'Hello, World!'");
    Assert.IsNotNull(simpleCommand);

    try
    {
        var tryCatchCommand = new TryCatchCommand(simpleCommand, FSharpOption<Command>.None);
        Assert.IsNotNull(tryCatchCommand);

        try
        {
            var commandSet = new CommandSet(
                new[]
                {
                    tryCatchCommand
                }.ToFSharpList(),
                new[]
                {
                    simpleCommand
                }.ToFSharpList());
            Assert.IsNotNull(commandSet);
        }
        catch (Exception e2)
        {
            Assert.Fail(e2.Message);
        }

        //...
    }
    catch (Exception e1)
    {
        Assert.Fail(e1.Message);
    }
}
catch (Exception e0)
{
    Assert.Fail(e0.Message);
}

try
{
    var localFiles = LocalFiles.NewLocalFiles(
        new[]
        {
            new FileInfo("test_file.txt")
        }.ToFSharpList());
    Assert.IsNotNull(localFiles);
}
catch (Exception e0)
{
    Assert.Fail(e0.Message);
}

YUCK!

Now, this is a right-and-proper mess, and it's going to require serious engineering effort to maintain this.

Unfortunately, code like this is really all too common, and most code-bases have at least some sections that have similar (lack of) organization and (lack of) structure.

Maybe we're getting too comfortable with the clutter to really call it out as an issue. Perhaps this is actually what real-world programming is all about!

Getting Func-y

So this functional programming bug seems to have bitten everybody! Let's see if it has anything to offer by way of a better solution to this. After all, those functional programmer types keep talking about how little code they write!

C# has the ability to pass in a function to another function as an argument. Maybe we can use that and write a first cut to abstract out the pattern.

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
public static class TestHelper
{
    public static T CreateSafelyAndTestObject<T>(Func<T> f)
    {
        try
        {
            var result = f();

            Assert.IsNotNull(result);

            return result;
        }
        catch (Exception)
        {
            return default(T);
        }
    }
}

Here we have a generic function which, given a type T and the constructing function f, creates an instance of T within a try-catch, asserts that it isn't null, and returns it. If there is an exception, well, we return a null.

Once we apply that function to our test code, it looks cleaner - something like this:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
{
    var simpleCommand =
        TestHelper.CreateSafelyAndTestObject(
            () => Command.NewSimpleCommand("echo 'Hello, World!'"));
    if (simpleCommand == null)
    {
        return;
    }

    var tryCatchCommand =
        TestHelper.CreateSafelyAndTestObject(
            () => new TryCatchCommand(simpleCommand, FSharpOption<Command>.None));
    if (tryCatchCommand == null)
    {
        return;
    }

    var commandSet = TestHelper.CreateSafelyAndTestObject(
        () => new CommandSet(
            new[]
            {
                tryCatchCommand
            }.ToFSharpList(),
            new[]
            {
                simpleCommand
            }.ToFSharpList()));
    if (commandSet == null)
    {
        return;
    }

    var localFiles = TestHelper.CreateSafelyAndTestObject(
        () => LocalFiles.NewLocalFiles(
            new[]
            {
                new FileInfo("test_file.txt")
            }.ToFSharpList()));
    if (localFiles == null) 
    {
        return;
    }

    //...
}

We can see that our code looks a lot cleaner straightaway with a good portion of the ugliness abstracted away.

However, since we're using the time-honoured trick of returning null to signify failure, there's still a null check to do before accessing any of these objects. Again, we've gotten quite used to that, so we don't even think of it as clutter any more! With some clever use of the ternary (?) and C# 6.0 'Elvis' (?.) operators, we might even clean up the code some more.

Perhaps we call should it a day here - after all, it looks neater now than it was before!

Talk To The Duck!

Now, it's true that putting some of this code in a generic function and passing in a constructor lambda vastly improves the readability of this code. But let's step back a moment and take a long, hard look at what we're really trying to do.

Let's try and talk through our code to a "rubber-duck". That "conversation" might go something like this:

The duck, and yourself, are now lost for words at recognizing this profound insight.

We have discovered a meta-pattern, which looks something like this:

  1. Execute a function by passing it to our generic function
  2. Execute a null check and possibly short-circuit out
  3. Execute what follows

If we now look at the program again, and squint a little, we'll find that all the statements we've written all over the program have the following meta-pattern:

  1. Execute a statement
  2. ;
  3. Execute what follows

That's right. We've used the ';' so often that we've forgotten its significance. It's always been a way of saying "Please do what follows me after you're done doing what precedes me" - which is just like the operations we want, but only less configurable!

If only we were able to treat our constructor function as a kind of programming primitive, and we had a smarter way, than the humble ';', of gluing together these primitives, then we might be able to follow our familiar programming paradigms whilst staying in control of the clutter in our meta-pattern. We actually might be able to truly write clutter-free code without sacrificing safety or maintainability! Wow!

Think about that for a while, and you'll get a glimpse into the mind of the functional programmer - because that is exactly what they are able to do!

And if you still keep reading, gentle reader, you will be able to do precisely this with almost no fuss at all! In fact, we have had the ability to write this kind of code since the ancient days of C# 3.0!

Preparing to LINQ!

We like our generic function executor function that we wrote earlier. A lot.

But first, let's notice that there are two pieces that the clutter is related with:

  1. Some value - we're going to generate this and null-check it in the context of the clutter.
  2. Some operations - these form the pattern we want to abstract away.

We can use the magic of C# generic types to encapsulate both of these pieces. We'll use the generic class to decorate the bare value returned by the functions, and also to provide a method to encapsulate the clutter.

(There are actually a few other things we could've tried first, and I can discuss them later, but for now, let's take this approach)

Here's the code:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
public class Create<T>
{
    public Create(T value)
    {
        Value = value;
    }

    private T Value { get; set; }

    public Create<R> Bind<R>(Func<T, Create<R>> f)
    {
        try
        {
            if (Value == null)
            {
                return null;
            }

            var result = f(Value);
            Assert.IsNotNull(result);
            return result;
        }
        catch
        {
            return null;
        }
    }
}

The first thing we realise is that we have engineered a kind of tar-pit.

That is to say, once we stick value into an instance of Create<T>, there's no way to get it out. This is intentional for good reasons, as we'll see in a bit.

The other thing is that the curiously-named Bind function seems to be the only way to consume the stored value. It is also relentless in its tar-pitted-ness - it gives you access to the value only as a parameter to a lambda, and expects that the lambda returns a tar-pitted value.

It should be obvious why the tar-pit is a good thing, now: If we want to deal with the stored value, we have exactly one way of doing it - through Bind. And we can hide all the clutter associated with it there in that single entry point!

This is all well and good, but our test code with this approach now looks kind-of weird. The compiler thinks that it is so hideous, in fact, that it refuses to compile it!

This code doesn't even compile!

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
{
    var result = new Create<Command>(Command.NewSimpleCommand("echo 'Hello, World!'"))
        .Bind<TryCatchCommand>(
            simpleCommand => new TryCatchCommand(simpleCommand, FSharpOption<Command>.None))
        .Bind<CommandSet>(
            tryCatchCommand => new CommandSet(
                new[]
                {
                    tryCatchCommand
                }.ToFSharpList(),
                new[]
                {
                    simpleCommand
                }.ToFSharpList()))
        .Bind<LocalFiles>(
            _ => LocalFiles.NewLocalFiles(
                new[]
                {
                    new FileInfo("test_file.txt")
                }.ToFSharpList()));
    }

Ouch! What have we gotten ourselves into?

Well, the one good thing about it is that it seems to be a lot less cluttered. The if statements are gone, and the whole thing looks like a chain, which looks like it can compose nicely.

The reason why it doesn't compile is that when we create the simpleCommand object, the only place it can be used is as an argument to the TryCatchCommand constructor - it's simply not in scope for the CommandSet constructor a few lines below. So it looks like we will need a little more C# magic to make things visible.

If you've ever used LINQ queries, however, you'll notice that they seem to have solved that problem. For example:

1: 
2: 
3: 
4: 
5: 
6: 
7: 
var queryresults = from a in db.Authors                                          
                    join ba in db.Title_Authors                           
                    on a.Au_ID equals ba.Au_ID into idAuthor
                    from c in idAuthor
                    join t in db.Titles  
                    on c.ISBN equals t.ISBN 
                    select new { Author = a.Author1,Title= t.Title1 };

The a associated with db.Authors is available throughout the query. So maybe the LINQ guys actually have a way to automatically capture the outer scope and make it available to inner ones. (If you want to see how it's done, de-sugar that query into a method-chain, and you'll see an ugly-but-effective expansion which uses a trick called 'closures' to pass things to inner scopes).

So how can we appropriate the goodness of LINQ for our own purposes?

The secret is surprisingly simple. Basically the LINQ syntax parses down to a method-chain composed of normal or extension methods, which is a simply brilliant idea. Basically, it means that by supplying some methods - either in the class or as extension methods - we can get the LINQ syntax to comprehend any type we want to write!

Let's dive in and write those functions as extension methods for now:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
public static class CreateExtensions
{
    // this one is *always* written like this, as long as type (like Create<>) has the Bind signature we specified earlier
    public static Create<TR> SelectMany<T, TR>(this Create<T> m, Func<T, Create<TR>> f) => m.Bind(f);
    
    // this one is *always* written like this - it's defined in terms of the other SelectMany overload. 
    // just change the Create<> type name to whatever type name you want!
    public static Create<TV> SelectMany<T, TU, TV>(this Create<T> m, Func<T, Create<TU>> f, Func<T, TU, TV> s)
        => m.SelectMany(x => f(x).SelectMany(y => (Create<TV>) s(x, y)));
}

Astonishing!

So not only have the smart guys in C#-land given us only two extension methods to write, but they even made the second one implementable purely in terms of the first one so all we have to do to reuse it is to rename the type involved.

And the implementation of the first extension method is simply to invoke the Bind function!

So not only have we managed to contain all the clutter in our Create<T> class - we've made it possible for the LINQ syntax to give us a sugar-coated way to use it!

We're going to add one more magic function in our extensions class, so we never have to call the class constructor by name - which means that we can write a LINQ query against any type that looks like Create<T> and not even have to change it to get new behaviour! Let's call this function Lift() and its implementation will simply be to call the constructor of Create<T>.

1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: 
9: 
public static class CreateExtensions
{
    public static Create<T> Lift<T>(this T value) => new Create<T>(value);

    public static Create<TR> SelectMany<T, TR>(this Create<T> m, Func<T, Create<TR>> f) => m.Bind(f);

    public static Create<TV> SelectMany<T, TU, TV>(this Create<T> m, Func<T, Create<TU>> f, Func<T, TU, TV> s)
        => m.SelectMany(x => f(x).SelectMany(y => (Create<TV>) s(x, y)));
}

Now, we have entered awesome-ness land! Our test code can be written like this:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
24: 
25: 
26: 
27: 
28: 
29: 
30: 
31: 
32: 
33: 
34: 
35: 
36: 
37: 
38: 
39: 
40: 
41: 
42: 
43: 
44: 
45: 
46: 
47: 
48: 
49: 
50: 
51: 
52: 
53: 
54: 
55: 
56: 
57: 
58: 
59: 
60: 
61: 
62: 
63: 
64: 
65: 
66: 
67: 
68: 
69: 
70: 
71: 
72: 
73: 
74: 
75: 
76: 
77: 
{
    var created = from simpleCommand in Command.NewSimpleCommand("echo 'Hello, World!'").Lift()
        from parameterizedCommand in
            Command.NewParametrizedCommand(
                new ParametrizedCommand(
                    "echo 'Hello, %user%'",
                    new[]
                    {
                        "user"
                    }.ToFSharpList())).Lift()
        from tryWithCatch in
            new TryCatchCommand(parameterizedCommand, FSharpOption<Command>.Some(simpleCommand)).Lift()
        from tryWithoutCatch in
            new TryCatchCommand(simpleCommand, FSharpOption<Command>.None).Lift()
        from commandSet0 in CommandSet.Zero.Lift()
        from commandSet in new CommandSet(
            new[]
            {
                tryWithCatch
            }.ToFSharpList(),
            new[]
            {
                simpleCommand
            }.ToFSharpList()).Lift()
        from localFiles0 in LocalFiles.Zero.Lift()
        from localFiles in LocalFiles.NewLocalFiles(
            new[]
            {
                new FileInfo("temp.txt")
            }.ToFSharpList()).Lift()
        from uploadedFiles0 in UploadedFiles.Zero.Lift()
        from uploadedFiles in UploadedFiles.NewUploadedFiles(
            new[]
            {
                new ResourceFile("blobSource", "blobPath")
            }.ToFSharpList()).Lift()
        from workloadUnitTemplate0 in WorkloadUnitTemplate.Zero.Lift()
        from workloadUnitTemplate in new WorkloadUnitTemplate(commandSet, localFiles, false).Lift()
        from workloadArguments0 in WorkloadArguments.Zero.Lift()
        from workloadArguments in
            WorkloadArguments.NewWorkloadArguments(
                new Dictionary<string, FSharpList<string>>
                {
                    {
                        "users", new[]
                        {
                            "john",
                            "paul",
                            "george",
                            "ringo"
                        }.ToFSharpList()
                    }
                }.ToFSharpMap()).Lift()
        from workloadSpecification in new WorkloadSpecification(
            new[]
            {
                workloadUnitTemplate
            }.ToFSharpList(),
            LocalFiles.Zero,
            workloadArguments).Lift()
        from taskName in TaskName.NewTaskName("simple-task").Lift()
        from taskArguments0 in TaskArguments.Zero.Lift()
        from taskArguments in TaskArguments.NewTaskArguments(
            new Dictionary<string, string>
            {
                {"name", "john"}
            }.ToFSharpMap()).Lift()
        from defaultTaskSpecification in TaskOperations.GetDefaultTaskSpecification(taskName).Lift()
        from jobName in JobName.NewJobName("simple-job").Lift()
        from nullJobPriority in JobPriority.NewJobPriority(null).Lift()
        from jobPriority in JobPriority.NewJobPriority(10).Lift()
        from defaultJobSpecification in JobOperations.GetDefaultJobSpecification(jobName).Lift()
        from result in "done".Lift()
        select result;

    Assert.IsNotNull(created);
}

Now, I'll be the first to grant that "from <var-name> in <function>" is a pretty non-intuitive way of expressing an assignment, but that's because the C# guys only exposed this syntax to abstract out the clutter of interacting with collections of objects. For the purposes of this exercise, the syntax would read much better if it said "let <var-name> be <function>" (and indeed, if you used F#, you would say let! x = f in this context), but it's a small price to pay to get this kind of composability. Think of it as a few spelling mistakes in your programming language, and you'll be golden!

But what have we gained?

Firstly, I count nearly 30 type constructors being tested with no duplication of construction code! Even taking in the weird formatting, this takes only about 75 lines of code.

Furthermore, there is no leaking of objects and scope - variables can't be reset to null after the assert or reassigned to other unchecked values, which is a distinct possibility (and a source of subtle bugs) in the non-composed code.

I'd make the claim that this code is also super-maintainable: No one can forget to put in an assert when they slide in code to test a new class; and most of the diseases associated with cut/paste have been removed.

A final twist

So protecting the constructor call is good; pulling out all the clutter into a single function is great; using LINQ to get closures and a uniform programming 'syntax' is awesome; but how maintainable and flexible is this approach.

Say we now we want to introduce tracing throughout the code so we know when something threw an exception or failed on creation.

Let's set the context and imagine the amount of work required to inject tracing logs (correctly) throughout the explicit try-catch code. I'm guess it's a couple hours worth of cut/paste/verify, and it doesn't make the ugly code any prettier. I'm not even going to try and put in a code sample for this approach!

Contrast that with this:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
16: 
17: 
18: 
19: 
20: 
21: 
22: 
23: 
public class Create<T>
{
    public Create(T value, string name)
    {
        ...
        
        Name = name;
        if (!(string.IsNullOrWhiteSpace(Name)))
        {
            Console.WriteLine($"{Name} was created with value {Value}.");
        }
    }

    private string Name { get; set; }
    
    ...
}

public static class CreateExtensions
{
    public static Create<T> Lift<T>(this T value, string name = null) => new Create<T>(value, name);
    ...
}

So we just add some context to our value, and introduce the tracing code in one location. Our test code changes to:

 1: 
 2: 
 3: 
 4: 
 5: 
 6: 
 7: 
 8: 
 9: 
10: 
11: 
12: 
13: 
14: 
15: 
var created = from simpleCommand in Command.NewSimpleCommand("echo 'Hello, World!'").Lift("simple-command")
    from parameterizedCommand in
        Command.NewParametrizedCommand(
            new ParametrizedCommand(
                "echo 'Hello, %user%'",
                new[]
                {
                    "user"
                }.ToFSharpList())).Lift()
    from tryWithCatch in
        new TryCatchCommand(parameterizedCommand, FSharpOption<Command>.Some(simpleCommand)).Lift(
            "try-with-catch")
    from tryWithoutCatch in
        new TryCatchCommand(simpleCommand, FSharpOption<Command>.None).Lift("try-without-catch")
    ...

Et voilà! We have modified our clutter-management class to support tunable tracing and logging with half a dozen code changes, and not sacrificed the elegance of our test code while doing it!

In the next article, I'd like to talk about why this turns out to be a very general and very powerful way of programming!

Keep typing!!