John Azariah's Blog

Pontifications (and pointless ramblings) of a (mostly) functional programmer!

Clutter-Free Coding in C# with LINQ - Part 2

Motivations

In the last post, I described a coding mechanism to reduce clutter in C# code, and it excited a fair bit of comment amongst my co-workers.

This is a desired outcome - conversation is a good thing, because we learn to see things from different perspectives.

It's important to note that the concerns raised in these conversations are valid ones. They are raised by skilled engineers with a great deal of experience dealing with real engineering issues, so it's only fair to ensure that these concerns are properly addressed whenever we write any code.

What I wrote about the last time was how to use compositional styles of programming to reduce clutter. In this post, I'd like to write a little bit about why we want to consider this style of coding. In doing so, perhaps I can address some of the major concerns that my colleagues brought up in conversation...

Readability

It is a well-known fact that a very large proportion of the code we write is developed over a short period but used over a long period of time. Another way of saying this might be that we write code once, but read it many times.

Additionally, we write for two different audiences concurrently:

  1. The immediate target of our programming is, of course, the compiler. We'll talk a little more about this later, but it's important to note that the compiler is a tool that is capable of making life easy for us. We don't have to go to great lengths - for most of what we do, anyway - to make things simpler for the compiler.
  2. The other, arguably more important, audience is the human team that reads, uses and maintains the code. This audience requires that we take great care to communicate intent, method and context through the code, and it matters a great deal that we make things as simple as possible for the humans. This audience may include people who are geographically separated from us, which raises the bar for communication across cultural boundaries. It may also include people who are temporally separated from us - our future selves and colleagues, who need to be able to clearly grasp the intent behind the code, the context in which it was written, and the methods it uses - quite possibly without any of the original authors around to provide that context.

So making the code readable is of paramount importance.

At first blush, having all the code in one place seems like a good way to go:

try
{
    var simpleCommand = Command.NewSimpleCommand("echo 'Hello, World!'");
    Assert.IsNotNull(simpleCommand);

    try
    {
        var tryCatchCommand = new TryCatchCommand(simpleCommand, FSharpOption<Command>.None);
        Assert.IsNotNull(tryCatchCommand);

        try
        {
            var commandSet = new CommandSet(
                new[]
                {
                    tryCatchCommand
                }.ToFSharpList(),
                new[]
                {
                    simpleCommand
                }.ToFSharpList());
            Assert.IsNotNull(commandSet);
        }
        catch (Exception e2)
        {
            Assert.Fail(e2.Message);
        }

        //...
    }
    catch (Exception e1)
    {
        Assert.Fail(e1.Message);
    }
}
catch (Exception e0)
{
    Assert.Fail(e0.Message);
}

Indeed, most of the feedback I've received - from reader and colleague alike - indicates that this is the most familiar form of coding in their codebases.

However, whilst it's easy to see all that is being done, it's actually very difficult to see what is being done in this code. Under what conditions do we create a commandSet? We need to trace through the try-catch levels, keeping track of which objects have been successfully created so far, and maintain a mental map of the navigation to deduce the conditions that need to be satisfied. Add a couple more nested levels, let the function grow beyond a screenful, and it becomes a day's work to read the code and understand the nuances of the flow.

I would assert that in this case the intent of the code is actually obscured by the clutter, and that it puts more of a burden on the reader to discover this intent - risking the very real possibility of miscommunication.

Instead of explicitly interleaving the intent and the infrastructure patterns, a better way might be to introduce a layer of abstraction which surfaces the intent and makes it obvious, whilst implicitly applying the infrastructural patterns.

A lot of readers commented that they would be satisfied with code like this:

{
    var simpleCommand =
        TestHelper.CreateSafelyAndTestObject(
            () => Command.NewSimpleCommand("echo 'Hello, World!'"));
    if (simpleCommand == null)
    {
        return;
    }

    var tryCatchCommand =
        TestHelper.CreateSafelyAndTestObject(
            () => new TryCatchCommand(simpleCommand, FSharpOption<Command>.None));
    if (tryCatchCommand == null)
    {
        return;
    }

    var commandSet = TestHelper.CreateSafelyAndTestObject(
        () => new CommandSet(
            new[]
            {
                tryCatchCommand
            }.ToFSharpList(),
            new[]
            {
                simpleCommand
            }.ToFSharpList()));
    if (commandSet == null)
    {
        return;
    }

    var localFiles = TestHelper.CreateSafelyAndTestObject(
        () => LocalFiles.NewLocalFiles(
            new[]
            {
                new FileInfo("test_file.txt")
            }.ToFSharpList()));
    if (localFiles == null) 
    {
        return;
    }

    //...
}

This is, indeed, more readable - and it can be argued that enough clutter has been abstracted away to make the intent clear to the reader. I agree. In fact, if readability were the only concern I was addressing, I might have stopped here too!
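
The original helper isn't shown in the post, but a plausible shape for it - a sketch only, assuming MSTest's Assert class - might look like this:

public static class TestHelper
{
    // Create the object, assert on it, and convert any failure into a null
    // so that the caller can bail out early.
    public static T CreateSafelyAndTestObject<T>(Func<T> create) where T : class
    {
        try
        {
            var created = create();
            Assert.IsNotNull(created);
            return created;
        }
        catch (Exception e)
        {
            Assert.Fail(e.Message);
            return null; // unreachable - Assert.Fail throws - but required by the compiler
        }
    }
}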

However, it's difficult to argue that the equivalent code, written in the LINQ style, is any less readable:

{
    var created = 
        from simpleCommand in Command.NewSimpleCommand("echo 'Hello, World!'").Lift()
        from tryWithCatch in new TryCatchCommand(simpleCommand, FSharpOption<Command>.None).Lift()
        from commandSet in new CommandSet(
            new[]
            {
                tryWithCatch
            }.ToFSharpList(),
            new[]
            {
                simpleCommand
            }.ToFSharpList()).Lift()
       ...

I agree that the from-in syntax is somewhat non-intuitive at first, but this pattern of coding in another language like F# or Scala would look very familiar indeed:

succeed {
    let! simpleCommand = SimpleCommand "echo 'Hello, World!'"
    let! tryWithCatch = {Try = simpleCommand; Catch = None }
    let! commandSet = { TryCatchCommands = [tryWithCatch]; FinallyCommands = [simpleCommand] }
    ...

On the contrary, it would be reasonable to say that both the LINQ and F# examples surface the intent of the writer as clearly as possible!

Debuggability

With imperative programming languages, it's usually convenient to have a debugging environment to allow us to observe the execution of code. "Stepping" through the code in a debugger is second-nature to us, and is a valuable tool in discovering why something is failing.

What happens when we use LINQ, though?

Because most of us encountered LINQ in a LINQ-to-SQL or LINQ-to-EF context, where de-sugaring the query syntax would lead to a bewildering morass of operations on LINQ Expressions, it's easy to see why the perception that LINQ is not debuggable came about.

However, nothing could be further from the truth when we have access to our own Bind function, as in this code:

public class Create<T>
{
    public Create(T value)
    {
        Value = value;
    }

    private T Value { get; set; }

    public Create<R> Bind<R>(Func<T, Create<R>> f)
    {
        try
        {
            // A null wrapped value means that an earlier step in the chain
            // has already failed, so we skip the function entirely.
            if (Value == null)
            {
                return null;
            }

            // Run the next step, and assert that it actually produced a value.
            var result = f(Value);
            Assert.IsNotNull(result);
            return result;
        }
        catch
        {
            // Any exception thrown by the step (including a failed assertion)
            // is absorbed here, failing the chain in one place.
            return null;
        }
    }
}

In the previous blog post, I alluded to the fact that this Bind function forms a kind of "programmatic semi-colon". In practice, this means that every function in the LINQ chain is connected to its predecessor through the Bind function.
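
To make that concrete: the C# compiler de-sugars each from clause into a call to a SelectMany extension method, which we can define directly in terms of Bind. The following is a minimal sketch, not the exact helpers from the previous post - the Lift method shown here simply wraps a value in a Create<T>, and the null-conditional calls are my own guard against a failed step:

public static class CreateExtensions
{
    // Wrap a raw value so that it can participate in a composition chain.
    public static Create<T> Lift<T>(this T value)
    {
        return new Create<T>(value);
    }

    // This is the signature the compiler looks for when de-sugaring
    // nested from clauses in query syntax.
    public static Create<R> SelectMany<T, U, R>(
        this Create<T> source,
        Func<T, Create<U>> selector,
        Func<T, U, R> projector)
    {
        return source?.Bind(t => selector(t)?.Bind(u => projector(t, u).Lift()));
    }
}

With this in place, every step of the from...in query above compiles into a chain of Bind calls.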

Consequently, sticking a single breakpoint on Bind will cause any failing function to be caught. In one place!

Indeed, this is one of the reasons why we would want to move away from code like this:

{
    var simpleCommand =
        TestHelper.CreateSafelyAndTestObject(
            () => Command.NewSimpleCommand("echo 'Hello, World!'"));
    if (simpleCommand == null)
    {
        return;
    }

    var tryCatchCommand =
        TestHelper.CreateSafelyAndTestObject(
            () => new TryCatchCommand(simpleCommand, FSharpOption<Command>.None));
    if (tryCatchCommand == null)
    {
        return;
    }
    
    ...

This kind of code requires us to progressively narrow in on a failure with breakpoints (or step through the code) to see where it failed.

In contrast, think of putting a single breakpoint on Bind as equivalent to putting a breakpoint on every semi-colon, and you will convince yourself that debuggability is not an issue at all when you write this kind of code!

Testability

Another problem with code that is interspersed with infrastructure patterns is that it's very easy to forget to apply the pattern to a bit of code.

Consider:

CString sFileSize;

BSTR bsFileSize;
pHtmlDoc2->get_fileSize(&bsFileSize);

long nLen = WideCharToMultiByte(CP_ACP, 0, bsFileSize, -1, NULL, 0, NULL, NULL);
if (nLen != 0)
{
    HANDLE hMem = GlobalAlloc(GMEM_FIXED, nLen);
    char *pMem = (char *) GlobalLock(hMem);
    WideCharToMultiByte(CP_ACP, 0, bsFileSize, -1, pMem, nLen, NULL, NULL);
    sFileSize = pMem;
    GlobalUnlock(hMem);
    GlobalFree(hMem);
}

This is real code that I picked up from a public code repository, and it's quite typical of the code I've seen in many code-bases.

One would require a fair bit of context to deduce what is going on here, but basically there's a string that needs to be converted from wide characters to multi-byte, and the code for that is interleaved into its point of use.

You might abstract out a function for this conversion. That is good both for reasons of reuse and readability, but the real benefit is that you can now make that function testable. This matters when you notice that the call to GlobalAlloc may fail. The author probably wanted to get on with writing the code for the happy case (this is the intent), and actually forgot to put in the check for the failure case (this should be infrastructure). Now the failure may only be discovered when the system crashes randomly at some point (say, under memory pressure).

Good engineering practice will want to ensure that the error-checking is present, and also that the code has been properly tested under a variety of simulated conditions to elicit such failures.

The compositional style of coding reinforces this coding habit because you have to first build functions and then chain them together - and you can independently write unit-tests for those functions.
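
As a small illustration - a sketch of the kind of test this enables, assuming the Create<T> class above and MSTest - the infrastructure itself can be verified in isolation, and each business-logic step can be tested the same way:

[TestMethod]
public void Bind_WhenAStepThrows_CollapsesTheChainToNull()
{
    // A step that fails for any reason...
    var chained = new Create<int>(42)
        .Bind<string>(_ => { throw new InvalidOperationException("boom"); });

    // ...is absorbed by the infrastructure, in one place.
    Assert.IsNull(chained);
}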

Extensibility

We alluded to the ability to introduce cross-cutting concerns, such as managing exceptions and logging, in a single place - the Bind function. Given that our needs may evolve over time, having the ability to introduce such concerns quickly and easily makes our code more extensible with the minimum of effort.
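
For example - and this is just one possible extension, using System.Diagnostics.Trace purely for the sake of the sketch - adding logging to every step of every chain is a change in exactly one place:

public Create<R> Bind<R>(Func<T, Create<R>> f)
{
    try
    {
        if (Value == null)
        {
            return null;
        }

        var result = f(Value);
        Assert.IsNotNull(result);
        return result;
    }
    catch (Exception e)
    {
        // The new cross-cutting concern, applied to every composed function at once.
        Trace.TraceError(e.Message);
        return null;
    }
}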

We'll talk a little more about this in a future post, as we consider the cost of modifying the code to propagate both a result (the happy case) and an error value (the 'exception'al case) through our compositional pipeline.

Reusability & Maintainability

I make the assertion that compositional code is both more reusable and more maintainable, based on the observations made above.

  1. The intent of the code is abstracted up and away from the infrastructural code. This makes it more maintainable because the behaviour of the code is made more evident.
  2. The code is split up into reasonable functions that can be independently tested. This makes the code both maintainable and reusable.
  3. The infrastructural patterns can be easily extended and more "bullet-proofing" added - in a single place. The benefits of these improvements are immediately applied to all the functions that are composed in this manner. This makes the code more maintainable.
  4. There's less code to write. Once the class with the Bind function has been developed, it's easy to see the terseness of the composition chain because the infrastructural code patterns simply don't need to be written. The developer can focus on simply writing the business logic code and the tests around the business logic, and leave application of the error checking, logging and other cross-cutting concerns to the Bind function.

Performance & Scalability

The big question that everyone seems to ask is whether this approach sacrifices performance and scalability.

At first blush, it's easy to see that we are implicitly forcing the infrastructural pattern to "wrap" all the function invocations. This is strictly no worse than explicitly wrapping the function invocations ourselves, modulo an extra function invocation or two. So performance isn't really being sacrificed by writing compositionally - rather, it is the safety of implicit wrapping that is sacrificed by writing imperatively.

Scalability is a much more interesting issue to discuss, but again, we can intuitively see that implicitly applying the pattern onto each function can't be any worse than explicitly applying the pattern around the function invocation. The reality, however, is quite interesting. We'll talk about monadic composition in a later post, and show how it opens the door for safe, concurrent computations - which potentially allow the compositional approach to be more scalable than the imperative alternative.

Correctness and "Reasonability"

So far, we have talked about the relative merits (and demerits) of using the compositional approach versus the imperative one - with regard to the human audience.

Based on the discussion above, it would be fair to say that whilst the compositional approach requires a mind-shift - reading code of the form from...in...select instead of straightforward assignments in C# - there is no detrimental impact on testability or performance. Indeed, we can argue that debuggability and extensibility actually improve, since both benefit from the single point of control and extension.

Based on these considerations alone, there is a cogent argument to be made for choosing the imperative style - favouring familiarity with the programming idiom over the cost of change.

However, we should also consider the impact of the choice of coding style on the other audience of the code - the compiler.

In the next post, I intend to lay down the mathematical foundations underneath the compositional style, and show that the compiler can play a significantly better role in ensuring the correctness and reason-ability of our code in the compositional style.

Until then, keep typing!
