In the last post, I described a coding mechanism to reduce clutter in C# code, and it excited a fair bit of comment amongst my co-workers.
This is a desired outcome - conversation is a good thing, because we learn to see things from different perspectives.
It's important to note that the concerns raised in these conversations are valid ones - they are raised by skilled engineers with a great deal of experience dealing with real engineering issues, so it's only fair to ensure that these concerns are properly addressed whenever we write any code.
What I wrote about the last time was how to use compositional styles of programming to reduce clutter. In this post, I'd like to write a little bit about why we want to consider this style of coding. In doing so, perhaps I can address some of the major concerns that my colleagues brought up in conversation...
It is a well known fact that a very large proportion of the code that we write is developed over a short period, but used over a long period of time. Another way of saying this might be that we write code once, but read it many times.
Additionally, we write for two different audiences concurrently: the human reader who has to maintain the code, and the compiler that has to translate it.
So making the code readable is of paramount importance.
At first blush, having all the code in one place seems like a good way to go:
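A representative sketch of that all-in-one-place style (the `Connection`/`Command`/`CommandSet` types and `Create*` helpers below are illustrative stand-ins rather than any real API):

```csharp
using System;

// Illustrative stand-in types (real code would use e.g. SqlConnection).
class Connection { }
class Command { }
class CommandSet { }

static class AllInOnePlace
{
    public static CommandSet BuildCommandSet(string connectionString)
    {
        CommandSet commandSet = null;
        try
        {
            var connection = CreateConnection(connectionString);
            if (connection != null)
            {
                try
                {
                    var command = CreateCommand(connection);
                    if (command != null)
                    {
                        commandSet = CreateCommandSet(command);
                    }
                }
                catch (Exception ex)
                {
                    Log(ex); // inner infrastructure, interleaved with intent
                }
            }
        }
        catch (Exception ex)
        {
            Log(ex); // outer infrastructure, interleaved with intent
        }
        return commandSet;
    }

    static Connection CreateConnection(string cs) =>
        string.IsNullOrEmpty(cs) ? null : new Connection();
    static Command CreateCommand(Connection c) => new Command();
    static CommandSet CreateCommandSet(Command c) => new CommandSet();
    static void Log(Exception ex) => Console.Error.WriteLine(ex);
}
```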
Indeed, most of the feedback I've received - from reader and colleague alike - indicates that this is the most familiar form of coding in their codebases.
However, whilst it's easy to see all that is being done, it's actually very difficult to see what is being done in this code. Under what conditions do we create a commandSet? We need to trace through the try-catch levels, keeping track of which objects have been successfully created so far, and maintain a mental map of the control flow to deduce the conditions that need to be satisfied. Add a couple more nested levels, let the function grow beyond a screenful, and it becomes a day's work to read the code and understand the nuances of the flow.
I would assert that in this case the intent of the code is actually obscured by the clutter, placing a greater burden on the reader to discover that intent - and risking the very real possibility of mis-communication.
Instead of explicitly interleaving the intent and the infrastructure patterns, a better way might be to introduce a layer of abstraction which surfaces the intent and makes it obvious, whilst implicitly applying the infrastructural patterns.
A lot of readers commented that they would be satisfied with code like this:
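A sketch of that shape - each step extracted into a helper that absorbs its own try-catch, so the caller reads linearly (names again illustrative):

```csharp
using System;

class Connection { }
class Command { }
class CommandSet { }

static class Refactored
{
    public static CommandSet BuildCommandSet(string connectionString)
    {
        var connection = TryCreateConnection(connectionString);
        if (connection == null) return null;

        var command = TryCreateCommand(connection);
        if (command == null) return null;

        return TryCreateCommandSet(command);
    }

    // Each helper wraps its own error handling, hiding the clutter.
    static Connection TryCreateConnection(string cs)
    {
        try { return string.IsNullOrEmpty(cs) ? null : new Connection(); }
        catch (Exception ex) { Log(ex); return null; }
    }

    static Command TryCreateCommand(Connection c)
    {
        try { return new Command(); }
        catch (Exception ex) { Log(ex); return null; }
    }

    static CommandSet TryCreateCommandSet(Command c)
    {
        try { return new CommandSet(); }
        catch (Exception ex) { Log(ex); return null; }
    }

    static void Log(Exception ex) => Console.Error.WriteLine(ex);
}
```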
This is, indeed, more readable - and it can be argued that enough clutter has been abstracted away to make the intent clear to the reader. I agree. In fact, if readability were the only concern I was addressing, I might have stopped here too!
However, it's difficult to argue that the equivalent code, written LINQ-style, is any less readable:
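A sketch of the LINQ style over a minimal Maybe type - the `Maybe`, `Bind` and `SelectMany` definitions below are my assumed machinery, not a standard library:

```csharp
using System;

// A minimal Maybe type: either a value, or nothing.
public sealed class Maybe<T>
{
    public readonly bool HasValue;
    public readonly T Value;
    Maybe(bool hasValue, T value) { HasValue = hasValue; Value = value; }
    public static Maybe<T> Some(T value) => new Maybe<T>(true, value);
    public static readonly Maybe<T> None = new Maybe<T>(false, default(T));
}

public static class MaybeLinq
{
    public static Maybe<U> Bind<T, U>(this Maybe<T> m, Func<T, Maybe<U>> f) =>
        m.HasValue ? f(m.Value) : Maybe<U>.None;

    // The compiler de-sugars from...in query syntax into calls to SelectMany.
    public static Maybe<V> SelectMany<T, U, V>(
        this Maybe<T> m, Func<T, Maybe<U>> f, Func<T, U, V> project) =>
        m.Bind(t => f(t).Bind(u => Maybe<V>.Some(project(t, u))));
}

public static class Pipeline
{
    public static Maybe<string> BuildCommandSet(string connectionString) =>
        from connection in CreateConnection(connectionString)
        from command in CreateCommand(connection)
        from commandSet in CreateCommandSet(command)
        select commandSet;

    // Strings stand in for real connection/command types.
    static Maybe<string> CreateConnection(string cs) =>
        string.IsNullOrEmpty(cs) ? Maybe<string>.None
                                 : Maybe<string>.Some("connection(" + cs + ")");
    static Maybe<string> CreateCommand(string conn) =>
        Maybe<string>.Some("command on " + conn);
    static Maybe<string> CreateCommandSet(string cmd) =>
        Maybe<string>.Some("commandSet of " + cmd);
}
```

If any step returns `None`, the rest of the chain is skipped and the whole expression is `None` - the infrastructure lives entirely inside `SelectMany`.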
I agree that the from-in syntax is somewhat non-intuitive at first, but this pattern of coding would look very familiar indeed in another language like F# (as a computation expression) or Scala (as a for-comprehension).
On the contrary, it would be reasonable to say that both the LINQ and F# styles surface the intent of the writer as clearly as possible!
With imperative programming languages, it's usually convenient to have a debugging environment to allow us to observe the execution of code. "Stepping" through the code in a debugger is second-nature to us, and is a valuable tool in discovering why something is failing.
What happens when we use LINQ, though?
Because most of us encountered LINQ in a LINQ-to-SQL or LINQ-to-EF context, where de-sugaring the query syntax would lead to a bewildering morass of operations on LINQ Expressions, it's easy to see why the perception that LINQ is not debuggable came about.
However, nothing could be further from the truth if we have access to our own Bind function, as in this code:
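A sketch of such a Bind - the Maybe type, and the policy of logging and converting exceptions to `None`, are my assumptions here:

```csharp
using System;

public sealed class Maybe<T>
{
    public readonly bool HasValue;
    public readonly T Value;
    Maybe(bool hasValue, T value) { HasValue = hasValue; Value = value; }
    public static Maybe<T> Some(T value) => new Maybe<T>(true, value);
    public static readonly Maybe<T> None = new Maybe<T>(false, default(T));
}

public static class MaybeBind
{
    public static Maybe<U> Bind<T, U>(this Maybe<T> m, Func<T, Maybe<U>> f)
    {
        // An earlier step already failed: short-circuit past this one.
        if (!m.HasValue) return Maybe<U>.None;
        try
        {
            // A single breakpoint here observes every step of every chain.
            return f(m.Value);
        }
        catch (Exception ex)
        {
            // Cross-cutting concerns (here, logging) live in exactly one place.
            Console.Error.WriteLine(ex);
            return Maybe<U>.None;
        }
    }
}
```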
In the previous blog post, I alluded to the fact that this Bind function forms a kind of "programmatic semi-colon". In practice, this means that every function in the LINQ chain is connected to its predecessor through the Bind function. Consequently, sticking a single breakpoint on Bind will cause any failing function to be caught - in one place!
Indeed, this is one of the reasons why we would want to move away from code like this:
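A representative sketch of that style (names illustrative) - note how the same check-log-return ritual is repeated at every step:

```csharp
using System;

class Connection { }
class Command { }
class CommandSet { }

static class StepwiseChecks
{
    // Finding a failure means a breakpoint (or a step-through)
    // at each of these checks in turn.
    public static CommandSet BuildCommandSet(string connectionString)
    {
        var connection = CreateConnection(connectionString);
        if (connection == null)
        {
            Log("CreateConnection failed");
            return null;
        }

        var command = CreateCommand(connection);
        if (command == null)
        {
            Log("CreateCommand failed");
            return null;
        }

        var commandSet = CreateCommandSet(command);
        if (commandSet == null)
        {
            Log("CreateCommandSet failed");
            return null;
        }

        return commandSet;
    }

    static Connection CreateConnection(string cs) =>
        string.IsNullOrEmpty(cs) ? null : new Connection();
    static Command CreateCommand(Connection c) => new Command();
    static CommandSet CreateCommandSet(Command c) => new CommandSet();
    static void Log(string message) => Console.Error.WriteLine(message);
}
```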
This kind of code requires us to bisect our way towards a failure (or step through the code line by line) to see where it failed.
In contrast, think of putting a single breakpoint on Bind as equivalent to putting a breakpoint on every semi-colon, and you will convince yourself that debuggability is not an issue at all when you write this kind of code!
Another problem with code that is interleaved with infrastructure patterns is that it's very easy to forget to apply the pattern to a bit of code.
Consider an example of real code that I picked up from a public code repository - quite typical of the code I've seen in many code-bases. One would require a fair bit of context to deduce what is going on in it, but essentially a string needs to be converted from wide characters to multi-byte, and the code for that conversion is interleaved into its point of use.
You might abstract out a function for this purpose. This is good both for reasons of reuse and readability, but the real benefit is that you can now make that function testable. This matters because the call to GlobalAlloc may fail: the code-writer probably wanted to get on with writing the happy-case code (this is the intent), and simply forgot to put in the check for the failure-case (this should be infrastructure). Now the failure may only be discovered when the system crashes randomly at some point - say, under memory pressure.
Good engineering practice will want to ensure that the error-checking is present, and also that the code has been properly tested under a variety of simulated conditions to elicit such failures.
The compositional style of coding reinforces this coding habit because you have to first build functions and then chain them together - and you can independently write unit-tests for those functions.
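For example, a single step built for composition can be exercised directly, without standing up the whole pipeline - the Maybe type and the names below are illustrative:

```csharp
using System;

public sealed class Maybe<T>
{
    public readonly bool HasValue;
    public readonly T Value;
    Maybe(bool hasValue, T value) { HasValue = hasValue; Value = value; }
    public static Maybe<T> Some(T value) => new Maybe<T>(true, value);
    public static readonly Maybe<T> None = new Maybe<T>(false, default(T));
}

public static class Steps
{
    // One small step, written for composition - and therefore testable alone.
    public static Maybe<string> CreateConnection(string connectionString) =>
        string.IsNullOrEmpty(connectionString)
            ? Maybe<string>.None
            : Maybe<string>.Some("connection(" + connectionString + ")");
}

public static class StepTests
{
    // Happy-path and failure-path tests, no composition chain required.
    public static void CreateConnection_fails_on_empty_input()
    {
        if (Steps.CreateConnection("").HasValue)
            throw new Exception("expected None");
    }

    public static void CreateConnection_succeeds_on_valid_input()
    {
        if (!Steps.CreateConnection("server=db").HasValue)
            throw new Exception("expected Some");
    }
}
```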
We alluded to the ability to introduce cross-cutting concerns, such as managing exceptions and logging, in a single place - the Bind function. Given that our needs may evolve over time, having the ability to introduce such concerns quickly and easily makes our code more extensible with the minimum of effort.
We'll talk a little more about this in a future post, as we consider the cost of modifying the code to propagate both a result (the happy case) and an error value (the 'exception'al case) through our compositional pipeline.
I make the assertion that compositional code is both more reusable and maintainable based on the observations made above.
Once the Bind function has been developed, it's easy to see the terseness of the composition chain, because the infrastructural code patterns simply don't need to be written. The developer can focus on writing the business-logic code and the tests around that logic, and leave the application of error-checking, logging and other cross-cutting concerns to the Bind function.
The big question that everyone seems to ask is whether this approach sacrifices performance and scalability.
At first blush, it's easy to see that we are implicitly forcing the infrastructural pattern to "wrap" all the function invocations. This is strictly no worse than explicitly wrapping the function invocations ourselves, modulo an extra function invocation or two. So performance isn't really being sacrificed by writing compositionally - rather, it is the safety of implicit wrapping that is sacrificed by writing imperatively.
Scalability is a much more interesting issue to discuss, but again, we can intuitively see that implicitly applying the pattern to each function can't be any worse than explicitly applying the pattern around each function invocation. The reality, however, is quite interesting. We'll talk about monadic composition in a later post, and show how it opens the door for safe, concurrent computations - which potentially allow the compositional approach to be more scalable than the imperative alternative.
So far, we have talked about the relative merits (and demerits) of using the compositional approach versus the imperative one - with regard to the human audience.
Based on the discussion above, it would be fair to say that whilst the compositional approach requires a mind-shift in reading code of the form from...in...select instead of straightforward assignments in C#, there is no detrimental impact on testability or performance. Indeed, we can argue that there are benefits to debuggability and extensibility, since both gain from the single point of control and extension.
Simply based on these considerations, there is still a cogent argument to be made for choosing the imperative style - favouring familiarity with the programming idiom over the cost of modification.
However, we should also consider the impact of the choice of coding style on the other audience of the code - the compiler.
In the next post, I intend to lay down the mathematical foundations underneath the compositional style, and show that the compiler can play a significantly better role in ensuring the correctness and reason-ability of our code in the compositional style.
Until then, keep typing!