So the other day, I started working on a little feature at work.
It involved creating a whole bunch of types in one library and using them in another.
To give us a sense of security that type instances are created appropriately, I started writing a little test, which looked like this:
1: 2: 3: 4: 5:
I started by writing a single test per type-instance, which worked out really well until I needed an instance of
Type A to pass to
Type B's constructor.
Now I had to make a choice:
Type B, sacrifice DRY-ness and repeat the verification for a successful creation of a
Type Band remove the one for
Type A, as the behaviour is tested satisfactorily there, and achieve DRY-ness
Religious wars have been fought over less.
I chose being DRY over the 'one-test-per-type' dictum, and soldiered on!
Now, there are about 30 types we need to test, so choosing to minimize repetition logically landed up giving us a single test, which created and verified the instances in a kind of dependency-order.
And this is what the code looked like at this point
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27:
This practice is so ubiquitous that I know it would not raise the suspicions of most developers. In fact, if the creation code were expensive (e.g if the creation were to be done across the wire or required database access for each instance), selecting this approach would've earned the appreciation of my teammates.
Also, don't get distracted by the
ToFSharpList() calls - they're proof that we're going across to a different library to construct these objects - and an indication that the construction may not be straightforword or inexpensive as a 'new'.
Almost immediately after writing this code, I realized that the instance creation tests weren't going to be so straightforward.
The constructor function might throw, and we would have to have a way of discovering what went wrong.
So now I have another choice to make:
In reality, many development teams would pick the first approach - no one wants to spend all their time engineering their test code when there's Real Work™ to be done!
A second glance removes the second option, as the most dependent object is still going to have a hierarchy with the same depth as the other cases. Also, the repetition would've really started hurting at this point, so we actually pat ourselves on the back that we dodged a bullet by choosing DRY-ness!
So, being somewhat bull-headed, I actually put in the hard work and start putting try-catch blocks around the constructor functions, very deliberately being very careful to ensure all the constructors are protected.
A couple hours later, the code looks like
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53:
Now, this is a right-and-proper mess, and it's going to require serious engineering effort to maintain this.
Unfortunately, code like this is really all too common, and most code-bases have at least some sections that have similar (lack of) organization and (lack of) structure.
Maybe we're getting too comfortable with the clutter to really call it out as an issue. Perhaps this is actually what real-world programming is all about!
So this functional programming bug seems to have bitten everybody! Let's see if it has anything to offer by way of a better solution to this. After all, those functional programmer types keep talking about how little code they write!
C# has the ability to pass in a function to another function as an argument. Maybe we can use that and write a first cut to abstract out the pattern.
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18:
Here we have a generic function which, given a type
T and the constructing function
f, creates an instance of
T within a try-catch, asserts that it isn't null, and returns it. If there is an exception, well, we return a null.
Once we apply that function to our test code, it looks cleaner - something like this:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45:
We can see that our code looks a lot cleaner straightaway with a good portion of the ugliness abstracted away.
However, since we're using the time-honoured trick of returning null to signify failure, there's still a null check to do before accessing any of these objects. Again, we've gotten quite used to that, so we don't even think of it as clutter any more! With some clever use of the ternary (?) and C# 6.0 'Elvis' (?.) operators, we might even clean up the code some more.
Perhaps we call should it a day here - after all, it looks neater now than it was before!
Now, it's true that putting some of this code in a generic function and passing in a constructor lambda vastly improves the readability of this code. But let's step back a moment and take a long, hard look at what we're really trying to do.
Let's try and talk through our code to a "rubber-duck". That "conversation" might go something like this:
A, and then, if it is constructed correctly, I want to use it to create an instance of
A, I don't want to even bother trying to create a
B. Or, in fact, do anything further at all."
B, and then, if it is constructed correctly, I want to use it to create a
B, I want to short-circuit out just like before..."
Blooks an awful lot like making a
The duck, and yourself, are now lost for words at recognizing this profound insight.
We have discovered a meta-pattern, which looks something like this:
If we now look at the program again, and squint a little, we'll find that all the statements we've written all over the program have the following meta-pattern:
That's right. We've used the ';' so often that we've forgotten its significance. It's always been a way of saying "Please do what follows me after you're done doing what precedes me" - which is just like the operations we want, but only less configurable!
If only we were able to treat our constructor function as a kind of programming primitive, and we had a smarter way, than the humble ';', of gluing together these primitives, then we might be able to follow our familiar programming paradigms whilst staying in control of the clutter in our meta-pattern. We actually might be able to truly write clutter-free code without sacrificing safety or maintainability! Wow!
Think about that for a while, and you'll get a glimpse into the mind of the functional programmer - because that is exactly what they are able to do!
And if you still keep reading, gentle reader, you will be able to do precisely this with almost no fuss at all! In fact, we have had the ability to write this kind of code since the ancient days of C# 3.0!
We like our generic function executor function that we wrote earlier. A lot.
But first, let's notice that there are two pieces that the clutter is related with:
We can use the magic of C# generic types to encapsulate both of these pieces. We'll use the generic class to decorate the bare value returned by the functions, and also to provide a method to encapsulate the clutter.
(There are actually a few other things we could've tried first, and I can discuss them later, but for now, let's take this approach)
Here's the code:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28:
The first thing we realise is that we have engineered a kind of tar-pit.
That is to say, once we stick
value into an instance of
Create<T>, there's no way to get it out. This is intentional for good reasons, as we'll see in a bit.
The other thing is that the curiously-named
Bind function seems to be the only way to consume the stored
value. It is also relentless in its tar-pitted-ness - it gives you access to the value only as a parameter to a lambda, and expects that the lambda returns a tar-pitted value.
It should be obvious why the tar-pit is a good thing, now: If we want to deal with the stored value, we have exactly one way of doing it - through
Bind. And we can hide all the clutter associated with it there in that single entry point!
This is all well and good, but our test code with this approach now looks kind-of weird. The compiler thinks that it is so hideous, in fact, that it refuses to compile it!
This code doesn't even compile!
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21:
Ouch! What have we gotten ourselves into?
Well, the one good thing about it is that it seems to be a lot less cluttered. The
if statements are gone, and the whole thing looks like a chain, which looks like it can compose nicely.
The reason why it doesn't compile is that when we create the
simpleCommand object, the only place it can be used is as an argument to the
TryCatchCommand constructor - it's simply not in scope for the
CommandSet constructor a few lines below. So it looks like we will need a little more C# magic to make things visible.
If you've ever used LINQ queries, however, you'll notice that they seem to have solved that problem. For example:
1: 2: 3: 4: 5: 6: 7:
a associated with db.Authors is available throughout the query. So maybe the LINQ guys actually have a way to automatically capture the outer scope and make it available to inner ones. (If you want to see how it's done, de-sugar that query into a method-chain, and you'll see an ugly-but-effective expansion which uses a trick called 'closures' to pass things to inner scopes).
So how can we appropriate the goodness of LINQ for our own purposes?
The secret is surprisingly simple. Basically the LINQ syntax parses down to a method-chain composed of normal or extension methods, which is a simply brilliant idea. Basically, it means that by supplying some methods - either in the class or as extension methods - we can get the LINQ syntax to comprehend any type we want to write!
Let's dive in and write those functions as extension methods for now:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10:
So not only have the smart guys in C#-land given us only two extension methods to write, but they even made the second one implementable purely in terms of the first one so all we have to do to reuse it is to rename the type involved.
And the implementation of the first extension method is simply to invoke the
So not only have we managed to contain all the clutter in our
Create<T> class - we've made it possible for the LINQ syntax to give us a sugar-coated way to use it!
We're going to add one more magic function in our extensions class, so we never have to call the class constructor by name - which means that we can write a LINQ query against any type that looks like
Create<T> and not even have to change it to get new behaviour!
Let's call this function
Lift() and its implementation will simply be to call the constructor of
1: 2: 3: 4: 5: 6: 7: 8: 9:
Now, we have entered awesome-ness land! Our test code can be written like this:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38: 39: 40: 41: 42: 43: 44: 45: 46: 47: 48: 49: 50: 51: 52: 53: 54: 55: 56: 57: 58: 59: 60: 61: 62: 63: 64: 65: 66: 67: 68: 69: 70: 71: 72: 73: 74: 75: 76: 77:
Now, I'll be the first to grant that "
from <var-name> in <function>" is a pretty non-intuitive way of expressing an assignment, but that's because the C# guys only exposed this syntax to abstract out the clutter of interacting with collections of objects. For the purposes of this exercise, the syntax would read much better if it said "
let <var-name> be <function>" (and indeed, if you used F#, you would say
let! x = f in this context), but it's a small price to pay to get this kind of composability. Think of it as a few spelling mistakes in your programming language, and you'll be golden!
But what have we gained?
Firstly, I count nearly 30 type constructors being tested with no duplication of construction code! Even taking in the weird formatting, this takes only about 75 lines of code.
Furthermore, there is no leaking of objects and scope - variables can't be reset to null after the assert or reassigned to other unchecked values, which is a distinct possibility (and a source of subtle bugs) in the non-composed code.
I'd make the claim that this code is also super-maintainable: No one can forget to put in an assert when they slide in code to test a new class; and most of the diseases associated with cut/paste have been removed.
So protecting the constructor call is good; pulling out all the clutter into a single function is great; using LINQ to get closures and a uniform programming 'syntax' is awesome; but how maintainable and flexible is this approach.
Say we now we want to introduce tracing throughout the code so we know when something threw an exception or failed on creation.
Let's set the context and imagine the amount of work required to inject tracing logs (correctly) throughout the explicit try-catch code. I'm guess it's a couple hours worth of cut/paste/verify, and it doesn't make the ugly code any prettier. I'm not even going to try and put in a code sample for this approach!
Contrast that with this:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: 21: 22: 23:
So we just add some context to our value, and introduce the tracing code in one location. Our test code changes to:
1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15:
Et voilà! We have modified our clutter-management class to support tunable tracing and logging with half a dozen code changes, and not sacrificed the elegance of our test code while doing it!
In the next article, I'd like to talk about why this turns out to be a very general and very powerful way of programming!