This post is part of a multi-part series.
ZooKeeper’s “native” client APIs are C and Java. If you’re programming in .NET (or Python, or a few other languages), the docs helpfully point out that some friendlies have programmed clients that “might” work for you. “Might” is frustrating, as is the possibility that the libraries lag behind the official clients. So we used the Java version anyway, and made it a little more idiomatic .NET. It turns out to be a nice look at how to use Java from .NET, and how to implement Task and IObservable patterns by hand.
We wanted to use the Java client from .NET. Thanks to the amazing IKVM.NET project, you can run Java libraries directly in the CLR. The ikvmc compiler turns a Java .jar into a .NET .dll and you’re off to the races, using Java objects as if they were .NET objects… because now they are. (It helps that the CLR was such a blatant rip-off of the JVM, so there are relatively few gotchas in this approach.)
Grabbing the ZooKeeper jar and recompiling it for .NET is the first step. See the simple tutorial at the IKVM.NET site. IKVM.NET translates core Java types (java.lang.String) into their .NET counterparts (System.String), but it doesn’t replace the base class library that comes with Java, i.e. the JDK. You’ll also need to reference whatever modules ZooKeeper uses, made available with IKVM as a set of compiled dlls, e.g. IKVM.OpenJDK.Core, IKVM.OpenJDK.Util, and the like. Then it’s as easy as
_zookeeper = new org.apache.zookeeper.ZooKeeper("connect string" ...);
If that’s all you need, then you’re done! But using ZooKeeper from .NET and being happy about it probably means wrapping it up with some more native interfaces, like IObservables and Tasks. Here’s some of what we did.
Reading data from a node on ZooKeeper involves i/o across the network, so the getData API offers an asynchronous version. The Java version involves registering a callback object for when the results are available. But the more .NET-friendly way is to embrace Tasks. The basic API is just
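For readers who haven't seen the Java client, here is a rough sketch of what the callback style looks like from C#. `IDataCallback` and `FakeZooKeeper` are hypothetical stand-ins written for this example so it runs without IKVM or a server. They only mirror the shape of the Java client's `getData(path, watch, cb, ctx)` and `AsyncCallback.DataCallback.processResult(rc, path, ctx, data, stat)`, with `stat` reduced to a plain `object`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

// Hypothetical stand-in for org.apache.zookeeper.AsyncCallback.DataCallback.
public interface IDataCallback
{
    void processResult(int rc, string path, object ctx, byte[] data, object stat);
}

// Hypothetical in-memory stand-in for the IKVM-compiled ZooKeeper client.
public sealed class FakeZooKeeper
{
    public const int Ok = 0;        // result code for success
    public const int NoNode = -101; // result code for a missing node

    private readonly Dictionary<string, byte[]> _nodes = new Dictionary<string, byte[]>();

    public void Put(string path, string value)
    {
        _nodes[path] = Encoding.UTF8.GetBytes(value);
    }

    // Returns immediately; the callback fires later on a thread-pool thread.
    public void getData(string path, bool watch, IDataCallback cb, object ctx)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            byte[] data;
            if (_nodes.TryGetValue(path, out data)) cb.processResult(Ok, path, ctx, data, null);
            else cb.processResult(NoNode, path, ctx, null, null);
        });
    }
}

// The caller has to supply an object whose only job is to receive the answer.
public sealed class PrintingCallback : IDataCallback
{
    public readonly ManualResetEventSlim Done = new ManualResetEventSlim();
    public string Last;

    public void processResult(int rc, string path, object ctx, byte[] data, object stat)
    {
        Last = rc == FakeZooKeeper.Ok ? Encoding.UTF8.GetString(data) : "rc=" + rc;
        Console.WriteLine(path + " -> " + Last);
        Done.Set();
    }
}

public static class Program
{
    public static void Main()
    {
        var zk = new FakeZooKeeper();
        zk.Put("/config/db", "host=db1");

        var cb = new PrintingCallback();
        zk.getData("/config/db", false, cb, null);
        cb.Done.Wait(); // block until the callback has run
    }
}
```

Note how much ceremony the caller carries: a dedicated class, a manual wait handle, and result codes to decode by hand.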
We can implement something that looks more Tasky and is a whole lot easier to use.
I can call it from an async method, and get a consolidated result that is either an exception, the data I wanted (byte[]) or information that the node is missing. (Check out the Maybe type from the Rxx (Reactive Extensions Extensions) library. As a side note, .NET should have shipped with an Option type like Swift or F# did. Can we make the F# one standard in the BCL sometime?)
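To make those three outcomes concrete, here is a small sketch of the calling side. Everything in it is an assumption made for illustration: `Maybe<T>` is a minimal hand-rolled optional type, not the Rxx type or its API, and `StubClient.GetDataAsync` is a hypothetical wrapper that answers from memory rather than from ZooKeeper.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

// Hypothetical minimal optional type; a stand-in for Rxx's Maybe<T>, not its real API.
public struct Maybe<T>
{
    private readonly T _value;
    public bool HasValue { get; }

    private Maybe(T value) { _value = value; HasValue = true; }

    public T Value => HasValue ? _value : throw new InvalidOperationException("Maybe is empty");
    public static Maybe<T> Some(T value) => new Maybe<T>(value);
    public static Maybe<T> None => default(Maybe<T>);
}

public static class StubClient
{
    private static readonly Dictionary<string, byte[]> Nodes = new Dictionary<string, byte[]>
    {
        { "/config/db", Encoding.UTF8.GetBytes("host=db1") }
    };

    // Hypothetical wrapper signature. "/broken" simulates a failed call.
    public static Task<Maybe<byte[]>> GetDataAsync(string nodePath)
    {
        if (nodePath == "/broken")
            return Task.FromException<Maybe<byte[]>>(new InvalidOperationException("connection lost"));

        byte[] data;
        return Task.FromResult(Nodes.TryGetValue(nodePath, out data)
            ? Maybe<byte[]>.Some(data)
            : Maybe<byte[]>.None);
    }
}

public static class Program
{
    // One await covers all three outcomes: data, missing node, or exception.
    public static async Task<string> DescribeAsync(string nodePath)
    {
        try
        {
            Maybe<byte[]> result = await StubClient.GetDataAsync(nodePath);
            return result.HasValue
                ? "data: " + Encoding.UTF8.GetString(result.Value)
                : "missing";
        }
        catch (InvalidOperationException e)
        {
            return "failed: " + e.Message;
        }
    }

    public static async Task Main()
    {
        foreach (var path in new[] { "/config/db", "/nope", "/broken" })
            Console.WriteLine(path + " => " + await DescribeAsync(path));
    }
}
```

A missing node is an ordinary value here rather than an exception, so callers don't need a try/catch just to probe for a node's existence.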
Here’s how we did it:
Let’s take this apart.
First, the asynchronous Java call is void, but we need to return a Task object. With a lot of standard .NET async programming, the Tasks are made for you, either by compiler re-writing, or library calls deep in the guts of the BCL, like FileStream.ReadAsync. In this case we need to make our own. The best way to do this is simply to use a TaskCompletionSource. These are lightweight objects that expose a Task object we can manipulate ourselves when we get the callback.
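If you haven't used TaskCompletionSource before, the pattern in isolation is tiny. The sketch below wraps a made-up callback-style operation (`ComputeLater` is hypothetical and only stands in for a void asynchronous call such as the Java one) and exposes it as a Task.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical callback-style operation: returns void, reports back later.
    public static void ComputeLater(int input, Action<int> onDone, Action<Exception> onError)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            if (input < 0) onError(new ArgumentOutOfRangeException(nameof(input)));
            else onDone(input * 2);
        });
    }

    // The wrapper: create the source, wire the callbacks to it, hand back its Task.
    public static Task<int> ComputeAsync(int input)
    {
        var tcs = new TaskCompletionSource<int>();
        ComputeLater(input, r => tcs.SetResult(r), e => tcs.SetException(e));
        return tcs.Task;
    }

    public static async Task Main()
    {
        Console.WriteLine(await ComputeAsync(21)); // prints 42

        try
        {
            await ComputeAsync(-1);
        }
        catch (ArgumentOutOfRangeException)
        {
            Console.WriteLine("faulted as expected");
        }
    }
}
```

One design point to be aware of: by default, continuations of the Task may run synchronously on whichever thread calls SetResult or SetException. Passing TaskCreationOptions.RunContinuationsAsynchronously to the TaskCompletionSource constructor avoids that, if you would rather not run your own code on the library's callback thread.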
Second, Java has lambdas now, so perhaps this sort of silliness will end, but the primary method of callbacks in Java code to date is to actually register a callable object that implements some interface, in this case AsyncCallback.DataCallback.
We killed two birds with one stone by inheriting from TaskCompletionSource and implementing AsyncCallback.DataCallback in one object. It’s convenient, and it also reduces the number of allocations necessary. We called this object GetDataTask and most of the interesting stuff is in it. All we have to do is make one (which we assume kicks off the action) and return its Task object.
The class constructor is simple, stashing the _zooKeeper object (representing the native Java client API) for later, and kicking off the relevant API call with a callback
nodePath is worth elaborating on. Why is it passed twice? The second argument is a ‘callback context’, a common pattern in languages that don’t support lexical closures/lambdas directly (as Java didn’t until recently). Your callback is called with a bit of ‘context’, so you know why you are being called. This matters because you might register just one callback, e.g. a static function, to be called over and over. In our case we don’t technically need any context, as the callback will only happen once on a particular object (GetDataTask) that can be given all the information we need. But, as you’ll see, GetDataTask.processResult may want the nodePath to construct a useful exception. Instead of storing yet another reference to the string ourselves and making the object even larger, we just sneak it into the ZooKeeper callback context.
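Putting the last few paragraphs together, the shape of such a class might look like the sketch below. It is not the original listing. `IDataCallback` and `FakeZooKeeper` are hypothetical stand-ins for the IKVM-compiled types, the result is a plain `byte[]` rather than a Maybe, and the sketch skips stashing the client in a field because nothing here needs it afterwards.

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for AsyncCallback.DataCallback.
public interface IDataCallback
{
    void processResult(int rc, string path, object ctx, byte[] data, object stat);
}

// Hypothetical stand-in for the client: paths under /missing fail, all others succeed.
public sealed class FakeZooKeeper
{
    public void getData(string path, bool watch, IDataCallback cb, object ctx)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            if (path.StartsWith("/missing", StringComparison.Ordinal))
                cb.processResult(-101, path, ctx, null, null);
            else
                cb.processResult(0, path, ctx, Encoding.UTF8.GetBytes("value of " + path), null);
        });
    }
}

// One object is both the Task producer and the callback target.
public sealed class GetDataTask : TaskCompletionSource<byte[]>, IDataCallback
{
    public GetDataTask(FakeZooKeeper zooKeeper, string nodePath)
    {
        // nodePath goes in twice: as the path to read and as the callback context.
        zooKeeper.getData(nodePath, false, this, nodePath);
    }

    public void processResult(int rc, string path, object ctx, byte[] data, object stat)
    {
        var nodePath = (string)ctx; // recovered from the context, not from a field
        if (rc == 0) SetResult(data);
        else SetException(new InvalidOperationException("getData failed for " + nodePath + ", rc=" + rc));
    }
}

public static class ZooKeeperExtensions
{
    // Constructing the object starts the call; its Task is all the caller sees.
    public static Task<byte[]> GetDataAsync(this FakeZooKeeper zk, string nodePath)
    {
        return new GetDataTask(zk, nodePath).Task;
    }
}

public static class Program
{
    public static async Task Main()
    {
        var zk = new FakeZooKeeper();
        byte[] data = await zk.GetDataAsync("/config/db");
        Console.WriteLine(Encoding.UTF8.GetString(data)); // prints "value of /config/db"
    }
}
```

Handing `this` to the client from inside the constructor is safe in this sketch because the base TaskCompletionSource is fully constructed by then and the class has no fields of its own to initialize.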
Now the constructor has finished executing, and we can hand back a live Task object.
When the ZooKeeper API completes its work, it will call the processResult function on the registered object, which unpacks its arguments and ‘completes’ the TaskCompletionSource by setting it with a result or exception, which in turn wakes up any code that is waiting on the task.
This code exposes one awkward difference between Java and .NET: the treatment of enums. Notice the call to intValue() to get the actual integer value of the enumeration for comparison with the actual error code returned. SetResult marks the Task as complete with a result, and SetException marks the Task as failed with a given exception, constructed by using some ZooKeeper library functions.
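The enum wrinkle is easier to see in code. In the sketch below, `Code` is a hypothetical stand-in for how a Java enum such as KeeperException.Code shows up in .NET: as an object with an `intValue()` method rather than as an integer-backed enum. The sketch also makes two simplifications: a null result stands in for the empty Maybe, and a plain InvalidOperationException stands in for the exception the ZooKeeper library would construct.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a Java enum as seen from .NET: an object, not an int.
public sealed class Code
{
    public static readonly Code OK = new Code(0);
    public static readonly Code NONODE = new Code(-101);

    private readonly int _value;
    private Code(int value) { _value = value; }

    public int intValue() => _value;
}

public static class Completion
{
    // Maps a raw integer result code onto the three possible Task outcomes.
    public static void Complete(TaskCompletionSource<byte[]> tcs, int rc, string path, byte[] data)
    {
        // intValue() is a method call, not a constant, so these can't be plain case labels.
        if (rc == Code.OK.intValue())
            tcs.SetResult(data);
        else if (rc == Code.NONODE.intValue())
            tcs.SetResult(null); // a missing node is a normal outcome, not a failure
        else
            tcs.SetException(new InvalidOperationException("ZooKeeper error " + rc + " for " + path));
    }
}

public static class Program
{
    public static void Main()
    {
        var ok = new TaskCompletionSource<byte[]>();
        Completion.Complete(ok, 0, "/a", new byte[] { 1, 2 });
        Console.WriteLine(ok.Task.Result.Length); // 2

        var missing = new TaskCompletionSource<byte[]>();
        Completion.Complete(missing, -101, "/a", null);
        Console.WriteLine(missing.Task.Result == null); // True

        var failed = new TaskCompletionSource<byte[]>();
        Completion.Complete(failed, -4, "/a", null);
        Console.WriteLine(failed.Task.Status); // Faulted
        Console.WriteLine(failed.Task.Exception.InnerException.Message);
    }
}
```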
Voila! A much simpler API that’s more easily composed with other Tasks. It cost us two allocations of very small objects, but this is a case where those allocations are probably trivial compared with the cost of the network call itself, so we don’t worry much.
Next week we’ll look at how IObservable is implemented with the ZooKeeper API. The idea is the same, but it’s a little more involved.