This is a multi-part series:

1. ephemeral nodes
2. observable<zookeeper>
3. where is zookeeper?
4. .net api and tasks
5. c# iobservable

What is Zookeeper?

Let's assume they expended their brainpower on the program, not the logo

Zookeeper is a distributed database originally developed as part of the Hadoop project. It’s spawned several imitators: Consul, etcd, and Doozerd (itself a clone of Chubby). A lot of the material out there about Zookeeper describes how it works, but not necessarily what you’d use it for. In this series of posts, we’ll cover how we used it at one client, and how it also got abused.

Service Registry & Discovery

The canonical use of Zookeeper is to maintain a service registry. In our application, we had large datasets that would be hosted by data servers. (Each server might be backed by a cluster of machines.) Clients requested visualizations of these datasets; to do so, they needed to connect to the server hosting the data they were interested in. But how does a client know which data servers are running? In this picture we imagine a system with two such data servers, each having registered with Zookeeper. When a server starts, it places a document at a pre-arranged path (/services/{dataset}) describing where it can be found. The document might be JSON, e.g. { "dataset": "X", "host": "abc", "port": "123" }.

Zookeeper acting as a service registry
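To make that concrete, here is a minimal registration sketch using Zookeeper’s standard Java client (this series later uses the .NET API, but the operations map across directly). The connection string, session timeout, and dataset name are placeholders, and we assume the /services parent node already exists:

    import java.nio.charset.StandardCharsets;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class RegisterDataset {
        public static void main(String[] args) throws Exception {
            // Connect to the Zookeeper ensemble (hosts and timeout are placeholders).
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> {});

            // The small document describing where dataset X can be found.
            byte[] doc = "{ \"dataset\": \"X\", \"host\": \"abc\", \"port\": \"123\" }"
                    .getBytes(StandardCharsets.UTF_8);

            // Publish it at the pre-arranged path (assumes /services already exists).
            zk.create("/services/X", doc, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            zk.close();
        }
    }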

This arrangement takes care of a couple of things nicely for us:

  • Service Discovery. Client A can connect to Zookeeper, do a getChildren("/services"), and quickly discover which servers are available for use. Each child is a small document telling the client how to connect (see the sketch after this list).
  • Service Management. Upon starting, the data server can attempt to insert a document describing itself (/services/X). If such a document already exists, an error will result, ensuring that two servers do not try to spin up to serve the same dataset. (In fact, a robust implementation will likely include a server status in the document, e.g. { status: 'starting' }.)
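Both of these are only a few calls against the same client API. Here is a rough sketch of the client-side discovery loop and the “only one server per dataset” check, again with placeholder hosts and paths:

    import java.nio.charset.StandardCharsets;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class DiscoverAndGuard {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> {});

            // Service discovery: list every registered data server and read its document.
            for (String dataset : zk.getChildren("/services", false)) {
                byte[] doc = zk.getData("/services/" + dataset, false, null);
                System.out.println(dataset + " -> " + new String(doc, StandardCharsets.UTF_8));
            }

            // Service management: a second server trying to claim dataset X fails,
            // because creating a node that already exists throws NodeExistsException.
            byte[] registration = "{ \"dataset\": \"X\", \"status\": \"starting\" }"
                    .getBytes(StandardCharsets.UTF_8);
            try {
                zk.create("/services/X", registration, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            } catch (KeeperException.NodeExistsException e) {
                System.err.println("Dataset X already has a server; refusing to start another");
            }

            zk.close();
        }
    }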

Why Not Use a File System?

As we’ve seen, the data in Zookeeper can roughly be thought of as a filesystem with small files and clean concurrency guarantees. Text data is added/removed/updated at a particular path (e.g. /services/X), where it can be seen or observed by everyone connected to that Zookeeper instance. Zookeeper itself also runs on a cluster of machines (they recommend at least three) so that load among many competing clients can be shared, and so that the likelihood of complete data loss is very low.

Why not use the actual file system? The simple answer is that most file system implementations (e.g. NFS) don’t give clean consistency guarantees. If I launch a new service and put a file in place declaring I’m running (/services/hammer-manager), you might read it at just the wrong time and see an incomplete result (host:123 instead of, say, host:1234). Zookeeper defines strict guarantees about what will appear when, and how operations will be serialized. You’ll want to read the manual carefully.

Why not use a database? You could, but the features in Zookeeper have been designed to all work together in a way that would take a while to reproduce in a traditional database.

When Services Go Bad

What happens when the cluster hosting /services/X crashes? How does the node describing its services get removed? How do we keep clients from trying to connect? There are several approaches, but a nice one involves a key Zookeeper feature: ephemeral nodes. When a Zookeeper client (in this case our data server) creates a node, it can create it as ephemeral. So long as that client is alive (as determined by a heartbeat built into the Zookeeper protocol), the node is kept alive. If the Zookeeper server ever loses the heartbeat, it will delete the relevant node. Now, without any extra work on your part, the service registry is kept up to date, even in the face of recalcitrant servers.

A server crashes, deleting its ephemeral node
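In code, the only change from the earlier registration sketch is the create mode. Again, hosts, paths, and the timeout are placeholders:

    import java.nio.charset.StandardCharsets;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class RegisterEphemeral {
        public static void main(String[] args) throws Exception {
            // The session timeout (5 seconds here) bounds how long a crashed
            // server's node lingers after its heartbeats stop.
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 5000, event -> {});

            byte[] doc = "{ \"dataset\": \"X\", \"host\": \"abc\", \"port\": \"123\" }"
                    .getBytes(StandardCharsets.UTF_8);

            // EPHEMERAL: the node lives only as long as this client's session.
            // If the server crashes, Zookeeper removes /services/X automatically.
            zk.create("/services/X", doc, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

            // ... serve requests; no explicit deregistration is needed on crash.
        }
    }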

Because the server may crash and restart faster than the heartbeat timeout, leaving its old ephemeral node in place for a few seconds, and because there may be a race between two servers starting up for the same dataset, you’ll want to be a little smarter than just ‘create ephemeral node on startup’ (one naive approach is sketched below), but you get the idea.
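One possible, deliberately naive way to be “a little smarter” is to retry registration a few times, waiting past the session timeout so that a stale ephemeral node left by our own crashed predecessor has a chance to expire. This is a sketch under those assumptions, not a complete answer to the race between two genuinely competing servers:

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class SmarterRegistration {
        // Retry a few times in case an ephemeral node from a previous
        // incarnation of this server has not yet been expired by Zookeeper.
        static void register(ZooKeeper zk, String path, byte[] doc) throws Exception {
            for (int attempt = 0; attempt < 5; attempt++) {
                try {
                    zk.create(path, doc, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
                    return; // registered successfully
                } catch (KeeperException.NodeExistsException e) {
                    // Either our crashed predecessor's node or a competing server;
                    // wait past the session timeout and try again.
                    Thread.sleep(6000);
                }
            }
            throw new IllegalStateException(path + " is persistently claimed by another server");
        }
    }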

In the next post, we’ll see that other clients can be notified of this failure immediately by watching Zookeeper documents, and we’ll discover why these documents are stored in a hierarchy to begin with.
