ZooKeeper is a highly reliable distributed coordination service that lets distributed applications coordinate with each other through a shared hierarchical name space, organized much like a standard file system path. The name space consists of data registers - called znodes, in ZooKeeper parlance - and these are similar to files and directories, and they provide clients with strictly ordered access. Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in-memory, which means it can achieve high throughput and low latency. It is especially fast in "read-dominant" workloads: ZooKeeper applications run on thousands of machines, and it performs best where reads are more common than writes, at ratios of around 10:1.
ZooKeeper is ordered: it stamps each update with a number that reflects the order of all ZooKeeper transactions. Subsequent operations can use this order to implement higher-level abstractions, such as synchronization primitives.
The performance aspects of ZooKeeper allow it to be used in large distributed systems. The reliability aspects prevent it from becoming the single point of failure in big systems. Its strict ordering allows sophisticated synchronization primitives to be implemented at the client.
Starting with this post, I will write a series of tutorials on ZooKeeper. Some concepts and terminology are introduced here first.
What is ZooKeeper?
As the official site puts it:
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications. Each time they are implemented there is a lot of work that goes into fixing the bugs and race conditions that are inevitable. Because of the difficulty of implementing these kinds of services, applications initially usually skimp on them, which makes them brittle in the presence of change and difficult to manage. Even when done correctly, different implementations of these services lead to management complexity when the applications are deployed.
ZooKeeper runs in Java and has bindings for both Java and C.
ZooKeeper is very fast and very simple. Since it is designed to be a basis for the construction of more complicated services, such as synchronization, it provides a set of guarantees. These are:
- Sequential Consistency - Updates from a client will be applied in the order that they were sent.
- Atomicity - Updates either succeed or fail. No partial results.
- Single System Image - A client will see the same view of the service regardless of the server that it connects to.
- Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
- Timeliness - The clients' view of the system is guaranteed to be up-to-date within a certain time bound.
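To make the ordering guarantee concrete, here is a minimal Python sketch (my own toy model, not ZooKeeper's actual implementation) of a store that stamps every update with a monotonically increasing transaction id and applies updates strictly in that order:

```python
class MiniOrderedStore:
    """Toy model of ZooKeeper's ordering guarantee: every update is
    stamped with a monotonically increasing transaction id (zxid)
    and applied in exactly that order."""

    def __init__(self):
        self.zxid = 0    # last assigned transaction id
        self.data = {}   # path -> value
        self.log = []    # ordered history of (zxid, path, value)

    def update(self, path, value):
        self.zxid += 1                       # stamp the update
        self.log.append((self.zxid, path, value))
        self.data[path] = value              # apply in stamp order
        return self.zxid

store = MiniOrderedStore()
z1 = store.update("/config", "v1")
z2 = store.update("/config", "v2")
assert z2 > z1                               # later updates get larger zxids
assert store.data["/config"] == "v2"         # last write in stamp order wins
```

Because every transaction carries a stamp, a client can compare stamps to reason about "happened before", which is the building block for the synchronization primitives mentioned above.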
Learn more about ZooKeeper on the ZooKeeper Official Site.
Some Concepts and Terminology
Data model and the hierarchical namespace
Just as mentioned in the overview, the name space provided by ZooKeeper is much like that of a standard file system. A name is a sequence of path elements separated by a slash ("/"). Every node in ZooKeeper's name space is identified by a path, which always needs to start with the root znode ("/").
Unlike standard file systems, each node in a ZooKeeper namespace can have data associated with it as well as children; you can create sub-znodes (children) under a znode. It is like having a file system that allows a file to also be a directory. (ZooKeeper was designed to store coordination data: status information, configuration, location information, etc., so the data stored at each node is usually small, in the byte-to-kilobyte range.) We use the term znode to make it clear that we are talking about ZooKeeper data nodes.
A znode must have a parent whose path is a prefix of the znode's path with one less element; the exception to this rule is the root znode ("/"), which has no parent. Also, exactly as in standard file systems, a znode cannot be deleted if it has any children.
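These two namespace rules can be sketched with a toy in-memory tree (the class and names are my own illustration, not ZooKeeper's internals):

```python
class MiniZnodeTree:
    """Toy in-memory znode tree enforcing two namespace rules:
    a znode's parent must exist, and a znode with children
    cannot be deleted."""

    def __init__(self):
        self.nodes = {"/": b""}  # path -> data; the root always exists

    def _parent(self, path):
        parent = path.rsplit("/", 1)[0]
        return parent or "/"

    def create(self, path, data=b""):
        if self._parent(path) not in self.nodes:
            raise KeyError("no such parent: " + self._parent(path))
        self.nodes[path] = data

    def delete(self, path):
        if any(self._parent(p) == path for p in self.nodes if p != "/"):
            raise ValueError("znode has children: " + path)
        del self.nodes[path]

tree = MiniZnodeTree()
tree.create("/app")
tree.create("/app/config", b"small payload")  # parent exists: OK
try:
    tree.create("/other/child")               # parent missing: rejected
except KeyError:
    pass
try:
    tree.delete("/app")                       # still has a child: rejected
except ValueError:
    pass
tree.delete("/app/config")
tree.delete("/app")                           # now allowed
```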
Znodes maintain a stat structure that includes version numbers for data changes and ACL changes. The stat structure also has timestamps. The version number, together with the timestamp, allows ZooKeeper to validate the cache and to coordinate updates. Each time a znode's data changes, the version number increases. Whenever a client retrieves data, it also receives the version of the data, and when a client performs an update or a delete, it must supply the version of the data of the znode it is changing. If the version it supplies doesn't match the actual version of the data, the update will fail.
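This version check behaves like a compare-and-set. Here is a minimal Python sketch of the idea (my own simplification, not ZooKeeper's wire protocol):

```python
class VersionedZnode:
    """Toy znode with ZooKeeper-style conditional updates: writers
    must present the version they read, or the write is rejected."""

    def __init__(self, data):
        self.data = data
        self.version = 0

    def get(self):
        return self.data, self.version      # reads return data + version

    def set(self, data, expected_version):
        if expected_version != self.version:
            raise RuntimeError("bad version")  # stale write rejected
        self.data = data
        self.version += 1                   # every change bumps the version

node = VersionedZnode(b"v0")
data, ver = node.get()
node.set(b"v1", ver)                        # version matches: succeeds
try:
    node.set(b"v2", ver)                    # stale version: fails
except RuntimeError:
    pass
assert node.get() == (b"v1", 1)
```

This is the same optimistic-concurrency pattern the tutorial describes: two clients can read concurrently, but only the first writer with a matching version wins; the other learns its view is stale and must re-read.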
The command syntax for creating a znode is as follows:
create -[options] /[znode-name] [znode-data]
Example 1: Create a new znode named “znode_test” with data “znode_test_data”
The path consists of the root znode ("/") and the name of the znode you want to create, so here you can write /znode_test for its path.
[zk: localhost:2181(CONNECTED) 0] create /znode_test znode_test_data
Created /znode_test
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper, znode_test]
The commands for operating on znodes, including creating, getting, deleting, and so on, will be introduced in the following tutorials.
Example 2: Create a recursive znode named “znode_rtest3” with data “znode_rtest_data”
One important thing to keep in mind: when you create a nested znode, the parent znodes along the path must already exist. Otherwise, an exception will be thrown.
Assuming the full path is /znode_rtest1/znode_rtest2/znode_rtest3 (the intermediate names here are illustrative), the parents are created first:
[zk: localhost:2181(CONNECTED) 3] create /znode_rtest1 znode_rtest_data
[zk: localhost:2181(CONNECTED) 4] create /znode_rtest1/znode_rtest2 znode_rtest_data
[zk: localhost:2181(CONNECTED) 5] create /znode_rtest1/znode_rtest2/znode_rtest3 znode_rtest_data
Different Types of Znodes
In ZooKeeper there are three types of znodes: persistent, ephemeral, and sequential.
Persistent Znodes (Default)
These are the default znodes in ZooKeeper. They stay on the ZooKeeper server permanently, as long as clients (including the creator) leave them alone, i.e., until someone explicitly deletes them.
create /znode mydata
Ephemeral Znodes
Ephemeral znodes (also referred to as session znodes) are temporary znodes. Unlike persistent znodes, they are destroyed as soon as the creator client logs out of the ZooKeeper server. For example, let's say client1 created eznode1. Once client1 logs out of the ZooKeeper server, eznode1 is destroyed.
create -e /eznode mydata
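The session-bound lifecycle can be sketched like this (a toy model of my own; in real ZooKeeper the server removes ephemeral znodes when the client's session ends or expires):

```python
class MiniSessionStore:
    """Toy model of ephemeral znodes: each ephemeral znode remembers
    its creator session, and closing that session deletes it."""

    def __init__(self):
        self.nodes = {}  # path -> owner session id (None = persistent)

    def create(self, path, session=None, ephemeral=False):
        self.nodes[path] = session if ephemeral else None

    def close_session(self, session):
        # server-side cleanup: drop every ephemeral znode of this session
        for path in [p for p, s in self.nodes.items() if s == session]:
            del self.nodes[path]

store = MiniSessionStore()
store.create("/persist", session="client1")
store.create("/eznode1", session="client1", ephemeral=True)
store.close_session("client1")
assert "/eznode1" not in store.nodes   # ephemeral znode is gone
assert "/persist" in store.nodes       # persistent znode survives
```

This property is what makes ephemeral znodes useful for liveness tracking: a member's znode disappearing is the signal that the member is gone.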
Sequential Znodes
A sequential znode is given a 10-digit, zero-padded number, assigned in numerical order, at the end of its name. Let's say client1 created sznode1. On the ZooKeeper server, sznode1 will be named like this: [znode-name]0000000001.
If client1 creates another sequential znode, it would bear the next number in a sequence. So the next sequential znode will be called [znode-name]0000000002.
create -s /sznode mydata
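The 10-digit suffix is just a zero-padded, per-parent counter. A quick sketch of the naming scheme (the helper function is my own illustration):

```python
def sequential_name(base, counter):
    """Append ZooKeeper's 10-digit, zero-padded sequence number
    to a znode name."""
    return f"{base}{counter:010d}"

# each create under the same parent consumes the next counter value
assert sequential_name("sznode", 1) == "sznode0000000001"
assert sequential_name("sznode", 2) == "sznode0000000002"
```

Because the counter only ever increases, the names sort in creation order, which is what lock and queue recipes built on sequential znodes rely on.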
ACL
ACL (Access Control List) is ZooKeeper's mechanism for controlling which users can access which znodes, depending on how the ACL is set. This part will be introduced in the following tutorials.
Ensemble and Quorum
The ZooKeeper service can be replicated over a set of hosts called an ensemble. As long as a majority of the ensemble is up, the service will be available. For the ZooKeeper service to be active, there must be a majority of non-failing machines that can communicate with each other. Failure in this context means a machine crash, or some error in the network that partitions a server off from the majority. To create a deployment that can tolerate the failure of F machines, you should deploy 2F+1 machines. Thus, a deployment of three machines can handle one failure, and a deployment of five machines can handle two failures. Note that a deployment of six machines can only handle two failures, since three machines is not a majority. For this reason, ZooKeeper deployments are usually made up of an odd number of machines. Three ZooKeeper servers is the minimum recommended size for an ensemble, and we also recommend that they run on separate machines.
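The 2F+1 arithmetic is easy to check with a small sketch of the majority math:

```python
def majority(ensemble_size):
    """Smallest strict majority of an ensemble."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size):
    """Failures an ensemble can survive while keeping a majority."""
    return ensemble_size - majority(ensemble_size)

# 2F+1 machines tolerate F failures; an even ensemble gains nothing
assert tolerated_failures(3) == 1
assert tolerated_failures(5) == 2
assert tolerated_failures(6) == 2  # same as 5: why odd sizes are preferred
```

The last assertion shows why even-sized ensembles are discouraged: adding a sixth server raises the majority threshold to four without tolerating any additional failures.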
A replicated group of servers in the same application is called a quorum; all servers in the quorum have copies of the same configuration file. QuorumPeers form a ZooKeeper ensemble. A quorum is represented by a strict majority of nodes. You can have a single node in your ensemble, but it won't be a highly available or reliable system. If you have two nodes in your ensemble, both would need to be up to keep the service running, because one out of two nodes is not a strict majority. If you have three nodes in the ensemble, one can go down and you would still have a functioning service (two out of three is a strict majority). If a quorum of nodes is not available, the ZooKeeper service is nonfunctional.