Tutorial on ZooKeeper – Part 2: Installation and Configuration

In the last tutorial, we introduced some ZooKeeper concepts and terminology. To explore and use ZooKeeper further, we now move on to the next step: installing and configuring it.

Installation

ZooKeeper is very simple to install.

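Prerequisites

See the System Requirements section of the ZooKeeper Administrator's Guide. In particular, ZooKeeper runs on Java, so a Java runtime (JDK) must be installed first.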

Download the Source Code and Install

To get a ZooKeeper distribution, download a recent stable release from one of the Apache Download Mirrors.

$ cd /opt
$ wget http://apache.arvixe.com/zookeeper/stable/zookeeper-3.4.6.tar.gz
$ tar -zxvf zookeeper-3.4.6.tar.gz
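Apache publishes a checksum file next to each release tarball. Comparing it against the downloaded file catches truncated or corrupted downloads. The Python sketch below computes a file digest with hashlib. It assumes SHA-1 by default; pass another hashlib algorithm name if the published checksum uses a different one. Pass only the hex digest, not the file name that some checksum files include. The helper names and the script name check_tarball.py are hypothetical, not part of any ZooKeeper tooling.

```python
import hashlib


def file_digest_hex(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks so large tarballs stay cheap."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(path: str, expected_hex: str, algorithm: str = "sha1") -> bool:
    """Compare a file's digest with a published one, ignoring case and embedded spaces."""
    expected = "".join(expected_hex.split()).lower()
    return file_digest_hex(path, algorithm) == expected


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_tarball.py zookeeper-3.4.6.tar.gz <expected-sha1-hex>
    print(matches_published(sys.argv[1], sys.argv[2]))
```

A result of False means the download should be repeated before unpacking it.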

On Ubuntu, you can also install ZooKeeper directly from the Debian packages:

$ apt-cache search zookeeper
libzookeeper-java - Core Java libraries for zookeeper
libzookeeper-java-doc - API Documentation for zookeeper
libzookeeper-mt-dev - Development files for multi threaded zookeeper C bindings
libzookeeper-mt2 - Multi threaded C bindings for zookeeper
libzookeeper-st-dev - Development files for single threaded zookeeper C bindings
libzookeeper-st2 - Single threaded C bindings for zookeeper
libzookeeper2 - C bindings for zookeeper - transitional package
monsterz - arcade puzzle game
monsterz-data - graphics and audio data for monsterz
python-kazoo - higher level API to Apache Zookeeper for Python clients
python-kazoo-doc - API to Apache Zookeeper for Python clients. - API documentation
python-txzookeeper - Twisted-based Asynchronous Libraries for Apache Zookeeper.
python-zookeeper - Python bindings for zookeeper
zookeeper - High-performance coordination service for distributed applications
zookeeper-bin - Command line utilities for zookeeper
zookeeperd - Init control scripts for zookeeper
$ apt-get install -y zookeeper zookeeper-bin zookeeperd

Configuration

ZooKeeper can run in two modes: standalone mode and replicated mode.

Standalone Mode

Setting up a ZooKeeper server in standalone mode is straightforward. The server is contained in a single JAR file, so installing it mainly consists of creating a configuration file.

$ cd /opt/zookeeper-3.4.6/conf
$ ls
configuration.xsl log4j.properties zoo_sample.cfg
# ZooKeeper needs a configuration file to start; create one by copying zoo_sample.cfg
$ cp zoo_sample.cfg zoo.cfg

Open zoo.cfg and you will see the following sample settings:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

Change the value of dataDir to an existing directory that is empty to start with. As the comment in the file warns, /tmp is only an example and should not be used for real storage.
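If you script the setup, the dataDir change can be automated. The following is a minimal Python sketch. It assumes zoo.cfg uses plain key=value lines, as in the sample above. set_data_dir and prepare_config are hypothetical helper names.

```python
import os


def set_data_dir(cfg_text: str, data_dir: str) -> str:
    """Return cfg_text with the active dataDir setting pointing at data_dir."""
    lines = cfg_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        # Skip comments (including commented-out settings) and non key=value lines.
        if stripped.startswith("#") or "=" not in stripped:
            continue
        if stripped.split("=", 1)[0].strip() == "dataDir":
            lines[i] = f"dataDir={data_dir}"
            replaced = True
    if not replaced:
        lines.append(f"dataDir={data_dir}")  # add the setting if it was missing
    return "\n".join(lines) + "\n"


def prepare_config(cfg_path: str, data_dir: str) -> None:
    """Create data_dir if needed and rewrite cfg_path to use it."""
    os.makedirs(data_dir, exist_ok=True)  # the directory must exist (and start out empty)
    with open(cfg_path) as f:
        text = f.read()
    with open(cfg_path, "w") as f:
        f.write(set_data_dir(text, data_dir))
```

For example, prepare_config("/opt/zookeeper-3.4.6/conf/zoo.cfg", "/var/lib/zookeeper") would create the directory and rewrite only the active dataDir line, leaving commented-out lines untouched.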

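For standalone mode, only the following three fields are needed (initLimit and syncLimit are used only in replicated mode):

  • tickTime: the basic time unit, in milliseconds, used by ZooKeeper. It is used for heartbeats, and the minimum session timeout is twice the tickTime.
  • dataDir: the directory where ZooKeeper stores its in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
  • clientPort: the port on which ZooKeeper listens for client connections.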
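To sanity-check a standalone configuration, the sketch below parses zoo.cfg and confirms the three fields above are present. It also derives the session timeout bounds a server will grant clients. This assumes the documented defaults of 2 * tickTime and 20 * tickTime apply when minSessionTimeout and maxSessionTimeout are not set. parse_cfg and check_standalone are hypothetical names.

```python
REQUIRED_STANDALONE = ("tickTime", "dataDir", "clientPort")


def parse_cfg(text: str) -> dict:
    """Parse zoo.cfg-style key=value lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_standalone(settings: dict) -> dict:
    """Raise ValueError if a required field is missing; otherwise summarize key values."""
    missing = [k for k in REQUIRED_STANDALONE if k not in settings]
    if missing:
        raise ValueError(f"missing required settings: {missing}")
    tick = int(settings["tickTime"])
    return {
        "clientPort": int(settings["clientPort"]),
        # Assumed defaults when the explicit session timeout settings are absent.
        "minSessionTimeoutMs": int(settings.get("minSessionTimeout", 2 * tick)),
        "maxSessionTimeoutMs": int(settings.get("maxSessionTimeout", 20 * tick)),
    }
```

For the sample file (tickTime=2000), this reports client session timeouts between 4000 and 40000 ms.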

Now that you have created the configuration file, you can start ZooKeeper:

$ pwd
/opt/zookeeper-3.4.6/conf
$ ../bin/zkServer.sh start
JMX enabled by default
Using config: /opt/zookeeper-3.4.6/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
$ ../bin/zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: standalone
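Besides zkServer.sh status, a running server also answers short "four-letter word" commands on its client port. ruok replies imok when the server is running, and srvr prints server details, including the Mode: line seen above. The Python sketch below makes several assumptions. These commands must be enabled, which they are by default in 3.4.6, while newer releases restrict them through the 4lw.commands.whitelist setting. It also assumes the server closes the connection after replying. four_letter_word and parse_mode are hypothetical helper names.

```python
import socket
from typing import Optional


def four_letter_word(cmd: str, host: str = "localhost", port: int = 2181, timeout: float = 5.0) -> str:
    """Send a four-letter command to a ZooKeeper server and return its full reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # Assumes the server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_reply: str) -> Optional[str]:
    """Extract the value of the 'Mode:' line (e.g. standalone) from a srvr reply."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None


if __name__ == "__main__":
    print(four_letter_word("ruok"))              # a healthy server answers "imok"
    print(parse_mode(four_letter_word("srvr")))  # "standalone" for the setup above
```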

Replicated Mode

Running ZooKeeper in standalone mode is convenient for evaluation, some development, and testing. In production, however, you should run ZooKeeper in replicated mode. A replicated group of servers in the same application is called an ensemble, and a majority of its servers forms a quorum. In replicated mode, all servers in the ensemble have copies of the same configuration file.

Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. You accomplish this with a series of lines of the form server.id=host:port:port, explained below. The meanings of these and all other settings are listed in the Configuration Parameters section of the ZooKeeper Administrator's Guide. The file is similar to the one used in standalone mode, but with a few differences. Here is an example:

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

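The initLimit and syncLimit entries also appear, with comments, in the sample zoo.cfg shown in the Standalone Mode section. initLimit limits how long the servers in the quorum have to connect to the leader, and syncLimit limits how far out of date a server can be from the leader. Both are measured in units of tickTime: in this example, initLimit is 5 ticks at 2000 milliseconds per tick, or 10 seconds.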
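The tick arithmetic, together with the majority rule that decides how many servers may fail, can be made concrete with a few lines of Python. This is an illustrative sketch that assumes the usual majority quorum, in which more than half of the servers must be up. ensemble_summary is a hypothetical name.

```python
def ensemble_summary(tick_ms: int, init_limit: int, sync_limit: int, n_servers: int) -> dict:
    """Convert tick-based limits to seconds and compute majority-quorum figures."""
    return {
        "init_timeout_s": tick_ms * init_limit / 1000,
        "sync_timeout_s": tick_ms * sync_limit / 1000,
        # A quorum is a strict majority of the ensemble.
        "quorum_size": n_servers // 2 + 1,
        # How many servers can fail while a majority is still available.
        "tolerated_failures": (n_servers - 1) // 2,
    }


if __name__ == "__main__":
    print(ensemble_summary(2000, 5, 2, 3))
```

For the example configuration (tickTime=2000, initLimit=5, syncLimit=2, three servers), it reports 10 s, 4 s, a quorum of 2, and 1 tolerated failure. A fourth server would raise the quorum to 3 without tolerating any extra failure, which is why ensembles usually have an odd number of servers.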

The entries of the form server.X list the servers that make up the ZooKeeper service. When a server starts up, it determines which server it is by reading the file myid in its data directory (dataDir). This file, whose usage I will demonstrate in the next tutorial, is essential: a server in replicated mode will not start without it. It contains the server number in ASCII, a ZooKeeper instance id (1-255) that is unique within the cluster. This number must match the X on the left-hand side of that server's server.X entry.
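On each machine, the myid file can be created with a shell one-liner such as echo 1 > /var/lib/zookeeper/myid. The Python sketch below does the same and also checks the id against the server.X entries. It assumes each machine's hostname appears verbatim as the host part of its server.X line. write_myid and myid_matches_config are hypothetical helpers.

```python
import os
import re


def write_myid(data_dir: str, server_id: int) -> str:
    """Write server_id as ASCII text to <data_dir>/myid and return the file path."""
    if not 1 <= server_id <= 255:
        raise ValueError("server id must be between 1 and 255")
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, "myid")
    with open(path, "w", encoding="ascii") as f:
        f.write(f"{server_id}\n")
    return path


def myid_matches_config(data_dir: str, cfg_text: str, hostname: str) -> bool:
    """Check that myid equals X in the server.X entry whose host is hostname."""
    with open(os.path.join(data_dir, "myid"), encoding="ascii") as f:
        my_id = int(f.read().strip())
    # Matches lines such as "server.1=zoo1:2888:3888", capturing the id and the host.
    for m in re.finditer(r"^\s*server\.(\d+)\s*=\s*([^:\s]+):", cfg_text, re.MULTILINE):
        if m.group(2) == hostname:
            return int(m.group(1)) == my_id
    return False  # hostname not listed in any server.X entry
```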

The list of servers that clients use to connect must match the list of ZooKeeper servers in each server's configuration.
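Clients are normally given the ensemble as a comma-separated list of host:port pairs, for example the connection string passed to the Java client or to zkCli.sh -server. The sketch below derives that string from the server.X entries so the two lists cannot drift apart. It assumes every server uses the same clientPort. client_connect_string is a hypothetical name.

```python
def client_connect_string(cfg_text: str, client_port: int = 2181) -> str:
    """Build 'host1:port,host2:port,...' from server.X entries, ordered by server id."""
    servers = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if line.startswith("server.") and "=" in line:
            key, value = line.split("=", 1)
            server_id = int(key.strip()[len("server."):])
            host = value.strip().split(":", 1)[0]  # keep only the host part
            servers[server_id] = host
    return ",".join(f"{servers[i]}:{client_port}" for i in sorted(servers))
```

For the example configuration above, this yields zoo1:2181,zoo2:2181,zoo3:2181.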

Finally, note the two port numbers after each server name: 2888 and 3888. Peers use the former port to connect to other peers. Such a connection is necessary so that peers can communicate, for example, to agree upon the order of updates. More specifically, a ZooKeeper server uses this port to connect followers to the leader. When a new leader arises, a follower opens a TCP connection to the leader using this port. Because the default leader election also uses TCP, we currently require another port for leader election. This is the second port in the server entry.
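When several servers share one host, for instance when trying out replicated mode on a single machine, each server needs its own pair of ports (such as localhost:2888:3888, localhost:2889:3889, and so on). The sketch below parses server.X entries into their two ports and reports any host:port pair claimed twice. ServerEntry, parse_server_line and port_conflicts are hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Tuple


class ServerEntry(NamedTuple):
    server_id: int
    host: str
    peer_port: int      # first port: followers connect to the leader
    election_port: int  # second port: leader election


def parse_server_line(line: str) -> ServerEntry:
    """Parse a line such as 'server.1=zoo1:2888:3888'."""
    key, value = line.strip().split("=", 1)
    host, peer, election = value.strip().split(":")[:3]
    return ServerEntry(int(key.strip()[len("server."):]), host, int(peer), int(election))


def port_conflicts(entries: Iterable[ServerEntry]) -> List[Tuple[str, int]]:
    """Return (host, port) pairs that are claimed more than once across the ensemble."""
    seen, dupes = set(), []
    for e in entries:
        for port in (e.peer_port, e.election_port):
            if (e.host, port) in seen:
                dupes.append((e.host, port))
            seen.add((e.host, port))
    return dupes
```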

In the next tutorial, I will give a step-by-step example of setting up replicated mode, that is, a cluster of ZooKeeper servers (also known as an ensemble), from scratch.
