The elx-zookeeper-cluster package is available in the Ambience installation files. It is an alternative to the standalone elx-zookeeper package, for use when you need to run a cluster of ZooKeeper nodes for high availability. It is recommended to run at least three ZooKeeper nodes on three separate machines to meet high-availability requirements. If there are sufficient resources, it is recommended to run five ZooKeeper nodes, which allows one node to be taken down for planned maintenance while still tolerating an unexpected failure.
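The sizing rule behind these recommendations is that a ZooKeeper ensemble of n nodes stays available as long as a majority (n/2 + 1, using integer division) of the nodes are up. The following sketch of the arithmetic is illustrative only; the `quorum_tolerance` helper is not part of any package:

```shell
#!/bin/bash
# quorum_tolerance: how many node failures an ensemble of n ZooKeeper
# nodes can survive, given that a majority must remain up.
quorum_tolerance() {
  local n=$1
  echo $(( n - (n / 2 + 1) ))
}

quorum_tolerance 3   # a 3-node ensemble tolerates 1 failure
quorum_tolerance 5   # a 5-node ensemble tolerates 2 failures
```

So with five nodes, one can be taken down for maintenance and the cluster can still survive one unplanned failure.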
You can install either elx-zookeeper or elx-zookeeper-cluster, but not both: the two packages contain many files in common, so if you install both, neither will work. Each package installs a service called elx-zookeeper, so there is a single service name to use when starting or stopping the service, whether the system is clustered or standalone.
The elx-zookeeper-cluster package has three key differences from elx-zookeeper:
The service does not start automatically upon install, which provides more time for manual configuration.
The ZooKeeper configuration file contains an example of the cluster configuration, which needs to be edited.
The ZooKeeper class entry point in /etc/init.d/elx-zookeeper is QuorumPeerMain, so that the node knows to look for other members of the cluster and elect a leader.
In the following example, three machines called VM2, VM3 and VM4 are configured to automate the creation of a ZooKeeper cluster. These machines have Java installed and SSH keys set up, so that files can be transferred without repeatedly entering passwords.
The following script called cloud-zk.sh can automate the process:
#!/bin/bash
set -o nounset
sudo echo "cloud-zk" "$1"
mkdir tmp
scp knockshinnie:/home/jon/Code/Develop/Ambience/System/debian/dist/elx-arch_2.5-1_all.deb tmp
scp knockshinnie:/home/jon/Code/Develop/Ambience/System/debian/dist/elx-zookeeper-cluster_2.5-1_all.deb tmp
scp knockshinnie:/home/jon/Code/Develop/Ambience/System/debian/cloud-zoo.cfg tmp
sudo dpkg --install tmp/elx-arch_2.5-1_all.deb
sudo dpkg --install tmp/elx-zookeeper-cluster_2.5-1_all.deb
echo "$1" | sudo tee /var/elixir/zookeeper/myid
sudo cp tmp/cloud-zoo.cfg /etc/elixir/zookeeper/zoo.cfg
sudo service elx-zookeeper start
sleep 10
echo ruok | nc 127.0.0.1 2181
echo stat | nc 127.0.0.1 2181
The following ZooKeeper configuration file called cloud-zoo.cfg will be installed on each of the nodes:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
dataDir=/var/elixir/zookeeper
# the port at which the clients will connect
clientPort=2181
# original default is 10, changed to 50 to allow more services
# to run on a single machine (change to 0 for no limit)
maxClientCnxns=50
server.1=VM2:2888:3888
server.2=VM3:2888:3888
server.3=VM4:2888:3888
The only difference between cloud-zoo.cfg and the default zoo.cfg is the three lines at the bottom, which define the names of the three servers in the cluster.
The cloud-zk.sh script is first copied from the main machine (“knockshinnie” in this example) to each node in the cluster:
scp cloud-zk.sh VM2:/home/jon
scp cloud-zk.sh VM3:/home/jon
scp cloud-zk.sh VM4:/home/jon
Then run the script on each machine with a different identifier: 1, 2 or 3. For example, run the script with identifier 1 on VM2:
./cloud-zk.sh 1
Be sure to use the number as defined in the cloud-zoo.cfg for each server.
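A mismatched identifier is easy to catch before starting the service. The `check_myid` helper below is a hypothetical addition, not part of the package; it simply confirms that the id in the myid file has a matching server entry in zoo.cfg:

```shell
#!/bin/bash
# check_myid: verify that the id in the myid file has a matching
# server.<id>= line in zoo.cfg. Usage: check_myid <myid-file> <zoo.cfg>
check_myid() {
  local myid_file=$1 cfg_file=$2
  local id
  id=$(cat "$myid_file") || return 1
  if grep -q "^server\.${id}=" "$cfg_file"; then
    echo "ok: myid ${id} matches server.${id} in ${cfg_file}"
  else
    echo "error: no server.${id} entry in ${cfg_file}" >&2
    return 1
  fi
}

# For this cluster's layout, the check would be:
# check_myid /var/elixir/zookeeper/myid /etc/elixir/zookeeper/zoo.cfg
```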
A breakdown of the script follows. The first line after the shebang tells bash to fail if a variable is undefined – for example, if you forget to specify an identifier.
set -o nounset
The next line echoes the configuration using sudo, which means that if a password is needed, it is prompted for right at the start (without this, the script might pause later waiting for a password).
sudo echo "cloud-zk" "$1"
The following lines create a tmp directory and copy the three essential files across. Be sure to alter the paths for your machine.
mkdir tmp
scp knockshinnie:/home/jon/Code/Develop/Ambience/System/debian/dist/elx-arch_2.5-1_all.deb tmp
scp knockshinnie:/home/jon/Code/Develop/Ambience/System/debian/dist/elx-zookeeper-cluster_2.5-1_all.deb tmp
scp knockshinnie:/home/jon/Code/Develop/Ambience/System/debian/cloud-zoo.cfg tmp
The next two commands install the two Debian packages. Make sure elx-zookeeper has not been installed, to avoid a conflict.
sudo dpkg --install tmp/elx-arch_2.5-1_all.deb
sudo dpkg --install tmp/elx-zookeeper-cluster_2.5-1_all.deb
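A pre-check along the following lines can guard against the conflict with the standalone package. This is an illustrative addition, not part of the original script:

```shell
#!/bin/bash
# is_installed: report whether a Debian package is currently installed.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "ok installed"
}

# Abort before installing elx-zookeeper-cluster if the standalone
# elx-zookeeper package is already present on this machine.
if is_installed elx-zookeeper; then
  echo "elx-zookeeper is already installed; remove it first" >&2
  exit 1
fi
```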
This writes the identifier, e.g. 1, 2, 3 into the myid file for ZooKeeper to read:
echo "$1" | sudo tee /var/elixir/zookeeper/myid
The next command copies cloud-zoo.cfg over the default zoo.cfg:
sudo cp tmp/cloud-zoo.cfg /etc/elixir/zookeeper/zoo.cfg
The next command starts the service. The service name is elx-zookeeper, the same as in a standalone Ambience installation.
sudo service elx-zookeeper start
ZooKeeper takes a while to start, so the script sleeps before checking whether it is running correctly.
sleep 10
echo ruok | nc 127.0.0.1 2181
echo stat | nc 127.0.0.1 2181
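The fixed sleep 10 works, but polling until ZooKeeper answers is more robust on slow machines. A sketch follows; the `wait_for_zk` helper is an assumption, not part of the package, and it relies on nc's -w timeout option:

```shell
#!/bin/bash
# wait_for_zk: poll a ZooKeeper node with "ruok" until it answers "imok"
# or the retry budget runs out. Usage: wait_for_zk [host] [port] [tries]
wait_for_zk() {
  local host=${1:-127.0.0.1} port=${2:-2181} tries=${3:-30}
  local i
  for (( i = 0; i < tries; i++ )); do
    if [ "$(echo ruok | nc -w 2 "$host" "$port" 2>/dev/null)" = "imok" ]; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# wait_for_zk 127.0.0.1 2181 30 && echo "ZooKeeper is up"
```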
When the script is run on the first machine, it will likely answer “This ZooKeeper instance is not currently serving requests”. This is because the first machine is waiting for another member of the cluster to start.
When you start the second machine, it will report that it is either a leader or a follower (depending on which one wins the vote).
When you start the third machine, it will likely report that it is a follower.
You can re-run these status checks at any time to verify the cluster is running, even from a different machine if the ports are not blocked by firewalls:
telnet VM2 2181
Trying 192.168.0.22...
Connected to VM2.
Escape character is '^]'.
ruok
imokConnection closed by foreign host.
The response to ruok is imok, as shown on the last line.
The stat command returns more information, as shown in the following.
telnet VM2 2181
Trying 192.168.0.22...
Connected to VM2.
Escape character is '^]'.
stat
Zookeeper version: 3.3.3-1073969, built on 02/23/2011 22:27 GMT
Clients:
 /192.168.0.19:48891[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 4
Sent: 3
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
Connection closed by foreign host.
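For scripting, the Mode line (leader, follower or standalone) can be extracted from the stat output. A minimal sketch using awk; the `zk_mode` name is an illustrative helper, not part of the package:

```shell
#!/bin/bash
# zk_mode: read stat output on stdin and print the value of the
# "Mode:" line (leader, follower or standalone).
zk_mode() {
  awk -F': ' '/^Mode:/ { print $2 }'
}

# Typical usage against a live node:
# echo stat | nc VM2 2181 | zk_mode
```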
Edit the zookeeper.properties file so that it knows about the cluster. There may be several copies of this file, under .elixirtech/ in your home directory, the elixir home directory and the root home directory. They all need to include the cluster information:
hosts=VM2:2181,VM3:2181,VM4:2181
...rest unchanged
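Editing every copy by hand is error-prone; a sed substitution per file can do it instead. This sketch assumes a hosts= line already exists in each file, and the `set_zk_hosts` name and the exact file locations are illustrative:

```shell
#!/bin/bash
# set_zk_hosts: rewrite the hosts= line in a zookeeper.properties file.
# Usage: set_zk_hosts <properties-file> <comma-separated host list>
set_zk_hosts() {
  sed -i "s|^hosts=.*|hosts=$2|" "$1"
}

# Apply to each copy that exists (use sudo for files you do not own):
# for f in ~/.elixirtech/zookeeper.properties \
#          /home/elixir/.elixirtech/zookeeper.properties \
#          /root/.elixirtech/zookeeper.properties; do
#   [ -f "$f" ] && set_zk_hosts "$f" "VM2:2181,VM3:2181,VM4:2181"
# done
```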
Proceed to add the additional services you need and install data into your ZooKeeper cluster.