
How to set up highly available services – part 3

In this part of the High-Availability series I’ll show you how to install and configure HA software on your machines.

Preconditions

We assume two nodes running Ubuntu Server 12.04.1, each with Tomcat 7, Apache 2 and PostgreSQL 9.1 already installed.

Network settings

First of all we have to set up networking on both nodes. Since these are servers, each of them should have a static IP address. The network configuration lives in the /etc/network/interfaces file. Here is an example of a static IP configuration on one of the nodes:

iface eth0 inet static
address 10.0.0.123
gateway 10.0.0.130
netmask 255.255.255.0
broadcast +
dns-nameservers 10.0.0.130

In the /etc/hosts file we should add a record for the other node so that we can refer to it by host name. Let’s assume that the first node has the host name node1 and the second one node2.
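As an illustration, the records could look like the fragment below. Which address belongs to which host name is an assumption made for this example; the addresses themselves are the ones the two nodes use in this post.

```
10.0.0.122    node1
10.0.0.123    node2
```

Keeping both lines in /etc/hosts on both nodes means the same file works everywhere.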

Try to ping the other node to verify the network settings.
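The ping check can be wrapped in a small shell function so that it is easy to repeat from either node. This is only a sketch: check_node is a made-up helper name, and the -c and -W options assume the Linux ping that ships with Ubuntu.

```shell
# check_node is a hypothetical helper, not part of any package.
check_node() {
    # -c 3 sends three echo requests; -W 2 waits at most two seconds for each reply.
    if ping -c 3 -W 2 "$1" > /dev/null 2>&1; then
        echo "$1 reachable"
    else
        echo "$1 NOT reachable"
        return 1
    fi
}
```

On node1, run check_node node2; on node2, run check_node node1. Both should report the peer as reachable before you continue.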

Corosync and Pacemaker

Now we should install corosync and pacemaker on both nodes:

sudo apt-get install corosync pacemaker

We want the Corosync communication to be encrypted, so we have to generate a key for the encryption algorithm:

sudo corosync-keygen

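This creates the key file /etc/corosync/authkey, which has to be copied to every node. Note that Corosync only uses the key when secauth is set to on in the totem section of /etc/corosync/corosync.conf.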

Warning: Key generation tends to hang when run via ssh. corosync-keygen gathers entropy from /dev/random and asks you to press keys to feed it, so run it on a local terminal.
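Getting the key onto the second node could be sketched as follows. The assumptions are loud ones: the function is run as root on the node that generated the key, root is allowed to log in to the peer over SSH, and copy_authkey is a made-up helper name. If root login is disabled, copy the file through a regular user instead and move it into place with sudo.

```shell
# Hypothetical helper: copies the Corosync key to the peer given as $1.
copy_authkey() {
    peer="$1"
    key=/etc/corosync/authkey
    # -p preserves the file mode, so the restrictive permissions carry over.
    scp -p "$key" "root@$peer:$key"
}
```

On node1 this would be called as copy_authkey node2.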

To instruct Corosync to load the quorum and messaging interfaces needed by pacemaker, create /etc/corosync/service.d/pcmk with the following fragment.

service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 1
}

Now we can start corosync:

sudo /etc/init.d/corosync start
Info: The corosync start script checks that the variable $START is not set to “no”, but on a fresh install it is. I worked around this by commenting the condition out. On Ubuntu the value appears to come from /etc/default/corosync, so setting START=yes there is probably the cleaner fix.
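As far as I can tell, the $START check mentioned above comes from the Ubuntu packaging: the init script reads START from /etc/default/corosync, and that file ships with START=no. Under that assumption, a less invasive alternative to editing the init script is to flip the value. A minimal sketch (enable_corosync_start is a made-up helper name):

```shell
enable_corosync_start() {
    # $1: path of the defaults file, assumed to be /etc/default/corosync on Ubuntu.
    # Rewrites a line that is exactly START=no to START=yes and leaves the rest alone.
    sed -i 's/^START=no$/START=yes/' "$1"
}
```

Run as root, enable_corosync_start /etc/default/corosync does the same as sudo sed -i 's/^START=no$/START=yes/' /etc/default/corosync.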

After corosync is running we can start pacemaker:

sudo /etc/init.d/pacemaker start

To verify that corosync is running properly, run the following commands:

Command:

sudo corosync-cfgtool -s

Output:

Printing ring status.
Local node ID 2046820362
RING ID 0
 id = 10.0.0.122
 status = ring 0 active with no faults

Command:

sudo corosync-objctl | grep members

Output:

runtime.totem.pg.mrp.srp.members.2046820362.ip=r(0) ip(10.0.0.122)
runtime.totem.pg.mrp.srp.members.2046820362.join_count=1
runtime.totem.pg.mrp.srp.members.2046820362.status=joined
runtime.totem.pg.mrp.srp.members.2063597578.ip=r(0) ip(10.0.0.123)
runtime.totem.pg.mrp.srp.members.2063597578.join_count=2
runtime.totem.pg.mrp.srp.members.2063597578.status=joined
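If you prefer a single number over reading the member list by eye, the output can be filtered. This sketch assumes the output format shown above; count_joined is a made-up helper name.

```shell
count_joined() {
    # Reads corosync-objctl output on stdin and prints how many members report status=joined.
    grep -c 'status=joined$'
}
```

With both nodes up, sudo corosync-objctl | count_joined should print 2.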

To verify that pacemaker is running properly, run the following command:

sudo crm_mon -1

Output:

============
Last updated: Mon Aug 27 20:31:14 2012
Last change: Mon Aug 27 20:18:22 2012 via crmd on node1
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1 node2 ]
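The line that matters most here is the Online one. A small filter can pull the node names out of it, for example for use in a script. It is a sketch that assumes the output format shown above; online_nodes is a made-up helper name.

```shell
online_nodes() {
    # Reads `crm_mon -1` output on stdin and prints the names from the "Online: [ ... ]" line.
    sed -n 's/^Online: \[ \(.*\) ]$/\1/p'
}
```

Usage: sudo crm_mon -1 | online_nodes, which should list both node1 and node2.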

Now we disable STONITH. This is not recommended for production, but configuring STONITH is outside the scope of this post.

sudo crm configure property stonith-enabled=false

Now we can check that the pacemaker configuration is valid:

sudo crm_verify -L

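As we have only two nodes, we should tell pacemaker to ignore the loss of quorum. Quorum requires more than half of the nodes to be active. With two nodes that means both of them, so the surviving node would lose quorum and stop its resources exactly when the other node fails, which defeats the purpose.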

sudo crm configure property no-quorum-policy=ignore
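The arithmetic behind the quorum rule fits in a few lines of shell. This is only an illustration of "more than half"; quorum_votes is a made-up helper name and not a pacemaker command.

```shell
quorum_votes() {
    # Smallest number of votes that is more than half of $1 (integer division rounds down).
    echo $(( $1 / 2 + 1 ))
}
```

quorum_votes 2 prints 2 and quorum_votes 3 also prints 2: a two-node cluster cannot lose any node and keep quorum, while a three-node cluster can lose one.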

Finally we enable pacemaker to start at system boot:

sudo update-rc.d pacemaker defaults 95 00

Resource agents

At this point corosync and pacemaker are running on both nodes and are ready to manage resources. All that remains is to configure the resource agents. This has to be done on only one of the nodes; pacemaker distributes the configuration to the others.

IP Address

The IP address agent handles handing an IP address over between nodes. We configure the address that will be bound to whichever node is currently active:

sudo crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
params ip=10.0.0.120 cidr_netmask=32 \
op monitor interval=30s

From now on our cluster is reachable at the IP address 10.0.0.120.
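To see which node currently holds the cluster address, you can look for it in the local interface list. A minimal sketch, assuming the iproute2 ip tool that Ubuntu ships; holds_ip is a made-up helper name.

```shell
holds_ip() {
    # Succeeds when the address in $1 is configured on a local interface.
    # -o prints one line per address, -4 restricts the listing to IPv4.
    ip -o -4 addr show | grep -qF "inet $1/"
}
```

Running holds_ip 10.0.0.120 && echo active on each node should print "active" on exactly one of them.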

Tomcat

We want tomcat to be managed by pacemaker, so we have to stop it from starting at boot; pacemaker will start it instead.

sudo update-rc.d -f tomcat7 remove

Then we let pacemaker take care of it:

sudo crm configure primitive Tomcat7 ocf:heartbeat:tomcat \
params java_home=/usr/lib/jvm/default-java catalina_home=/usr/share/tomcat7 \
catalina_base=/var/lib/tomcat7 \
op monitor interval="1min" \
op start timeout="160s" \
op stop timeout="160s"

Apache

Same for apache:

sudo update-rc.d -f apache2 remove

sudo crm configure primitive webSite ocf:heartbeat:apache \
 params configfile="/etc/apache2/apache2.conf" \
 op monitor interval="30s" \
 op start interval="0" timeout="60s" \
 op stop interval="0" timeout="60s"

PostgreSQL

And the same for the database:

sudo update-rc.d -f postgresql remove

sudo crm configure primitive PostgreSQL ocf:heartbeat:pgsql \
params pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" \
pgdata="/var/lib/postgresql/9.1/main/" \
config="/etc/postgresql/9.1/main/postgresql.conf" \
op monitor interval="30s"

Colocation

Pacemaker now starts all our services, but it tries to spread them across the nodes. To tell pacemaker which resources have to run on the same node, we create colocation constraints:

sudo crm configure colocation tomcat-with-postgres inf: Tomcat7 PostgreSQL
sudo crm configure colocation website-with-tomcat inf: webSite Tomcat7
sudo crm configure colocation ip-with-website inf: ClusterIP webSite

Order

Pacemaker will now start all resources on the same node, but in arbitrary order. To tell pacemaker the order in which the resources should start, we create order constraints:

sudo crm configure order tomcat-after-postgres mandatory: PostgreSQL Tomcat7
sudo crm configure order apache-after-tomcat mandatory: Tomcat7 webSite
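One way to see the colocation and order constraints at work is to put the active node into standby and watch the resources move to the other node together. The following is a sketch under assumptions: failover_test is a made-up helper name, the 30 second default wait is a guess, and it should only be tried on a test cluster.

```shell
failover_test() {
    # $1: node to put into standby; $2: seconds to wait for resources to move (default 30).
    node="$1"
    wait_s="${2:-30}"
    sudo crm node standby "$node"   # resources should migrate to the other node
    sleep "$wait_s"
    sudo crm_mon -1                 # one-shot status: everything should now run on the peer
    sudo crm node online "$node"    # bring the node back into the cluster
}
```

For example, failover_test node1 takes node1 out of service, prints the cluster status and then brings node1 back.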

Conclusion

OK, now all resources are set up and managed by pacemaker. The only thing left to figure out is database synchronization, which is the topic of the next post. See you soon.


