How to set up highly available services – part 3
In this part of the High-Availability series I’ll show you how to install and configure HA software on your machines.
Preconditions
We assume two nodes with Ubuntu Server 12.04.1 preinstalled. Tomcat 7, Apache 2 and PostgreSQL 9.1 are also installed on both nodes.
Network settings
First of all we have to configure the network on both nodes. Since these are servers, they should use static IP addresses. The network configuration lives in the /etc/network/interfaces file. Here is an example of a static IP configuration on one of the nodes:
iface eth0 inet static
    address 10.0.0.123
    gateway 10.0.0.130
    netmask 255.255.255.0
    broadcast +
    dns-nameservers 10.0.0.130
In the /etc/hosts file we should create a record for the other node so we can refer to it by its host name. Let's assume that the first node's host name is node1 and the second one's is node2.
Ping the other node to verify the network settings.
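A minimal sketch of the /etc/hosts step, assuming node1 is 10.0.0.122 and node2 is 10.0.0.123 (the post only shows that one of the nodes uses 10.0.0.123, so this mapping is an assumption). The function name add_host_entry is hypothetical; it appends an entry only if the host name does not already appear in a non-comment line, so running it twice is harmless.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME  (hypothetical helper)
# Append "IP<TAB>NAME" to FILE unless NAME already appears as a whole word
# in a non-comment line of FILE.
add_host_entry() {
    file=$1 ip=$2 name=$3
    # Ignore comment lines, then look for NAME as a whole word.
    if grep -v '^[[:space:]]*#' "$file" | grep -qw -- "$name"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

On node1, as root, you would call add_host_entry /etc/hosts 10.0.0.123 node2 (and the mirror call on node2), then check with ping -c 3 node2.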
Corosync and Pacemaker
Now we should install corosync and pacemaker on both nodes:
sudo apt-get install corosync pacemaker
We want Corosync communication to be encrypted, so we have to generate a key for the encryption algorithm:
sudo corosync-keygen
This creates the /etc/corosync/authkey file containing the key. The same file must be present on every node, so copy it to the other node as well.
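corosync-keygen creates the key owned by root and readable only by root (mode 0400). When you copy it to node2, it should keep that restrictive mode. A small check for the mode, assuming GNU stat as shipped with Ubuntu; the function name authkey_mode_ok is hypothetical:

```shell
#!/bin/sh
# authkey_mode_ok [FILE]  (hypothetical helper)
# Succeed only if FILE (default /etc/corosync/authkey) has mode 0400,
# i.e. readable by its owner and nobody else. Uses GNU stat.
authkey_mode_ok() {
    key=${1:-/etc/corosync/authkey}
    [ "$(stat -c '%a' "$key")" = "400" ]
}
```

Running authkey_mode_ok && echo ok on each node after copying the key should print ok; if it fails, chmod 0400 and chown root:root the file.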
To instruct Corosync to load the quorum and messaging interfaces needed by Pacemaker, create /etc/corosync/service.d/pcmk on both nodes with the following fragment:
service {
    # Load the Pacemaker Cluster Resource Manager
    name: pacemaker
    ver: 1
}
Now we can start corosync:
sudo /etc/init.d/corosync start
Info: For some reason the corosync start script checks whether the variable $START doesn't contain “no”. But it does (I don't know why), so I commented the condition out. Can somebody explain this to me?
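On Ubuntu the corosync init script reads START from /etc/default/corosync, and the package ships that file with START=no, which is the likely reason the check fails. Rather than commenting the condition out, a less invasive fix is to set START=yes there. A minimal sketch; the function name enable_corosync_start is hypothetical and the default path is the Ubuntu location:

```shell
#!/bin/sh
# enable_corosync_start [FILE]  (hypothetical helper)
# Set START=yes in the corosync defaults file (Ubuntu: /etc/default/corosync),
# appending the line if no START= assignment exists. Uses GNU sed -i.
enable_corosync_start() {
    file=${1:-/etc/default/corosync}
    if grep -q '^START=' "$file"; then
        # Rewrite the existing assignment in place.
        sed -i 's/^START=.*/START=yes/' "$file"
    else
        echo 'START=yes' >> "$file"
    fi
}
```

Run it as root on both nodes, then start corosync with the unmodified init script.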
After corosync is running we can start pacemaker:
sudo /etc/init.d/pacemaker start
To verify that corosync is running properly, run the following commands:
Command:
sudo corosync-cfgtool -s
Output:
Printing ring status.
Local node ID 2046820362
RING ID 0
        id      = 10.0.0.122
        status  = ring 0 active with no faults
Command:
sudo corosync-objctl | grep members
Output:
runtime.totem.pg.mrp.srp.members.2046820362.ip=r(0) ip(10.0.0.122)
runtime.totem.pg.mrp.srp.members.2046820362.join_count=1
runtime.totem.pg.mrp.srp.members.2046820362.status=joined
runtime.totem.pg.mrp.srp.members.2063597578.ip=r(0) ip(10.0.0.123)
runtime.totem.pg.mrp.srp.members.2063597578.join_count=2
runtime.totem.pg.mrp.srp.members.2063597578.status=joined
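In the output above, each ring member has a status line, and both report status=joined. If you want to script this check, a hypothetical helper that counts joined members from corosync-objctl output on stdin could look like this (it assumes the key layout shown above):

```shell
#!/bin/sh
# count_joined_members  (hypothetical helper)
# Read `corosync-objctl` output on stdin and print how many ring members
# report status=joined.
count_joined_members() {
    grep -c '^runtime\.totem\.pg\.mrp\.srp\.members\.[0-9]*\.status=joined$'
}
```

For our two-node cluster, sudo corosync-objctl | count_joined_members should print 2.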
To verify that pacemaker is running properly, run the following command:
sudo crm_mon -1
Output:
============
Last updated: Mon Aug 27 20:31:14 2012
Last change: Mon Aug 27 20:18:22 2012 via crmd on node1
Stack: openais
Current DC: node1 - partition with quorum
Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
2 Nodes configured, 2 expected votes
0 Resources configured.
============

Online: [ node1 node2 ]
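The line to look at is the Online: [ ... ] list, which should contain both nodes. A hypothetical helper that extracts those names from crm_mon -1 output on stdin, assuming the line format shown above:

```shell
#!/bin/sh
# online_nodes  (hypothetical helper)
# Read `crm_mon -1` output on stdin and print the node names from the
# "Online: [ ... ]" line, one per line.
online_nodes() {
    sed -n 's/^[[:space:]]*Online: \[ \(.*\) \][[:space:]]*$/\1/p' |
        tr ' ' '\n' |
        grep -v '^$'
}
```

With the output above, sudo crm_mon -1 | online_nodes prints node1 and node2 on separate lines.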
Next we disable STONITH. This is not recommended, but configuring STONITH is out of the scope of this post.
sudo crm configure property stonith-enabled=false
Now we can check that the Pacemaker configuration is valid:
sudo crm_verify -L
As we have only 2 nodes, we should tell Pacemaker to ignore loss of quorum. With quorum enforced, more than half of the nodes must be active, which in our case means both of them, so the cluster would stop managing resources as soon as a single node failed.
sudo crm configure property no-quorum-policy=ignore
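A quick worked version of the quorum arithmetic: with one vote per node, quorum needs strictly more than half of the votes, floor(N/2) + 1. For N = 2 that is 2 votes, so losing either node loses quorum; with N = 3 it is also 2, so a three-node cluster can survive one failure. The function name is hypothetical:

```shell
#!/bin/sh
# quorum_votes N  (hypothetical helper)
# Print the votes needed for quorum in an N-node cluster with one vote per
# node: strictly more than half, i.e. floor(N/2) + 1 (shell division truncates).
quorum_votes() {
    echo $(( $1 / 2 + 1 ))
}
```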
Finally, we enable Pacemaker to start at system boot (start sequence number 95, stop sequence number 00):
sudo update-rc.d pacemaker defaults 95 00
Resource agents
Corosync and Pacemaker are now running on both nodes and ready to manage resources. All that remains is to configure the resource agents. This needs to be done on one node only; Pacemaker distributes the configuration to the other nodes.
IP Address
The IPaddr2 resource agent handles handing over an IP address between nodes. We can define an IP address that will always be bound to the active node:
sudo crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip=10.0.0.120 cidr_netmask=32 \
    op monitor interval=30s
From now on, the cluster is reachable at 10.0.0.120.
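Because IPaddr2 adds the address to a network interface on the active node, you can tell which node currently holds 10.0.0.120 by looking at ip -o -4 addr show on each node. A hypothetical helper that reads that output on stdin:

```shell
#!/bin/sh
# holds_ip IP  (hypothetical helper)
# Read `ip -o -4 addr show` output on stdin and succeed if IP is assigned
# to one of the listed interfaces. Fixed-string match, so dots are literal.
holds_ip() {
    grep -qF "inet $1/"
}
```

For example, ip -o -4 addr show | holds_ip 10.0.0.120 && echo "this node is active" prints the message only on the node that owns the cluster address.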
Tomcat
We want Tomcat to be managed by Pacemaker. First we have to stop the system from starting Tomcat at boot, because from now on Pacemaker is the one that starts it:
sudo update-rc.d -f tomcat7 remove
Then we hand Tomcat over to Pacemaker:
sudo crm configure primitive Tomcat7 ocf:heartbeat:tomcat \
    params java_home=/usr/lib/jvm/default-java catalina_home=/usr/share/tomcat7 \
    catalina_base=/var/lib/tomcat7 \
    op monitor interval="1min" \
    op start timeout="160s" \
    op stop timeout="160s"
Apache
Same for apache:
sudo update-rc.d -f apache2 remove
sudo crm configure primitive webSite ocf:heartbeat:apache \
    params configfile="/etc/apache2/apache2.conf" \
    op monitor interval="30s" \
    op start interval="0" timeout="60s" \
    op stop interval="0" timeout="60s"
PostgreSQL
And the same for the database:
sudo update-rc.d -f postgresql remove
sudo crm configure primitive PostgreSQL ocf:heartbeat:pgsql \
    params pgctl="/usr/lib/postgresql/9.1/bin/pg_ctl" \
    pgdata="/var/lib/postgresql/9.1/main/" \
    config="/etc/postgresql/9.1/main/postgresql.conf" \
    op monitor interval="30s"
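update-rc.d -f NAME remove deletes the start (S) and kill (K) links for an init script from the /etc/rcN.d directories. A hypothetical helper to confirm that tomcat7, apache2 and postgresql no longer have any such links, so that only Pacemaker starts them. The optional ROOT argument exists only to make the check testable:

```shell
#!/bin/sh
# has_boot_links NAME [ROOT]  (hypothetical helper)
# Succeed if any S or K link for init script NAME is still present in
# ROOT/etc/rc?.d (ROOT defaults to /). After "update-rc.d -f NAME remove"
# this should fail.
has_boot_links() {
    name=$1
    root=${2:-}
    for link in "$root"/etc/rc?.d/[SK][0-9][0-9]"$name"; do
        # An unmatched glob stays literal, so check that the path exists
        # (-L also catches dangling symlinks).
        [ -e "$link" ] || [ -L "$link" ] && return 0
    done
    return 1
}
```

For example: for svc in tomcat7 apache2 postgresql; do has_boot_links "$svc" && echo "$svc still starts at boot"; done should print nothing.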
Colocation
Pacemaker now starts all our services, but it tries to spread the load across all nodes. To tell Pacemaker which resources must run on the same node, we have to create colocation constraints:
sudo crm configure colocation tomcat-with-postgres inf: Tomcat7 PostgreSQL
sudo crm configure colocation website-with-tomcat inf: webSite Tomcat7
sudo crm configure colocation ip-with-website inf: ClusterIP webSite
Order
Pacemaker will now start all resources on the same node, but in arbitrary order. To tell Pacemaker the order in which resources must be started, we have to create order constraints:
sudo crm configure order tomcat-after-postgres mandatory: PostgreSQL Tomcat7
sudo crm configure order apache-after-tomcat mandatory: Tomcat7 webSite
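The colocation and order constraints above form a chain: each resource runs on the same node as, and starts after, the one before it. A hypothetical helper that prints crm commands of the same shape for resources given in start order; it only prints, so you can review the commands before running them. Its constraint ids follow its own naming scheme, not the ids used above:

```shell
#!/bin/sh
# print_chain_constraints RSC1 RSC2 ... RSCn  (hypothetical helper)
# For resources listed in start order, print one crm colocation and one
# crm order command per adjacent pair: each resource is colocated with,
# and ordered after, the resource before it.
print_chain_constraints() {
    prev=
    for rsc in "$@"; do
        if [ -n "$prev" ]; then
            echo "crm configure colocation ${rsc}-with-${prev} inf: ${rsc} ${prev}"
            echo "crm configure order ${rsc}-after-${prev} mandatory: ${prev} ${rsc}"
        fi
        prev=$rsc
    done
}
```

print_chain_constraints PostgreSQL Tomcat7 webSite reproduces the two colocation and two order constraints between those resources. Appending ClusterIP would also add an order constraint starting ClusterIP after webSite, which the setup in this post does not configure.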
Conclusion
OK, now all resources are set up and managed by Pacemaker. The only thing left to figure out is database synchronization, which is the topic of the next post. See you soon.