How to set up highly available services – part 2
In the previous post I tried to explain the architectural aspects of building our highly available service. Today we will go through the software part of the build process.
After some googling I found that there is a project, Linux-HA, which provides open source high availability software for Linux. You can find the software pages here. These are wiki pages and are still under development. They contain basic information about the project and its parts, but that is not enough for somebody who wants to install and use the HA software. A lot more information is available on the ClusterLabs pages, which belong to the Pacemaker project. Let’s review the basic parts of the HA software:
Pacemaker is a cluster resource manager. It was part of the Linux-HA project, and since 2007 it has been a standalone project. Pacemaker basically monitors the processes that are needed to provide our services. When any of the processes fails, Pacemaker detects it and either restarts the process or runs it on another machine in the cluster.
Heartbeat is a daemon that provides the cluster infrastructure. It provides the messaging layer for clusters, so Pacemaker can check whether and where certain processes run. When no message identifying a process is received within a preconfigured time, the process is considered to be down.
Pacemaker became a standalone project so that it could support more cluster stacks (Heartbeat, Corosync…). As I haven’t found enough information about Heartbeat and it seems to be a dead project, I’ve decided to use Corosync. Corosync is newer than Heartbeat.
A resource agent is a standardized interface for a cluster resource. It translates a standard set of operations into steps specific to the resource or application, and interprets their results as success or failure.
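The timeout rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Heartbeat or Corosync are implemented; the class name `HeartbeatMonitor`, its methods and the node names are made up for this example.

```python
import time


class HeartbeatMonitor:
    """Toy failure detector: a node counts as down when no message
    arrived from it within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock      # injectable so the logic can be tested without sleeping
        self.last_seen = {}     # node name -> time of the last received message

    def record(self, node):
        """Call this whenever a heartbeat message from `node` arrives."""
        self.last_seen[node] = self.clock()

    def is_down(self, node):
        """True if `node` was never heard from or has been silent too long."""
        seen = self.last_seen.get(node)
        if seen is None:
            return True
        return self.clock() - seen > self.timeout
```

A caller would invoke `record("node-a")` for every received message and poll `is_down("node-a")` periodically. The choice of `timeout` is the trade-off here: too short and a slow network looks like a failure, too long and a real failure goes unnoticed for a while.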
Every resource agent may support the following operations:
- resource start
- resource stop
- resource monitoring and returning of its status (running or not running)
- validate the resource’s configuration
- return information about the resource agent itself
On the Linux-HA project pages you can find a list of available resource agents. Essentially every well-known application server or database has a resource agent in this list, but you can also create your own by implementing the interface. Basically, anything that can be scripted can be turned into a resource agent.
Cluster Glue is a set of libraries, tools and utilities for high availability clusters. It connects Pacemaker, the cluster stack (Heartbeat or Corosync) and the configured resource agents. Cluster Glue has the following components:
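To give an idea of what implementing the interface means, here is a minimal sketch of a resource agent in Python. It assumes the OCF convention, where the action name is passed as the first command line argument and the result is reported through the exit code. The "resource" itself is hypothetical: it counts as running while a state file exists, and the file name `toy-resource.state` is made up for this example.

```python
import os
import tempfile

# Exit codes used by the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical resource: it is "running" while this file exists.
STATE_FILE = os.path.join(tempfile.gettempdir(), "toy-resource.state")


def monitor():
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def start():
    with open(STATE_FILE, "w") as fh:
        fh.write("running\n")
    return OCF_SUCCESS


def stop():
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS  # stopping an already stopped resource still succeeds


def validate_all():
    # The only "configuration" of this toy agent is the state file directory.
    return OCF_SUCCESS if os.path.isdir(os.path.dirname(STATE_FILE)) else OCF_ERR_GENERIC


def meta_data():
    # A real agent prints a full XML description of itself here.
    print('<resource-agent name="toy-resource"/>')
    return OCF_SUCCESS


ACTIONS = {
    "start": start,
    "stop": stop,
    "monitor": monitor,
    "validate-all": validate_all,
    "meta-data": meta_data,
}


def main(argv):
    """Dispatch the requested action and return its exit code."""
    if len(argv) != 2 or argv[1] not in ACTIONS:
        return OCF_ERR_UNIMPLEMENTED
    return ACTIONS[argv[1]]()
```

Saved as an executable script, the file would end with `sys.exit(main(sys.argv))`, so that the cluster software can read the result from the exit code. Note that `stop` and `start` are idempotent: calling them twice must not be reported as a failure.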
Local Resource Manager
The LRM is the interface between Pacemaker and the resource agents. It simply takes commands from Pacemaker and passes them to the resource agents. After running a command, it reports whether the command succeeded or failed. The LRM may:
- start a resource
- stop a resource
- monitor a resource
- report a resource’s status
- list all resource instances it currently controls, and their status
STONITH
When a node is considered to be dead, the cause may be just a network problem or something similar. We want to be sure that the node is really down. The STONITH mechanism was introduced for such cases.
STONITH (“Shoot The Other Node In The Head”) forcefully removes a node from the cluster at the hardware level.
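The reason for being sure can be shown with a small sketch: resources of a failed node should be started elsewhere only after the node has really been shot, otherwise the same resource could run twice. This is a simplified illustration, not Pacemaker code; the function `recover_failed_node` and the callbacks `fence` and `start_on_survivor` are hypothetical names.

```python
def recover_failed_node(node, resources, fence, start_on_survivor):
    """Restart resources elsewhere only after the failed node was fenced.

    `fence(node)` is a hypothetical callback that powers the node off and
    returns True when that is confirmed. `start_on_survivor(resource)` is a
    hypothetical callback that starts the resource on a healthy node.
    """
    if not fence(node):
        # Fencing is not confirmed: the node may still be running our
        # resources, so starting them elsewhere could run them twice.
        return []
    return [start_on_survivor(resource) for resource in resources]
```

With a `fence` callback that fails, nothing is started and the function returns an empty list. Only a confirmed fence lets the recovery proceed.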
hb_report
An advanced error reporting utility.
Cluster Plumbing Library
A low-level library for intra-cluster communications.
This was a quick recap of the software which we will use for building our high availability system. In the next part we will go through the installation and configuration process.