Blog, Trixi

How to set up highly available services – part 4

In the previous part we set up Pacemaker to work with Apache, Tomcat and PostgreSQL. The only thing left to manage is database synchronization between the nodes. We configured PostgreSQL to run only on the active node, so the passive node does not have up-to-date data when it has to become active.
There are several ways to synchronize data between two PostgreSQL database servers. You can run the database on both nodes and set up database mirroring. But with this approach you cannot be sure that all database changes have been transferred to the passive node when the active one becomes unavailable.
A better solution is DRBD. DRBD is software for disk mirroring; you can think of it as network-based RAID-1. The advantage of this approach is that a write to disk is considered done only after it has been committed on the mirror disk too (this is what protocol C in the configuration below selects), and that it is database independent. You can use any database server you like. Even non-database data can be synchronized using DRBD.

How to set up DRBD

We assume that you have an empty disk of the same size on both nodes. These disks will end up synchronized. In this post we assume that the new disks are /dev/sdb. All of the following steps apply to both nodes.

You can install DRBD with the command

sudo apt-get install drbd8-utils

Now we have to create a new partition on the disk to be synchronized. Run

sudo fdisk /dev/sdb

In fdisk, option n creates a new partition. In the following prompts we choose a primary partition and partition number 1; the other prompts can be left at their default values. Option w writes the changes to the disk. Our new partition is /dev/sdb1.
The next step is to configure DRBD. Configuration files are in the /etc/drbd.d folder. The file global_common.conf contains global settings for all DRBD devices (we are configuring only one now). Files with the .res extension contain the configuration of a single device, and each file is named after its device. We have to create a drbd1.res file for our device.
Here is an example of the global_common.conf file:

global {
        usage-count no;
}

common {
        protocol C;

        handlers {
                pri-on-incon-degr "/usr/lib/drbd/; /usr/lib/drbd/; echo b > /proc/sysrq-trigger ; reboot -f";
                pri-lost-after-sb "/usr/lib/drbd/; /usr/lib/drbd/; echo b > /proc/sysrq-trigger ; reboot -f";
                local-io-error "/usr/lib/drbd/; /usr/lib/drbd/; echo o > /proc/sysrq-trigger ; halt -f";
                fence-peer "/usr/lib/drbd/";
                initial-split-brain "/usr/lib/drbd/ root";
                split-brain "/usr/lib/drbd/ root";
                out-of-sync "/usr/lib/drbd/ root";
                after-resync-target "/usr/lib/drbd/";
        }

        startup {
        }

        disk {
                # If a hard drive that is used as a backing block device
                # for DRBD on one of the nodes fails, DRBD may either pass
                # on the I/O error to the upper layer (usually the file
                # system) or it can mask I/O errors from upper layers.
                #  * detach. This is the recommended option. On
                #    the occurrence of a lower-level I/O error, the node
                #    drops its backing device, and continues in diskless
                #    mode -> DRBD transparently fetches the affected block
                #    from the peer node, over the network.
                #  * pass_on. This causes DRBD to report the I/O error to
                #    the upper layers. On the primary node, it is reported
                #    to the mounted file system. On the secondary node, it
                #    is ignored (because the secondary has no upper layer to
                #    report to).
                #  * call-local-io-error. Invokes the command defined as
                #    the local I/O error handler. This requires that
                #    a corresponding local-io-error command invocation is
                #    defined in the resource's handlers section. It is
                #    entirely left to the administrator's discretion to
                #    implement I/O error handling using the command (or
                #    script) invoked by local-io-error.
                on-io-error call-local-io-error;

                fencing resource-only;
        }

        net {
                # sndbuf-size rcvbuf-size timeout connect-int ping-int ping-timeout max-buffers
                # max-epoch-size ko-count allow-two-primaries no-tcp-cork

                # You need to specify the HMAC algorithm to enable peer
                # authentication at all. You are strongly encouraged to use
                # peer authentication. The HMAC algorithm will be used for
                # the challenge response authentication of the peer. You may
                # specify any digest algorithm that is named in /proc/crypto.
                cram-hmac-alg sha1;

                # The shared secret used in peer authentication. May be up to
                # 64 characters. Note that peer authentication is disabled as
                # long as no cram-hmac-alg (see above) is specified.
                shared-secret "Secret";

                # Split brain has just been detected, but at this time
                # the resource is not in the Primary role on any host. For
                # this option, DRBD understands the following keywords:
                #  * disconnect. Do not recover automatically, simply invoke
                #    the split-brain handler script (if configured), drop
                #    the connection and continue in disconnected mode.
                #  * discard-younger-primary. Discard and roll back
                #    the modifications made on the host which assumed
                #    the Primary role last.
                #  * discard-least-changes. Discard and roll back
                #    the modifications on the host where fewer changes
                #    occurred.
                #  * discard-zero-changes. If there is any host on which no
                #    changes occurred at all, simply apply all modifications
                #    made on the other and continue.
                after-sb-0pri discard-least-changes;

                # Split brain has just been detected, and at this time
                # the resource is in the Primary role on one host. For this
                # option, DRBD understands the following keywords:
                #  * disconnect. As with after-sb-0pri, simply invoke
                #    the split-brain handler script (if configured), drop
                #    the connection and continue in disconnected mode.
                #  * consensus. Apply the same recovery policies as specified
                #    in after-sb-0pri. If a split brain victim can be selected
                #    after applying these policies, automatically resolve.
                #    Otherwise, behave exactly as if disconnect were specified.
                #  * call-pri-lost-after-sb. Apply the recovery policies as
                #    specified in after-sb-0pri. If a split brain victim can be
                #    selected after applying these policies, invoke
                #    the pri-lost-after-sb handler on the victim node. This
                #    handler must be configured in the handlers section and is
                #    expected to forcibly remove the node from the cluster.
                #  * discard-secondary. Whichever host is currently in
                #    the Secondary role, make that host the split brain victim.
                after-sb-1pri discard-secondary;

                # Split brain has just been detected, and at this time
                # the resource is in the Primary role on both hosts. This
                # option accepts the same keywords as after-sb-1pri except
                # discard-secondary and consensus.
                after-sb-2pri call-pri-lost-after-sb;

                # DRBD generates a message digest of every data block it
                # replicates to the peer, which the peer then uses to verify
                # the integrity of the replication packet. If the replicated
                # block can not be verified against the digest, the peer
                # requests retransmission. Thus, DRBD replication is protected
                # against several error sources.
                data-integrity-alg sha1;
        }

        syncer {
                # after al-extents use-rle cpu-mask

                # The maximum bandwidth a resource uses for background
                # re-synchronization
                rate 33M;

                # Algorithm used for online verification - Sequentially
                # calculating a cryptographic digest of every block stored
                # on the lower-level storage device of a particular resource.
                # DRBD then transmits that digest to the peer node
                # (the verification target), where it is checked against
                # a digest of the local copy of the affected block.
                # If the digests do not match, the block is marked out-of-sync
                # and may later be synchronized.
                verify-alg sha1;

                # When using checksum-based synchronization, then rather than
                # performing a brute-force overwrite of blocks marked out of
                # sync, DRBD reads blocks before synchronizing them and
                # computes a hash of the contents currently found on disk. It
                # then compares this hash with one computed from the same
                # sector on the peer, and omits re-writing this block if
                # the hashes match. This can dramatically cut down
                # synchronization times in situation where a filesystem
                # re-writes a sector with identical contents while DRBD is in
                # disconnected mode.
                csums-alg sha1;
        }
}

We strongly encourage you to read the drbd.conf manual page to understand all the options in the configuration file.
The drbd1 device itself is configured in drbd1.res.

resource drbd1 {
  device        /dev/drbd1;
  disk          /dev/sdb1;
  meta-disk     internal;
  on node1 {
    address     <IP of node1>:<port>;
  }
  on node2 {
    address     <IP of node2>:<port>;
  }
}

The on sections hold the IP address of each node and the port DRBD uses. Note that node1 and node2 are the hostnames of the nodes, not keywords. This configuration creates a virtual disk named drbd1. Everything written to it is transferred to the other node too.
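Two things are easy to get wrong at this point: the names after on must match what each node reports with uname -n, and the configuration has to be identical on both nodes. Here is a minimal sketch of two shell helpers for checking this. The function names has_on_section and config_checksum are our own and not part of DRBD, and the first one assumes that "on <hostname> {" is written on a single line, as in the example above.

```shell
#!/bin/sh
# Hypothetical helpers, not part of DRBD.

# Succeeds if the resource file contains an "on <hostname> {" section.
# $1 = resource file, $2 = hostname (defaults to the output of uname -n)
has_on_section() {
    awk -v host="${2:-$(uname -n)}" '
        $1 == "on" && $2 == host { found = 1 }
        END { exit !found }
    ' "$1"
}

# Prints one SHA-256 checksum over the given files, in the given order.
# Run it on both nodes with the same arguments and compare the output.
config_checksum() {
    cat "$@" | sha256sum | cut -d' ' -f1
}

# Example use on a real node:
# has_on_section /etc/drbd.d/drbd1.res || echo "no section for $(uname -n)"
# config_checksum /etc/drbd.d/global_common.conf /etc/drbd.d/drbd1.res
```

If the checksums printed on the two nodes differ, copy the files from one node to the other before continuing.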
Once the configuration files are in place on both nodes (check that they contain the same settings on each), we can initialize the drbd1 device. Run this on both nodes:

sudo drbdadm create-md drbd1
sudo drbdadm up drbd1

Now there is a conflict: the data on one node differs from the data on the other, and DRBD does not know which copy is correct. We have to choose. Run the following command on one node only, the one whose data should be kept:

sudo drbdadm -- --overwrite-data-of-peer primary drbd1
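DRBD now starts copying the whole device to the peer. Progress is reported in /proc/drbd, where the ds: field shows the disk state of the local node and of the peer. Here is a minimal sketch of a helper that extracts this field. The function name drbd_disk_state is our own, and the parsing assumes the DRBD 8.x layout of /proc/drbd.

```shell
#!/bin/sh
# Hypothetical helper, not part of DRBD.

# Prints the disk state (the ds: field) of one DRBD minor.
# $1 = minor number, $2 = status file (defaults to /proc/drbd)
drbd_disk_state() {
    awk -v minor="$1" '
        # the device line starts with "<minor>:", e.g. " 1: cs:Connected ..."
        $1 == (minor ":") {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^ds:/) { sub(/^ds:/, "", $i); print $i }
        }
    ' "${2:-/proc/drbd}"
}

# Example use on a real node:
# drbd_disk_state 1
```

While the initial synchronization is running, the primary node reports UpToDate/Inconsistent; once it has finished, both sides are UpToDate.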

The next step is to format the drbd1 device (on the node that is now primary).

sudo mkfs -t ext3 /dev/drbd1

How to set up pacemaker

First of all, disable the automatic start of DRBD at boot (on both nodes). Pacemaker will take care of it.

sudo update-rc.d -f drbd remove

Now create the crm primitive (these steps need to be done on one node only):

sudo crm configure primitive disk_drbd1 ocf:linbit:drbd \
params drbd_resource=drbd1 \
op monitor interval="15s"

Only one of the nodes can hold the master disk (the disk where changes are made). The second node only receives changes.

sudo crm configure ms ms_disk_drbd1 disk_drbd1 \
meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true

After drbd1 becomes master we want to mount it into the filesystem:

sudo crm configure primitive mount_drbd1 ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/media/drbd1" fstype="ext3"

It will be mounted at the /media/drbd1 directory. Create this directory on both nodes beforehand.
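Once Pacemaker has started the resource, a quick way to confirm on the active node that the device really is mounted is to look it up in /proc/mounts. Here is a small sketch. The helper name is_mounted is our own, and it assumes the device appears in /proc/mounts under the same path that was given to the Filesystem resource.

```shell
#!/bin/sh
# Hypothetical helper, not part of Pacemaker or DRBD.

# Succeeds if the device is mounted at the given directory.
# $1 = device, $2 = mount point, $3 = mounts table (defaults to /proc/mounts)
is_mounted() {
    awk -v dev="$1" -v dir="$2" '
        $1 == dev && $2 == dir { found = 1 }
        END { exit !found }
    ' "${3:-/proc/mounts}"
}

# Example use on a real node:
# is_mounted /dev/drbd1 /media/drbd1 && echo "drbd1 is mounted"
```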
Let’s create a group to make the Pacemaker settings easier to manage:

sudo crm configure group postgre_sync mount_drbd1 PostgreSQL

Now we have to make sure that PostgreSQL runs only on the master node and starts only after DRBD has been promoted there:

sudo crm configure colocation postgres-with-drbd1 inf: PostgreSQL ms_disk_drbd1:Master
sudo crm configure order postgres-after-drbd1 inf: ms_disk_drbd1:promote postgre_sync:start

Now everything is set. All that remains is to move the PostgreSQL data to the drbd1 disk and create a symlink from /var/lib/postgresql to the new location of the data.
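The move itself can be sketched as a small shell function. This is only an illustration under our own assumptions: the function name relocate_dir and the target path /media/drbd1/postgresql are hypothetical, PostgreSQL must be stopped, and /dev/drbd1 must be mounted at /media/drbd1 on the node where you run it.

```shell
#!/bin/sh
# Hypothetical helper: move a directory and leave a symlink at the old path.
# $1 = original directory, $2 = new location on the mounted DRBD device
relocate_dir() {
    src=$1
    dest=$2
    # refuse to overwrite something that is already at the target
    if [ -e "$dest" ]; then
        echo "target $dest already exists" >&2
        return 1
    fi
    mv "$src" "$dest" && ln -s "$dest" "$src"
}

# Example use on the active node, as root:
# relocate_dir /var/lib/postgresql /media/drbd1/postgresql
```

The passive node needs the same symlink at /var/lib/postgresql, so that PostgreSQL finds its data there once the DRBD disk is mounted on that node.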

