V4 High Availability
Background
SARK HA V1 and V2 used Heartbeat together with our own in-house handlers for file replication and failover. The cluster was very reliable, but it could not fail back quickly after a protracted failover event; fail-back usually involved some level of manual intervention. SARK V4 high availability uses an openAIS stack with DRBD to create a true cluster. Whereas HA1/HA2 ran as a fixed primary/secondary pair, V4 is not particularly concerned which cluster element is currently active, except in one corner case: switching PRI circuits between cluster nodes during fail-over and fail-back (see the section on Rhino support below).
The three principal software components used by V4 HA are Pacemaker, Corosync and DRBD. In broad terms, Pacemaker is the cluster manager, Corosync is the cluster communication component (it replaces Heartbeat) and DRBD (Distributed Replicated Block Device) is the data manager. SARK, and indeed Asterisk, know little about the cluster, although both components are aware that it is running.
Platform
V4 HA runs on Debian Wheezy. It cannot currently run on SME Server. This is not due to any architectural limitation of V4 HA but rather to the extreme difficulty of installing the openAIS stack on SME Server 8.x. It is unlikely that this will change in the foreseeable future.
HA V3 vs HA V4
- No clean-up required after a V4 fail-over/fail-back
- V4 nodes run full firewall (V3 nodes require an upstream firewall)
- V4 nodes run Fail2ban (V3 runs ossec)
- V4 fail-over is slightly slower than V3 (around 30 seconds vs 12 seconds in V3)
- V4 Node removal/replacement is much easier – nodes can be pre-prepared and sync is fully automated
- V4 clusters can detect and resolve Asterisk hangs and freezes
- V4 clusters need to be pre-planned during initial Linux install
- V3-HA upgrade to V4-HA requires a re-install
- V4 upgrade to V4-HA requires either a re-install or repartitioning (or an extra drive)
- V4 clusters can run with dynamic eth0 IP addresses
- Rhino card support is much simplified in V4
- V4 Nodes require a minimum of 2 NICs (at least for production)
- V4 dedicated eth1 link is much faster and more robust than V3 serial link
- V4-HA has a steep learning curve if you want to understand it fully
- V4 HA installation has been simplified with a helper utility
What does the V4 cluster manage?
- Apache
- Asterisk and ALL of its data
- MySQL and ALL of its data
- DRBD
None of these components will be available on the passive node (DRBD is running but you can't access its data partition, at least not without a fight). Even if you were to manually start any of these components (don't!), they would not see the live data.
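One way to tell whether a node is currently the active one is to look at the DRBD role it reports. The following is a minimal sketch in Python, assuming a DRBD 8.3/8.4 kernel module that publishes per-resource status lines in /proc/drbd with an `ro:<local>/<peer>` field. The function names are hypothetical and are not part of SARK or DRBD.

```python
import re

# Per-resource status line of /proc/drbd as printed by DRBD 8.3/8.4, for example:
#  0: cs:Connected ro:Secondary/Primary ds:UpToDate/UpToDate C r-----
# (assumed format; the first role after "ro:" is the local node's role)
_STATUS = re.compile(r"^\s*(\d+):\s+cs:(\S+)\s+ro:(\w+)/(\w+)")


def local_roles(proc_drbd_text):
    """Return {minor_number: local_role} for every resource line found."""
    roles = {}
    for line in proc_drbd_text.splitlines():
        match = _STATUS.match(line)
        if match:
            roles[int(match.group(1))] = match.group(3)
    return roles


def is_active_node(proc_drbd_text):
    """True if at least one resource exists and all are Primary locally."""
    roles = local_roles(proc_drbd_text)
    return bool(roles) and all(role == "Primary" for role in roles.values())
```

On a cluster node you would pass in the contents of /proc/drbd, for example `is_active_node(open("/proc/drbd").read())`. This only reads status; it does not start or stop anything, which should be left to Pacemaker.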
A word of caution
Pacemaker, Corosync and DRBD are extraordinarily powerful and complex pieces of software. You need to understand them and, at least in outline, be familiar with how they work together. The ASHA utility makes V4 installation simple and straightforward and hides a lot of the complexity, but you do need to know how to recover from some of the failures that can occur when running a cluster. Getting it wrong, or changing parameters without rehearsal, can result in data loss, so you must have a clear data backup and recovery strategy for your cluster before you make changes.
Installation
Prerequisites
- A minimal Debian Wheezy install (When tasksel runs you should only select openssh and nothing else)
- An empty partition on each cluster node.
- The partition MUST be exactly the same size on each node
- It should be marked as "Do not Use" when you define it.
- The partition needs to be large enough to hold all of your MySQL and Asterisk data (including room for call recordings if you plan to make them).
- By convention we define the first logical partition (/dev/sda5) as the empty partition
- A minimum of 2 NICs on each node
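Because the empty partition must be exactly the same size on each node, it can be worth checking the byte counts before going further. The sketch below is one possible way to do that in Python, assuming a Linux kernel that exposes the partition size in 512-byte sectors under /sys/class/block. The function names and the node labels in the usage note are hypothetical.

```python
from pathlib import Path

SECTOR_BYTES = 512  # sysfs reports block device sizes in 512-byte sectors


def partition_bytes(name, sysfs_root="/sys/class/block"):
    """Size in bytes of a block device such as 'sda5', read from sysfs."""
    sectors = int(Path(sysfs_root, name, "size").read_text().strip())
    return sectors * SECTOR_BYTES


def sizes_match(sizes_by_node):
    """True only if every node reports exactly the same byte count."""
    return len(set(sizes_by_node.values())) == 1
```

Run `partition_bytes("sda5")` on each node, collect the results into a dictionary such as `{"node1": ..., "node2": ...}` and pass it to `sizes_match`. Any difference, however small, means the partitions should be redefined before the cluster is built.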
How big should the DRBD partition be?
DRBD partition size is a trade-off between making it big enough to handle any eventuality and the time it takes to synchronize when you bring a new or repaired node into the cluster. As a rough guide, on a single dedicated Gigabit link, the cluster will sync at around 30 MB/s. On commercial SARK systems we use a 64 GB SSD and allocate 50 GB to the DRBD partition, 8 GB to root and 4 GB to swap. This gives ample room for call recording with periodic offload, and it syncs a new node in about half an hour.
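The half-hour figure follows directly from the partition size and the sync rate, and the same arithmetic can be used to size your own partition. The sketch below is a rough estimate only, assuming decimal units (1 GB = 1000 MB) and a steady sync rate; the function name is hypothetical.

```python
def sync_minutes(partition_gb, rate_mb_per_s=30.0):
    """Estimated minutes for a full initial sync of a DRBD partition."""
    megabytes = partition_gb * 1000  # decimal units: 1 GB = 1000 MB
    return megabytes / rate_mb_per_s / 60


# 50 GB at roughly 30 MB/s, the figures quoted above
print(f"{sync_minutes(50):.0f} minutes")  # prints "28 minutes"
```

Doubling the partition to 100 GB at the same rate would roughly double the time a replacement node takes to sync, to a little under an hour.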