V4 High Availability

From sailpbx
Revision as of 08:47, 16 February 2014 by Adminwiki

Background

SARK HA V1 and V2 used Heartbeat together with our own in-house handlers for file replication and fail-over. The cluster was very reliable but limited in its ability to fail back quickly after a protracted fail-over event, which usually involved some level of manual intervention. SARK V4 high availability uses an openAIS stack with DRBD to create a true cluster. Whereas HA1/HA2 ran as a fixed primary/secondary pair, V4 is not particularly concerned which cluster node is currently active, except in one corner case: switching PRI circuits between cluster nodes during fail-over and fail-back (see the section on Rhino support below). The three principal software components used by V4 HA are Pacemaker, Corosync and DRBD. In broad terms, Pacemaker is the cluster resource manager, Corosync is the cluster communication layer (it replaces Heartbeat) and DRBD (Distributed Replicated Block Device) is the data manager, mirroring the data volume between the nodes at block level. SARK, and indeed Asterisk, know little about the cluster, although both are aware that it is running.
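The difference between a fixed primary/secondary pair and a cluster that "is not particularly concerned which node is active" can be illustrated with a toy model. This is a minimal sketch under stated assumptions and is not how Pacemaker is implemented. It assumes a two-node cluster running a hypothetical resource group (DRBD promotion, filesystem mount, floating IP, Asterisk) that must always run together on one node. When the active node is lost, the whole group starts on the surviving peer. The node names, resource names and the `Cluster` class are all illustrative. The model also assumes that a recovered node rejoins as standby rather than taking the services back, which is roughly what Pacemaker's resource-stickiness setting achieves. It ignores fencing (STONITH), which a real cluster needs before it can safely take over a DRBD volume.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical resource group, started in this order on the active node.
# A real Pacemaker configuration would express this with a group or with
# order/colocation constraints; these names are illustrative only.
RESOURCE_GROUP = ["drbd-primary", "filesystem", "floating-ip", "asterisk"]


@dataclass
class Cluster:
    nodes: list[str]
    online: set[str] = field(init=False)
    active: str | None = None
    log: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Every node starts online; none is designated a fixed primary.
        self.online = set(self.nodes)

    def place_resources(self) -> None:
        """Ensure the whole group runs on exactly one online node."""
        if self.active in self.online:
            return  # stay put: no automatic fail-back (assumed stickiness)
        candidates = [n for n in self.nodes if n in self.online]
        if not candidates:
            self.active = None
            self.log.append("no node available, services down")
            return
        self.active = candidates[0]
        for res in RESOURCE_GROUP:
            self.log.append(f"start {res} on {self.active}")

    def fail_node(self, node: str) -> None:
        """Simulate the messaging layer reporting that a node has gone."""
        # A real cluster would fence (STONITH) the node before taking over.
        self.online.discard(node)
        self.log.append(f"{node} lost")
        self.place_resources()

    def recover_node(self, node: str) -> None:
        """A repaired node rejoins as standby and does not take services back."""
        self.online.add(node)
        self.log.append(f"{node} rejoined")
        self.place_resources()


if __name__ == "__main__":
    cluster = Cluster(["node-a", "node-b"])
    cluster.place_resources()       # group starts on node-a
    cluster.fail_node("node-a")     # group moves to node-b
    cluster.recover_node("node-a")  # node-a returns as standby
    print(cluster.active)           # node-b
```

Because the group stays wherever it last landed, a protracted outage followed by a repair does not trigger a second service interruption. This is the property that made fail-back awkward in the fixed-pair V1/V2 design.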

Platform

V4 HA runs on Debian Wheezy. It cannot currently run on SME Server. This is not due to any architectural limitation of V4 HA but to the difficulty of deploying the openAIS stack on SME Server 8.x, and it is unlikely that this will change in the foreseeable future.

HA V3 vs HA V4

  • Little or no clean-up required after a V4 fail-over
  • V4 is fully functional after fail-over
  • V4 nodes run a full firewall (V3 nodes do not)
  • V4 nodes run Fail2ban (V3 runs OSSEC)
  • Cluster node removal/replacement is much easier in V4, and a replacement can be pre-prepared
  • V4 clusters have to be pre-planned from the initial Linux install
  • V3-HA upgrade to V4-HA requires a re-install
  • V4 non-HA upgrade to V4-HA requires a re-install or an extra drive
  • V4 clusters can run with dynamic eth0 IP addresses
  • Rhino card support is much simplified in V4
  • V4 HA has a steep learning curve and takes time to understand fully
  • V4 nodes require a minimum of 2 NICs (at least for production)
  • The V4 dedicated eth1 link is much faster and more robust than the V3 serial link
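The NIC requirements above lend themselves to a quick pre-install sanity check, since V4 clusters must be planned from the initial Linux install. The following is a minimal sketch, not part of SARK. It assumes the classic `eth0`/`eth1` interface naming used on Debian Wheezy, with `eth1` carrying the dedicated cluster link. It also assumes the standard Linux sysfs directory `/sys/class/net` lists the host's interfaces. The function names are hypothetical. The filter is naive: it drops only the loopback interface, so bridges, VLANs and other virtual interfaces would also be counted.

```python
import os
from typing import Iterable

SYS_NET = "/sys/class/net"  # standard Linux sysfs location for network interfaces


def physical_nics(names: Iterable[str]) -> list[str]:
    """Drop the loopback interface; everything else is treated as a NIC."""
    return sorted(n for n in names if n != "lo")


def check_ha_nics(names: Iterable[str]) -> list[str]:
    """Return a list of problems; an empty list means the node looks ready."""
    nics = physical_nics(names)
    problems = []
    if len(nics) < 2:
        problems.append(f"need at least 2 NICs, found {len(nics)}: {nics}")
    if "eth1" not in nics:
        # Assumption: eth1 is reserved for the dedicated cluster link.
        problems.append("no eth1 interface for the dedicated cluster link")
    return problems


if __name__ == "__main__":
    found = os.listdir(SYS_NET) if os.path.isdir(SYS_NET) else []
    issues = check_ha_nics(found)
    print("OK" if not issues else "\n".join(issues))
```

Keeping the check as a pure function of a list of interface names makes it easy to test without touching real hardware. Only the `__main__` block reads sysfs.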