
Heartbeat

2015-07-23 22:08

Author: Danbo  Date: 2015-7-23

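Heartbeat can quickly move resources (a VIP, application services, and so on) from a failed machine to a healthy one, which then continues to provide the service.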

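How Heartbeat works:
You designate one heartbeat server as the primary in the heartbeat configuration file; the other automatically becomes the backup. The heartbeat daemon on the backup then listens for heartbeats from the primary. If the backup hears no heartbeat from the primary within the specified time, it starts failover, takes ownership of the resources and services that were on the primary, and carries on providing the service in its place, which is how the resources stay highly available. That is active/standby mode. There is also active/active mode (each node backs up the other): both sides send heartbeat messages, and when one side receives none from the other within the specified time, it considers the peer failed and takes over the resources or services that were running on the peer.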

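Heartbeat links
Common ways for two heartbeat hosts to communicate:
1. A serial cable (drawback: the distance must be short).
2. One Ethernet cable directly connecting two NICs (the most common).
3. Ethernet cables through a switch or other network device (this adds the switch as a point of failure, and because the link is not dedicated to heartbeats, other traffic can interfere with it).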

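Heartbeat split brain
What is split brain? When two high-availability servers cannot detect each other's heartbeat within the specified time, each starts failover and takes ownership of the resources and services, even though both servers are still alive and running normally. The same IP or service then starts on both sides and conflicts; the most serious case is both hosts holding the same IP address. This situation is called split brain.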

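Common causes of split brain:
- The heartbeat link between the two servers fails, so they cannot communicate.
- A firewall enabled on the servers blocks the heartbeat messages.
- The heartbeat NIC address or related settings are misconfigured, so heartbeats fail to send.
- The two ends use different heartbeat methods, or their heartbeat broadcasts conflict.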

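Preventive measures:
- Add redundant links: use two heartbeat links.
- When split brain is detected, forcibly shut down one node. This needs special equipment such as STONITH or fence devices (programmable power controllers).
- Monitor for split brain and raise alerts (email, SMS, and so on).
- Enable disk locking: the node that is serving locks the shared disk, and it enables the lock only once it finds that all heartbeat links are down.
- Add an arbitration mechanism: when the heartbeat links are completely down, each node pings a reference gateway. The node that cannot reach the gateway concludes that the break is on its own side; the node that can reach it takes over the service, and the node that cannot reboots itself to release the shared resources.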
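
The arbitration rule in the last item can be written as a small decision function. This is only a sketch of the rule as described above, not code from Heartbeat; the function name and the action labels are made up for illustration.

```python
def arbitrate(peer_heartbeat_seen: bool, gateway_reachable: bool) -> str:
    """Decide what a node should do, following the gateway-ping rule.

    The action names ("no-action", "take-over", "reboot") are hypothetical
    labels for this sketch, not Heartbeat terminology.
    """
    if peer_heartbeat_seen:
        # Heartbeats still arrive, so there is nothing to arbitrate.
        return "no-action"
    if gateway_reachable:
        # Our network path works, so the break is on the peer's side.
        return "take-over"
    # We cannot reach the gateway: the break is on our side, so we
    # reboot to release any shared resources we may still hold.
    return "reboot"


if __name__ == "__main__":
    for peer, gateway in [(True, True), (False, True), (False, False)]:
        print(peer, gateway, "->", arbitrate(peer, gateway))
```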

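Heartbeat message types
- Heartbeat messages
- Cluster transition messages
- Retransmission requests

Heartbeat messages: about 150 bytes each, sent by unicast, broadcast, or multicast.
Cluster transition messages: ip-request and ip-request-resp.
When the primary comes back online, it sends an ip-request message asking the backup to release the resources the backup took over while the primary was down; the backup then stops those resources and services. Once they are released, the backup answers with ip-request-resp, and on receiving it the primary takes over the resources and services again.
Retransmission requests: rexmit-request asks for heartbeat messages to be retransmitted.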
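
The ip-request / ip-request-resp exchange can be modelled in a few lines. This is a toy in-memory model of the sequence described above, assuming resources are just strings in a set; the class and method names are hypothetical, and no real Heartbeat wire format is involved.

```python
class Node:
    """Toy cluster node; resources are plain strings such as a VIP."""

    def __init__(self, name, resources=()):
        self.name = name
        self.resources = set(resources)

    def handle_ip_request(self, wanted):
        """Backup side: stop and release the requested resources, then reply."""
        released = self.resources & set(wanted)
        self.resources -= released
        return ("ip-request-resp", released)

    def fail_back_from(self, peer, wanted):
        """Primary side: send ip-request, take over once the reply arrives."""
        msg_type, released = peer.handle_ip_request(wanted)
        if msg_type == "ip-request-resp":
            self.resources |= released
        return released


if __name__ == "__main__":
    primary = Node("data1")
    backup = Node("data2", {"10.1.1.129"})  # backup currently holds the VIP
    primary.fail_back_from(backup, {"10.1.1.129"})
    print(primary.resources, backup.resources)
```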

These heartbeat control messages are sent over UDP to the port configured in /etc/ha.d/ha.cf (694 by default).

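Heartbeat IP address takeover and failover
Heartbeat moves the service address by taking over the IP address and announcing it with ARP broadcasts.
ARP broadcast (gratuitous ARP): when the primary fails and the backup node takes over the resources, the backup broadcasts gratuitous ARP so that every client immediately updates its local ARP table.
Note that "client" here means a machine on the same LAN as the heartbeat servers, not the end user on the Internet. A client is defined relative to the heartbeat high-availability servers.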
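
To make the gratuitous ARP concrete, the sketch below builds the bytes of such a frame: an ARP request in which the sender IP and the target IP are both the VIP, broadcast to the whole segment. It only constructs the frame; sending it would need a raw socket and root privileges. The function name is made up for this example, and this is not how Heartbeat itself is implemented.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # hardware length 6, protocol length 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += mac_bytes + ip_bytes      # sender MAC, sender IP (the VIP)
    arp += b"\x00" * 6 + ip_bytes    # target MAC unset, target IP (the VIP)
    return ethernet + arp


if __name__ == "__main__":
    frame = build_gratuitous_arp("00:0c:29:1b:c7:c9", "192.168.1.2")
    print(len(frame), frame.hex())
```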

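A VIP is a virtual IP: an alias IP bound to a physical NIC, such as eth0:x. You can bind several aliases to one NIC. In production, the DNS record for the site's domain name should resolve to this VIP, and users are served through the VIP.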

Adding a VIP:
heartbeat: ifconfig eth0:1 192.168.1.2 netmask 255.255.255.0 up
keepalived: ip addr add 192.168.1.2/24 broadcast 192.168.1.255 dev eth1 (secondary IP)
ifconfig shows the alias-style VIP that heartbeat configures; a VIP added as a secondary IP is visible with ip addr.

Removing a VIP:
heartbeat: ifconfig eth0:1 192.168.1.2 netmask 255.255.255.0 down, or ifconfig eth0:1 down
keepalived: ip addr del 192.168.1.2/24 broadcast 192.168.1.255 dev eth1

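Heartbeat configuration files
ha.cf: heartbeat parameter configuration file
authkeys: heartbeat authentication file
haresources: heartbeat resource configuration file, for example IP resources and service scripts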

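Heartbeat deployment requirements
[Figure: topology diagram]

Host resource plan for the Heartbeat servers:

Configure the IP address and host name on each virtual machine
IP configuration: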

DEVICE=eth0
BOOTPROTO=static
BROADCAST=192.168.23.255
HWADDR=00:0C:29:1B:C7:C9
IPADDR=192.168.23.128
IPV6INIT=yes
IPV6_AUTOCONF=yes
NETMASK=255.255.255.0
NETWORK=192.168.23.0
ONBOOT=yes


  

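Set the host name: in /etc/sysconfig/network, set HOSTNAME=data1 on one node and HOSTNAME=data2 on the other.
Then run hostname data1 (or hostname data2 on the second node) to apply it immediately.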

Configure the hosts file. It resolves the internal names, because the heartbeat configuration files refer to the machines by host name.
echo "172.16.1.129 data1" >>/etc/hosts
echo "172.16.1.132 data2" >>/etc/hosts

At this point the system hangs at "Starting sendmail" during boot, so we stop sendmail and disable it at startup:
killall sendmail
chkconfig --list | grep sendmail
chkconfig --level 2345 sendmail off

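Add a host route for heartbeat traffic. Do this on both machines, each pointing at its peer's heartbeat address.
[root@data1 /]# route add -host 10.0.1.132 dev eth2
[root@data1 /]# echo 'route add -host 10.0.1.132 dev eth2' >>/etc/rc.d/rc.local
The second command saves the route in the boot script so that it is added again at the next startup.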

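Installing the heartbeat software

To keep the downloaded RPM packages in the local yum cache after installation:
sed -i 's/keepcache=0/keepcache=1/g' /etc/yum.conf
Here we install with yum: yum -y install heartbeat
# Note: the command must be run twice. The first run reports:
Failed:
heartbeat.x86_64 0:2.1.3-3.el5.centos
- yum resolves dependencies automatically and is still an RPM-based install: convenient and simple, but not flexible. Building from source lets you customize the installation and is flexible, but more complex. Large companies often build RPM packages tailored to their needs and host them in their own yum repository.
After installation, go to heartbeat's default doc directory: cd /usr/share/doc/heartbeat-2.1.3/
The files we care about: ll ha.cf authkeys haresources

cat ha.cf
The logging-related settings come first: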

# File to write debug messages to

debugfile /var/log/ha-debug
#
#  File to write other messages to
logfile /var/log/ha-log
#
# Facility to use for syslog()/logger
logfacility local0

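The following basic timer parameters are usually left unchanged:
keepalive 2      # interval between heartbeats, in seconds
deadtime 30      # declare the peer dead after 30s without a heartbeat
warntime 10      # log a warning after 10s without a heartbeat
initdead 120     # after the heartbeat daemon first starts, wait 120s before starting the resources on the primary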
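
How warntime and deadtime relate can be shown with a small classifier. This is a sketch of the idea only, assuming the thresholds are compared with "greater than or equal"; the function name and state labels are hypothetical and do not come from Heartbeat.

```python
def peer_state(seconds_since_last_beat: float,
               warntime: float = 10, deadtime: float = 30) -> str:
    """Classify the peer from the time since its last heartbeat.

    Defaults mirror the warntime and deadtime values shown above.
    """
    if seconds_since_last_beat >= deadtime:
        return "dead"    # failover would start here
    if seconds_since_last_beat >= warntime:
        return "late"    # a warning would be logged
    return "alive"


if __name__ == "__main__":
    for gap in (2, 12, 30):
        print(gap, "->", peer_state(gap))
```

With keepalive 2, a healthy peer is heard from every 2 seconds, so a gap of 10 seconds already means several heartbeats were missed.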

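The following lines configure multicast. Change the device (eth2 here) to the NIC that carries the heartbeat link:
#serial /dev/ttyS0 # configure this instead if you use a serial cable
mcast [dev] [mcast group] [port] [ttl] [loop]   The mcast group must be the same on both nodes of one cluster and must differ between heartbeat clusters on the same network.
Change it to: mcast eth2 225.0.0.129 694 1 0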
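
A quick way to catch a mistyped mcast line is to parse it and check that the group really is a multicast address (224.0.0.0/4). The helper below is a hypothetical sanity check written for this article, not part of Heartbeat; it uses only Python's standard ipaddress module.

```python
import ipaddress


def parse_mcast(line: str) -> dict:
    """Parse 'mcast dev group port ttl loop' and validate the group address."""
    fields = line.split()
    if len(fields) != 6 or fields[0] != "mcast":
        raise ValueError("expected: mcast dev group port ttl loop")
    _, dev, group, port, ttl, loop = fields
    if not ipaddress.ip_address(group).is_multicast:
        raise ValueError(f"{group} is not a multicast address")
    return {"dev": dev, "group": group, "port": int(port),
            "ttl": int(ttl), "loop": int(loop)}


if __name__ == "__main__":
    print(parse_mcast("mcast eth2 225.0.0.129 694 1 0"))
```

The check rejects an address such as 255.0.0.129, which lies outside the multicast range.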

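The following line controls whether the master takes its resources back after recovering from a failure, that is, whether preemption is enabled.
auto_failback on   # enable preemption
node    data1   # host names of the two storage servers. This is why we bound IPs to host names in /etc/hosts; the names must match the output of uname -n
node    data2

crm     no  # cluster resource manager; leave this set to no
ping    10.10.10.254    # ping a reference gateway to check whether the HA server is working normally; this can help detect split brain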


  

Configuring the authkeys file

[root@data1 heartbeat-2.1.3]# cat authkeys
#
#       Authentication file.  Must be mode 600   # note: this file's permissions must be 600
#
#
#       Must have exactly one auth directive at the front.
#       auth    send authentication using this method-id
#
#       Then, list the method and key that go with that method-id
#
#       Available methods: crc sha1, md5.  Crc doesn't need/want a key. (three authentication methods; crc needs no key)
#
#       You normally only have one authentication method-id listed in this file
#
#       Put more than one to make a smooth transition when changing auth
#       methods and/or keys.
#
#
#       sha1 is believed to be the "best", md5 next best.
#
#       crc adds no security, except from packet corruption.
#               Use only on physically secure networks.
#
auth 1  # selects which method-id below is active
1 crc  (default)
#2 sha1 HI!
#3 md5 Hello!
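
If you switch from crc to sha1, the key should be random and the file must end up with mode 600. The sketch below writes such a file; the helper name, the choice of method-id 1, and the 16-byte random key are assumptions made for this example, while the two-line layout follows the sample file above.

```python
import os
import secrets


def write_authkeys(path: str, method: str = "sha1") -> str:
    """Write a minimal authkeys file with a random key and mode 600."""
    key = secrets.token_hex(16)  # 32 hex characters; key length is our choice
    content = f"auth 1\n1 {method} {key}\n"
    # Create the file with restrictive permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed
    return content


if __name__ == "__main__":
    print(write_authkeys("authkeys.example"))
```

The same file, with the same key, has to be copied to both nodes.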


Configuring the haresources file
Edit the heartbeat resource file:
[root@data1 heartbeat-2.1.3]# cat haresources
#
#       This is a list of resources that move from machine to machine as
#       nodes go down and come up in the cluster.  Do not include
#       "administrative" or fixed IP addresses in this file.
#
# <VERY IMPORTANT NOTE>
#       The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
#       The node names listed in front of the resource group information
#       is the name of the preferred node to run the service.  It is
#       not necessarily the name of the current machine.  If you are running
#       auto_failback ON (or legacy), then these services will be started
#       up on the preferred nodes - any time they're up.
#
#       If you are running with auto_failback OFF, then the node information
#       will be used in the case of a simultaneous start-up, or when using
#       the hb_standby {foreign,local} command.
#
#       BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
#       If your files are different then almost certainly something
#       won't work right.
# </VERY IMPORTANT NOTE>
#
#
#       We refer to this file when we're coming up, and when a machine is being
#       taken over after going down.
#
#       You need to make this right for your installation, then install it in
#       /etc/ha.d
#
#       Each logical line in the file constitutes a "resource group".
#       A resource group is a list of resources which move together from
#       one node to another - in the order listed.  It is assumed that there
#       is no relationship between different resource groups.  These
#       resource in a resource group are started left-to-right, and stopped
#       right-to-left.  Long lists of resources can be continued from line
#       to line by ending the lines with backslashes ("\").
#
#       These resources in this file are either IP addresses, or the name
#       of scripts to run to "start" or "stop" the given resource.
#
#       The format is like this:
#
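#node-name resource1 resource2 ... resourceN   # binds the resources to a server
#IPaddr is heartbeat's default script for configuring an IP; the IP and the other fields after it are arguments to the script.
data1 IPaddr::10.1.1.129/24/eth0     # the cluster's public VIP. It starts on data1 and is bound to eth0, which is the interface heartbeat uses to serve external traffic. Equivalent to running: /etc/ha.d/resource.d/IPaddr 10.1.1.129/24/eth0 start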
data2 IPaddr::10.1.1.129/24/eth0
#
#
#       If the resource name contains an :: in the middle of it, the
#       part after the :: is passed to the resource script as an argument.
#       Multiple arguments are separated by the :: delimeter
#
#       In the case of IP addresses, the resource script name IPaddr is
#       implied.
#
#       For example, the IP address 135.9.8.7 could also be represented
#       as IPaddr::135.9.8.7
#
#       THIS IS IMPORTANT!!     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
#       The given IP address is directed to an interface which has a route
#       to the given address.  This means you have to have a net route
#       set up outside of the High-Availability structure.  We don't set it
#       up here -- we key off of it.
#
#       The broadcast address for the IP alias that is created to support
#       an IP address defaults to the highest address on the subnet.
#
#       The netmask for the IP alias that is created defaults to the same
#       netmask as the route that it selected in in the step above.
#
#       The base interface for the IPalias that is created defaults to the
#       same netmask as the route that it selected in in the step above.
#
#       If you want to specify that this IP address is to be brought up
#       on a subnet with a netmask of 255.255.255.0, you would specify
#       this as IPaddr::135.9.8.7/24 .
#
#       If you wished to tell it that the broadcast address for this subnet
#       was 135.9.8.210, then you would specify that this way:
#               IPaddr::135.9.8.7/24/135.9.8.210
#
#       If you wished to tell it that the interface to add the address to
#       is eth0, then you would need to specify it this way:
#               IPaddr::135.9.8.7/24/eth0
#
#       And this way to specify both the broadcast address and the
#       interface:
#               IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
#       The IP addresses you list in this file are called "service" addresses,
#       since they're they're the publicly advertised addresses that clients
#       use to get at highly available services.
#
#       For a hot/standby (non load-sharing) 2-node system with only
#       a single service address,
#       you will probably only put one system name and one IP address in here.
#       The name you give the address to is the name of the default "hot"
#       system.
#
#       Where the nodename is the name of the node which "normally" owns the
#       resource.  If this machine is up, it will always have the resource
#       it is shown as owning.
#
#       The string you put in for nodename must match the uname -n name
#       of your machine.  Depending on how you have it administered, it could
#       be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
#       Simple case: One service address, default subnet and netmask
#               No servers that go up and down with the IP address
#
#just.linux-ha.org      135.9.216.110
#
#-------------------------------------------------------------------
#
#       Assuming the adminstrative addresses are on the same subnet...
#       A little more complex case: One service address, default subnet
#       and netmask, and you want to start and stop http when you get
#       the IP address...
#
#just.linux-ha.org      135.9.216.110 http
#-------------------------------------------------------------------
#
#       A little more complex case: Three service addresses, default subnet
#       and netmask, and you want to start and stop http when you get
#       the IP address...
#
#just.linux-ha.org      135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
#       One service address, with the subnet, interface and bcast addr
#       explicitly defined.
#
#just.linux-ha.org      135.9.216.3/28/eth0/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
#       An example where a shared filesystem is to be used.
#       Note that multiple aguments are passed to this script using
#       the delimiter '::' to separate each argument.
#
#node1  10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
#
#       Regarding the node-names in this file:
#
#       They must match the names of the nodes listed in ha.cf, which in turn
#       must match the `uname -n` of some node in the cluster.  So they aren't
#       virtual in any sense of the word.


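When configuring the VIP, the host name at the start of the line is the master's host name. The backup node's file also uses the master's host name.

Once the three files above are configured, copy them into the /etc/ha.d/ directory.
Then start heartbeat on both virtual machines: /etc/init.d/heartbeat start
Make sure the firewall is stopped here; otherwise both hosts bring up the resources and split brain occurs: /etc/init.d/iptables stop
At this point data1 is the master, and the VIP is on data1: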



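data2, on the other hand, has not brought up eth0:0. When we stop the heartbeat service on data1, eth0:0 is bound on data2 instead:
That completes the switch from master to backup. To stress one point: the host name we bind in DNS must be exactly what uname -n prints, and the node entries in ha.cf must be changed to match.

The configuration above uses the same IP on both sides. Another approach is to configure two VIPs on one machine.
On the primary heartbeat server we can run the hb_standby script to put the local node into standby, which simulates the heartbeat server going down (the effect is the same as stopping heartbeat), and then watch the backup take over. The hb_standby script is located at /usr/lib64/heartbeat/hb_standby.
Command: /usr/lib64/heartbeat/hb_standby
There is also another script, hb_takeover, which makes the local host take over the resources and become the active node.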