Heartbeat

Author: Danbo    Date: 2015-7-23

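Heartbeat can quickly move resources (a VIP, application services, and so on) from a failed machine to a healthy one, which then carries on providing the service.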

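How Heartbeat works:
The heartbeat configuration file names one heartbeat server as the primary; the other automatically becomes the backup. The heartbeat daemon on the backup then listens for heartbeats from the primary. If the backup hears no heartbeat from the primary within the configured time, it starts the failover procedure, takes ownership of the resources and services that were running on the primary, and carries on serving in its place without interruption, which is how the resources stay highly available. That is primary/backup mode. There is also a primary/primary mode (each node backs up the other): both nodes send heartbeat messages, and a node that receives nothing from its peer within the configured time considers the peer failed and takes over the resources or services that were running on it.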
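The primary/backup logic described above can be sketched as a small failure detector. This is an illustrative Python sketch, not Heartbeat's actual implementation: the class name `PeerMonitor` and its methods are hypothetical, timestamps are passed in explicitly so the example is deterministic, and the 30-second threshold simply mirrors the `deadtime` value used later in this article.

```python
class PeerMonitor:
    """Tracks the peer's last heartbeat and decides when to fail over."""

    def __init__(self, deadtime=30.0, start=0.0):
        self.deadtime = deadtime        # seconds of silence before the peer is declared dead
        self.last_seen = start          # timestamp of the most recent heartbeat
        self.holding_resources = False  # True once this node has taken over

    def on_heartbeat(self, now):
        """Record a heartbeat received from the peer at time `now`."""
        self.last_seen = now

    def check(self, now):
        """Take over the resources if the peer has been silent for too long."""
        if not self.holding_resources and now - self.last_seen >= self.deadtime:
            # A real node would bring up the VIP and start the services here.
            self.holding_resources = True
        return self.holding_resources


monitor = PeerMonitor(deadtime=30.0)
monitor.on_heartbeat(now=10.0)
print(monitor.check(now=20.0))  # peer was heard 10 s ago
print(monitor.check(now=45.0))  # peer has been silent for 35 s
```

In primary/primary mode each node would run the same check against the other.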

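Heartbeat links
Two heartbeat hosts usually communicate over one of the following:
1. A serial cable (drawback: the distance must be short).
2. An Ethernet cable connecting the two NICs directly (the most common choice).
3. Ethernet through a switch or other network device (the switch becomes an extra point of failure, and because the link is not dedicated to heartbeats, other traffic can interfere with it).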

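Split brain
What is split brain? When two high-availability servers cannot detect each other's heartbeat within the configured time, each starts its failover procedure and takes ownership of the resources and services, even though both servers are still alive and running normally. The same IP address or service is then active on both sides at once and conflicts; the most serious case is both hosts holding the same IP address. This situation is called split brain.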

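Common causes of split brain:
-The heartbeat link between the two high-availability servers fails, so they cannot communicate.
-A firewall enabled on the servers blocks the heartbeat messages.
-The heartbeat NIC address or related settings are misconfigured, so transmission fails.
-The two ends use different heartbeat methods, or their heartbeat broadcasts conflict.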

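Preventive measures:
-Add redundant links: run two heartbeat lines.
-When split brain is detected, forcibly shut down one of the nodes. This needs special hardware support, such as STONITH or fence devices (programmable power controllers).
-Monitor for split brain and raise alerts (e-mail, SMS, and so on).
-Use disk locking: the node that is serving locks the shared disk, and it enables the lock only once it finds that all heartbeat links are down.
-Add an arbitration mechanism: when the heartbeat links are completely down, each node pings a reference gateway. A node that cannot reach the gateway concludes that the break is on its own side; the node that can reach it takes over the service, and the node that cannot reboots itself to release the shared resources.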
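The arbitration rule in the last item can be written down as a tiny decision function. This is only a sketch of the rule as stated above: the function name `arbitrate` and the returned labels are made up for illustration, and the result of pinging the gateway is passed in as a boolean rather than measured.

```python
def arbitrate(peer_heartbeat_ok, gateway_reachable):
    """Decide what a node should do, following the gateway-ping rule."""
    if peer_heartbeat_ok:
        return "no_action"    # heartbeat link is fine, nothing to arbitrate
    if gateway_reachable:
        return "take_over"    # the break is on the other side
    return "reboot_self"      # the break is on this side: release shared resources


print(arbitrate(peer_heartbeat_ok=False, gateway_reachable=True))
print(arbitrate(peer_heartbeat_ok=False, gateway_reachable=False))
```

If both nodes can still reach the gateway while the heartbeat link is down, this rule alone does not pick a winner, which is one reason it is usually combined with the other measures in the list.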

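Heartbeat message types
-Heartbeat messages
-Cluster transition messages
-Retransmission requests

Heartbeat messages: about 150 bytes each, sent by unicast, broadcast, or multicast.
Cluster transition messages: ip-request and ip-request-resp.
When the primary comes back online, it sends an ip-request message asking the backup to release the resources the backup acquired when the primary failed, and the backup then shuts down those resources and services. Once it has released them, the backup sends an ip-request-resp message to tell the primary that it can take the resources and services back.
Retransmission requests: rexmit-request asks for heartbeat messages to be retransmitted.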

The heartbeat control messages above are sent over UDP to the port specified in /etc/ha.d/ha.cf; any port can be chosen.
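To make the UDP transport concrete, here is a minimal Python sketch that sends one heartbeat-style datagram over the loopback interface and reads it back. The payload layout (`type=...;node=...;seq=...`) is invented for this example and is not Heartbeat's wire format, and an ephemeral port is used instead of the port from ha.cf so that the sketch runs without privileges.

```python
import socket

# Receiver: bind to an ephemeral UDP port on loopback. A real cluster would
# listen on the port configured in ha.cf instead.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(2.0)
port = receiver.getsockname()[1]

# Sender: emit a single heartbeat datagram (hypothetical payload format).
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"type=status;node=data1;seq=1", ("127.0.0.1", port))

# Receive the datagram and split it into key/value fields.
payload, addr = receiver.recvfrom(1024)
fields = dict(item.split("=", 1) for item in payload.decode().split(";"))
print(fields["node"], fields["seq"])

sender.close()
receiver.close()
```

Because UDP gives no delivery guarantee, a sequence number like the `seq` field here is the kind of information a receiver needs before it can ask for a retransmission.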

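Heartbeat IP address takeover and failover
Heartbeat moves an address between nodes through IP address takeover and ARP broadcasts.
ARP broadcast (gratuitous ARP): when the primary fails and the backup node has taken over the resources, the backup immediately broadcasts ARP so that every client updates its local ARP table.
Note that "clients" here means machines on the same LAN as the heartbeat high-availability servers, not the end users on the Internet; a client is defined relative to the heartbeat servers.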
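For readers who want to see what a gratuitous ARP announcement contains, the following Python sketch builds the 28-byte ARP payload with `struct`. It only constructs the bytes; actually sending them needs a raw socket and root privileges, and it is not how Heartbeat itself issues the announcement. The function name is hypothetical, and the MAC and IP are sample values taken from elsewhere in this article.

```python
import socket
import struct


def gratuitous_arp_payload(mac, ip):
    """Build the ARP payload announcing that `ip` now lives at `mac`."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6, 4,         # hardware and protocol address lengths
        1,            # opcode: request (a common form for gratuitous ARP)
        hw, addr,     # sender MAC and IP: the node that now owns the VIP
        b"\x00" * 6,  # target MAC: left unset
        addr,         # target IP equals sender IP, which makes it gratuitous
    )


payload = gratuitous_arp_payload("00:0C:29:1B:C7:C9", "192.168.1.2")
print(len(payload))
```

On the wire this payload would be carried in an Ethernet frame addressed to the broadcast MAC, so every host on the LAN sees it and can refresh its ARP cache.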

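A VIP is a virtual IP. It is simply an alias IP bound to a physical NIC, such as eth0:x, and one NIC can carry several aliases. In production, DNS should resolve the site's domain name to this VIP, and the VIP is what serves users.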

Configuring a VIP:
heartbeat: ifconfig eth0:1 192.168.1.2 netmask 255.255.255.0 up
keepalived: ip addr add 192.168.1.2/24 broadcast 192.168.1.255 dev eth1 (a secondary IP)
ifconfig shows the VIP configured the heartbeat way; ip addr shows a VIP added as a secondary IP.

Removing a VIP:
heartbeat: ifconfig eth0:1 192.168.1.2 netmask 255.255.255.0 down, or ifconfig eth0:1 down
keepalived: ip addr del 192.168.1.2/24 broadcast 192.168.1.255 dev eth1
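The add and delete commands in the secondary-IP style differ only in one word, and the broadcast address follows from the address and prefix. The Python sketch below derives both command lines from a VIP in CIDR form; the helper name `vip_commands` is hypothetical, and the function only builds argument lists without executing anything.

```python
import ipaddress


def vip_commands(vip_cidr, device):
    """Return the `ip addr add` / `ip addr del` argument lists for a VIP."""
    iface = ipaddress.ip_interface(vip_cidr)
    broadcast = iface.network.broadcast_address  # highest address in the subnet
    tail = [str(iface), "broadcast", str(broadcast), "dev", device]
    return {
        "add": ["ip", "addr", "add"] + tail,
        "del": ["ip", "addr", "del"] + tail,
    }


commands = vip_commands("192.168.1.2/24", "eth1")
print(" ".join(commands["add"]))
print(" ".join(commands["del"]))
```

The lists could be handed to `subprocess.run` on a test machine, which requires root.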

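Heartbeat configuration files
ha.cf         heartbeat parameter configuration file
authkeys      heartbeat authentication file
haresources   heartbeat resource configuration file (IP resources, script programs, and so on)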

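Heartbeat deployment requirements
[Figure: topology diagram]

[Table: host resource plan for the Heartbeat servers]

Configure the IP address and host name of each virtual machine
IP configuration: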

DEVICE=eth0
BOOTPROTO=static
BROADCAST=192.168.23.255
HWADDR=00:0C:29:1B:C7:C9
IPADDR=192.168.23.128
IPV6INIT=yes
IPV6_AUTOCONF=yes
NETMASK=255.255.255.0
NETWORK=192.168.23.0
ONBOOT=yes

　　

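Set the host names: in /etc/sysconfig/network, set HOSTNAME=data1 on one node and HOSTNAME=data2 on the other.
Then run hostname data1 or hostname data2 on the corresponding node so that the change takes effect.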

Configure the hosts file. It resolves the internal addresses, because the heartbeat configuration files refer to the machines by name.
echo "172.16.1.129 data1" >>/etc/hosts
echo "172.16.1.132 data2" >>/etc/hosts

At this point the system hung at "Starting sendmail" during boot, so we stop sendmail and disable its automatic start:
killall sendmail
chkconfig --list | grep sendmail
chkconfig --level 2345 sendmail off

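Add a host route for heartbeat traffic. Both machines need one.
[root@data1 /]# route add -host 10.0.1.132 dev eth2
[root@data1 /]# echo 'route add -host 10.0.1.132 dev eth2' >>/etc/rc.d/rc.local
The second command saves the route in the boot script so that it is added automatically at the next start.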

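Installing the heartbeat software

To keep the downloaded rpm packages on disk after yum installs them:
sed -i 's/keepcache=0/keepcache=1/g' /etc/yum.conf
Here we install with yum: yum -y install heartbeat
#Note: the command has to be run twice. The first run reports:
Failed:
heartbeat.x86_64 0:2.1.3-3.el5.centos
-yum resolves dependencies automatically. It is an rpm-based install: convenient and simple, but not flexible. Building from source allows a customized install, which is flexible but more complex. Large companies often build rpm packages tailored to their own needs and put them in their own yum repository.
After installation, go to heartbeat's default doc directory: cd /usr/share/doc/heartbeat-2.1.3/
The files that matter are: ll ha.cf authkeys haresources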

cat ha.cf
The logging-related settings come first:

# File to write debug messages to

debugfile /var/log/ha-debug
#
#  File to write other messages to
logfile /var/log/ha-log
#
# Facility to use for syslog()/logger
logfacility local0

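The following basic timer parameters are normally left unchanged:
keepalive 2      #interval between heartbeats, in seconds
deadtime 30      #declare the peer dead after 30s without a heartbeat
warntime 10      #issue a late-heartbeat warning after 10s
initdead 120     #after the heartbeat daemon first starts, wait 120s before starting the resources on the primary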
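The four timers only make sense in a certain order: a heartbeat interval shorter than the warning threshold, which in turn is shorter than the dead time. The Python sketch below checks that ordering for a set of values. The function name is hypothetical, and the "initdead at least twice deadtime" rule is a rule of thumb assumed here for illustration, not something heartbeat is known to enforce.

```python
def check_timers(keepalive, warntime, deadtime, initdead):
    """Return a list of problems found in the ha.cf timer values (seconds)."""
    problems = []
    if not keepalive < warntime < deadtime:
        problems.append("expected keepalive < warntime < deadtime")
    if initdead < 2 * deadtime:
        # Assumed rule of thumb: leave extra time for the network to come up at boot.
        problems.append("initdead should be at least twice deadtime")
    return problems


print(check_timers(keepalive=2, warntime=10, deadtime=30, initdead=120))
print(check_timers(keepalive=2, warntime=40, deadtime=30, initdead=120))
```

The values used in this article pass both checks.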

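The following lines select the heartbeat medium. Multicast is used here; the part to change is the device, which must be the NIC on the heartbeat link (eth2):
#serial /dev/ttyS0 #configure this line instead if a serial cable is used
mcast [dev] [mcast group] [port] [ttl] [loop]   The mcast group must be a multicast (class D) address; both nodes of a cluster use the same group, and different clusters on the same network must use different groups.
Change it to: mcast eth2 225.0.0.129 694 1 0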
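The group given to the mcast directive has to be an IPv4 multicast (class D) address, that is, one in 224.0.0.0 to 239.255.255.255. A quick way to check a candidate before putting it in ha.cf is Python's standard `ipaddress` module; the helper name below is just for illustration.

```python
import ipaddress


def is_valid_mcast_group(address):
    """True if `address` lies in the multicast range 224.0.0.0/4."""
    return ipaddress.ip_address(address).is_multicast


print(is_valid_mcast_group("225.0.0.129"))  # inside 224.0.0.0/4
print(is_valid_mcast_group("255.0.0.129"))  # outside the multicast range
```

An address such as 255.0.0.129 looks similar at a glance but falls outside the range, so it is worth running the check on whatever value ends up in the file.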

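The following line controls what happens when the master recovers after going down, that is, whether preemption is enabled.
auto_failback on   #enable preemption: the master takes the resources back once it recovers
node    data1   #the host names of the two storage servers. This is why the hosts file must bind IPs to host names; each name must match the output of uname -n
node    data2

crm     no  #crm stands for cluster resource manager; leave it set to no here
ping    10.10.10.254    #ping a reference gateway to check whether the HA server is working normally; this can help detect split brain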

　　

Configuring the authkeys file

[root@data1 heartbeat-2.1.3]# cat authkeys
#
#       Authentication file.  Must be mode 600   #Note: this file's permissions must be 600
#
#
#       Must have exactly one auth directive at the front.
#       auth    send authentication using this method-id
#
#       Then, list the method and key that go with that method-id
#
#       Available methods: crc sha1, md5.  Crc doesn't need/want a key. (three authentication methods; crc needs no key)
#
#       You normally only have one authentication method-id listed in this file
#
#       Put more than one to make a smooth transition when changing auth
#       methods and/or keys.
#
#
#       sha1 is believed to be the "best", md5 next best.
#
#       crc adds no security, except from packet corruption.
#               Use only on physically secure networks.
#
auth 1  #selects which of the method-ids below is in effect
1 crc  (the default)
#2 sha1 HI!
#3 md5 Hello!
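Since crc adds no security, a sha1 entry with a random key is the safer choice on anything but a physically isolated link. The Python sketch below writes such a file with mode 600 from the start. The function name, the 32-character hex key, and the temporary output path are choices made for this example; on a real node the target would be /etc/ha.d/authkeys, and the same file must be copied to both nodes.

```python
import os
import secrets
import stat
import tempfile


def write_authkeys(path, method="sha1"):
    """Write an authkeys file with one method-id and a random key, mode 600."""
    key = secrets.token_hex(16)  # 32 hex characters of random key material
    content = f"auth 1\n1 {method} {key}\n"
    # Create the file with restrictive permissions before any data is written.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed
    return key


demo_path = os.path.join(tempfile.mkdtemp(), "authkeys")
write_authkeys(demo_path)
print(oct(stat.S_IMODE(os.stat(demo_path).st_mode)))
```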

Configuring the haresources file
Edit the heartbeat resource file:
[root@data1 heartbeat-2.1.3]# cat haresources
#
# This is a list of resources that move from machine to machine as
# nodes go down and come up in the cluster. Do not include
# "administrative" or fixed IP addresses in this file.
#
# <VERY IMPORTANT NOTE>
# The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
# The node names listed in front of the resource group information
# is the name of the preferred node to run the service. It is
# not necessarily the name of the current machine. If you are running
# auto_failback ON (or legacy), then these services will be started
# up on the preferred nodes - any time they're up.
#
# If you are running with auto_failback OFF, then the node information
# will be used in the case of a simultaneous start-up, or when using
# the hb_standby {foreign,local} command.
#
# BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
# If your files are different then almost certainly something
# won't work right.
# </VERY IMPORTANT NOTE>
#
#
# We refer to this file when we're coming up, and when a machine is being
# taken over after going down.
#
# You need to make this right for your installation, then install it in
# /etc/ha.d
#
# Each logical line in the file constitutes a "resource group".
# A resource group is a list of resources which move together from
# one node to another - in the order listed. It is assumed that there
# is no relationship between different resource groups. These
# resource in a resource group are started left-to-right, and stopped
# right-to-left. Long lists of resources can be continued from line
# to line by ending the lines with backslashes ("\").
#
# These resources in this file are either IP addresses, or the name
# of scripts to run to "start" or "stop" the given resource.
#
# The format is like this:
#
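#node-name resource1 resource2 ... resourceN #binds the resources to a server
#IPaddr is the default script heartbeat uses to configure an IP; the IP and the other fields after it are arguments to that script.
data1 IPaddr::10.1.1.129/24/eth0 #This is the cluster's external VIP. It starts on data1 and is bound to eth0, giving heartbeat its interface for serving external traffic. It is equivalent to running: /etc/ha.d/resource.d/IPaddr 10.1.1.129/24/eth0 start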
data2 IPaddr::10.1.1.129/24/eth0
#
#
# If the resource name contains an :: in the middle of it, the
# part after the :: is passed to the resource script as an argument.
# Multiple arguments are separated by the :: delimeter
#
# In the case of IP addresses, the resource script name IPaddr is
# implied.
#
# For example, the IP address 135.9.8.7 could also be represented
# as IPaddr::135.9.8.7
#
# THIS IS IMPORTANT!! vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
# The given IP address is directed to an interface which has a route
# to the given address. This means you have to have a net route
# set up outside of the High-Availability structure. We don't set it
# up here -- we key off of it.
#
# The broadcast address for the IP alias that is created to support
# an IP address defaults to the highest address on the subnet.
#
# The netmask for the IP alias that is created defaults to the same
# netmask as the route that it selected in in the step above.
#
# The base interface for the IPalias that is created defaults to the
# same netmask as the route that it selected in in the step above.
#
# If you want to specify that this IP address is to be brought up
# on a subnet with a netmask of 255.255.255.0, you would specify
# this as IPaddr::135.9.8.7/24 .
#
# If you wished to tell it that the broadcast address for this subnet
# was 135.9.8.210, then you would specify that this way:
# IPaddr::135.9.8.7/24/135.9.8.210
#
# If you wished to tell it that the interface to add the address to
# is eth0, then you would need to specify it this way:
# IPaddr::135.9.8.7/24/eth0
#
# And this way to specify both the broadcast address and the
# interface:
# IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
# The IP addresses you list in this file are called "service" addresses,
# since they're they're the publicly advertised addresses that clients
# use to get at highly available services.
#
# For a hot/standby (non load-sharing) 2-node system with only
# a single service address,
# you will probably only put one system name and one IP address in here.
# The name you give the address to is the name of the default "hot"
# system.
#
# Where the nodename is the name of the node which "normally" owns the
# resource. If this machine is up, it will always have the resource
# it is shown as owning.
#
# The string you put in for nodename must match the uname -n name
# of your machine. Depending on how you have it administered, it could
# be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
# Simple case: One service address, default subnet and netmask
# No servers that go up and down with the IP address
#
#just.linux-ha.org 135.9.216.110
#
#-------------------------------------------------------------------
#
# Assuming the adminstrative addresses are on the same subnet...
# A little more complex case: One service address, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 http
#-------------------------------------------------------------------
#
# A little more complex case: Three service addresses, default subnet
# and netmask, and you want to start and stop http when you get
# the IP address...
#
#just.linux-ha.org 135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
# One service address, with the subnet, interface and bcast addr
# explicitly defined.
#
#just.linux-ha.org 135.9.216.3/28/eth0/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
# An example where a shared filesystem is to be used.
# Note that multiple aguments are passed to this script using
# the delimiter '::' to separate each argument.
#
#node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
#
# Regarding the node-names in this file:
#
# They must match the names of the nodes listed in ha.cf, which in turn
# must match the `uname -n` of some node in the cluster. So they aren't
# virtual in any sense of the word.

Note that the host name in front of the VIP is the master's host name, and the copy of the file on the backup also uses the master's host name.
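The haresources syntax described in the file's comments (node name first, then resources, with `::` separating a script name from its arguments and a bare IP address implying the IPaddr script) is simple enough to parse in a few lines. The Python sketch below is an illustration of that syntax only, with hypothetical function names; it ignores comment lines and backslash continuations, and it detects a bare IP address simply by a leading digit.

```python
def parse_resource(spec):
    """Split one resource entry into (script name, list of arguments)."""
    parts = spec.split("::")
    if len(parts) == 1 and parts[0][0].isdigit():
        return "IPaddr", parts  # bare IP address: the IPaddr script is implied
    return parts[0], parts[1:]


def parse_haresources_line(line):
    """Split a resource-group line into (preferred node, list of resources)."""
    node, *resources = line.split()
    return node, [parse_resource(item) for item in resources]


print(parse_haresources_line("data1 IPaddr::10.1.1.129/24/eth0"))
print(parse_haresources_line("node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2"))
```

Resources are started in the parsed order and stopped in reverse, as the file's own comments describe.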

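Once the three files above are configured, copy them into the /etc/ha.d/ directory.
Then start heartbeat on both virtual machines: /etc/init.d/heartbeat start
Be sure to stop the firewall at this point; otherwise both hosts bring up the resources and split brain occurs: /etc/init.d/iptables stop
data1 is now the primary, and the VIP is on data1:

eth0:0 has not been brought up on data2. When we stop the heartbeat service on data1, eth0:0 is bound on data2:
That completes the primary/backup switchover. To stress the point: the host name bound in DNS (here, the hosts file) must be exactly what uname -n prints, and the node entries in ha.cf must be changed to match.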
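A mismatch between `uname -n` and the node names in ha.cf is easy to miss, for example when one side uses a short name and the other an FQDN. The Python sketch below compares the local name with a list of node names; the function name is hypothetical, and the node list is typed in by hand rather than read from ha.cf.

```python
import os


def local_node_is_listed(node_names, local_name=None):
    """True if this machine's `uname -n` name appears among the ha.cf nodes."""
    if local_name is None:
        local_name = os.uname().nodename  # the same value `uname -n` prints
    return local_name in node_names


# The comparison is an exact string match, so an FQDN does not match a short name.
print(local_node_is_listed(["data1", "data2"], local_name="data1"))
print(local_node_is_listed(["data1", "data2"], local_name="data1.example.com"))
```

Calling the function without `local_name` on each node before starting heartbeat gives a quick sanity check.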

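The configuration above uses the same IP on both sides. Another approach is to configure two VIPs on one machine.
Running the hb_standby script on the primary heartbeat server puts the local node into standby, which simulates the heartbeat server going down (the effect is the same as stopping heartbeat), so we can watch the backup take over. The hb_standby script is located at /usr/lib64/heartbeat/hb_standby
Command: /usr/lib64/heartbeat/hb_standby
Another script is also available: hb_takeover. It makes the local host take over and become the active node.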
