Heartbeat
2015-07-23 22:08
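Author: Danbo  Date: 2015-7-23
Heartbeat can quickly move resources (a VIP, application services, and so on) from a failed machine to a healthy one, which then continues to provide the service.
How Heartbeat works:
The heartbeat configuration file designates one heartbeat server as the primary; the other automatically becomes the backup. The heartbeat daemon on the backup listens for heartbeats from the primary. If the backup hears no heartbeat from the primary within the configured time, it starts the failover procedure, takes ownership of the resources and services that were running on the primary, and continues to provide them without interruption. This is how the resources and services are kept highly available. That is the active/standby mode. There is also an active/active mode (each node backs up the other): both sides send heartbeat messages, and a node that receives nothing from its peer within the configured time considers the peer failed and takes over the resources or services running on it.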
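The timeout rule described above can be sketched in a few lines. This is only an illustration of the idea, not Heartbeat's actual implementation: the class name PeerMonitor, the helper standby_should_take_over, and the 30-second value are assumptions made for the example, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from dataclasses import dataclass


@dataclass
class PeerMonitor:
    """Tracks the last heartbeat seen from the peer node (hypothetical helper)."""
    deadtime: float          # seconds of silence before the peer is declared dead
    last_seen: float = 0.0   # timestamp of the most recent heartbeat

    def on_heartbeat(self, now: float) -> None:
        # Called every time a heartbeat from the peer arrives.
        self.last_seen = now

    def peer_is_dead(self, now: float) -> bool:
        # The peer is considered failed once it has been silent longer than deadtime.
        return (now - self.last_seen) > self.deadtime


def standby_should_take_over(monitor: PeerMonitor, now: float) -> bool:
    """The backup starts failover only when the primary is considered dead."""
    return monitor.peer_is_dead(now)


monitor = PeerMonitor(deadtime=30.0)   # 30 s is an illustrative value
monitor.on_heartbeat(now=100.0)
print(standby_should_take_over(monitor, now=120.0))  # False: only 20 s of silence
print(standby_should_take_over(monitor, now=131.0))  # True: 31 s of silence
```

In active/active mode each node would simply run one such monitor for its peer.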
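Heartbeat links
Two heartbeat hosts usually communicate over one of the following:
1. A serial cable (drawback: the distance must be short).
2. An Ethernet cable connecting the two NICs directly (the most common choice).
3. Ethernet through a switch or other network device (the switch becomes an extra point of failure, and because the link is not dedicated to heartbeats it can be disturbed by other traffic).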
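Heartbeat split brain
What is split brain? The two high-availability servers cannot detect each other's heartbeat within the configured time, so each starts its failover procedure and takes ownership of the resources and services, even though both servers are still alive and running normally. The same IP or service is then brought up on both sides at once and conflicts. The worst case is both hosts holding the same IP address. This situation is called split brain.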
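Common causes of split brain:
-The heartbeat link between the two high-availability servers fails, so they cannot communicate.
-A firewall enabled on the servers blocks the heartbeat messages.
-The heartbeat NIC address or related settings are misconfigured, so heartbeat transmission fails.
-The two ends use different heartbeat methods, or their heartbeat broadcasts conflict.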
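Preventive measures:
-Add redundant links: use two heartbeat links.
-When split brain is detected, forcibly shut down one node. This needs special hardware such as STONITH or fence devices (programmable power controllers).
-Monitor for split brain and alert on it (email, SMS, and so on).
-Use a disk lock: the serving node locks the shared disk, and it enables the lock only after it finds that all heartbeat links are down.
-Add an arbitration mechanism: when the heartbeat links are completely down, each node pings a reference gateway. A node that cannot reach the gateway concludes that the break is on its own side. The node that can reach the gateway takes over the service, and the node that cannot reboots itself to release the shared resources.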
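The arbitration rule in the last item can be written down as a small decision function. This is a minimal sketch of the reasoning only; the names Action and arbitrate are hypothetical, and a real setup would obtain the two inputs from link monitoring and an actual ping.

```python
from enum import Enum


class Action(Enum):
    KEEP_RUNNING = "heartbeat still works, nothing to arbitrate"
    TAKE_OVER = "take over (or keep) the service"
    RELEASE_AND_REBOOT = "release shared resources and reboot"


def arbitrate(heartbeat_links_up: int, gateway_reachable: bool) -> Action:
    """Decide what one node does, following the gateway-ping rule."""
    if heartbeat_links_up > 0:
        # At least one heartbeat link works, so the peer can still be heard.
        return Action.KEEP_RUNNING
    if gateway_reachable:
        # All heartbeat links are down but the network is fine on this side.
        return Action.TAKE_OVER
    # The break is on this side: step aside so the peer can serve.
    return Action.RELEASE_AND_REBOOT


print(arbitrate(heartbeat_links_up=0, gateway_reachable=True).name)
print(arbitrate(heartbeat_links_up=0, gateway_reachable=False).name)
```

Note that if both nodes can still reach the gateway while every heartbeat link is down, this rule alone does not break the tie, which is why it is usually combined with the other measures in the list.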
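Heartbeat message types
-Heartbeat messages
-Cluster transition messages
-Retransmission requests
Heartbeat messages: about 150 bytes each, sent by unicast, broadcast, or multicast.
Cluster transition messages: ip-request and ip-request-resp.
When the primary comes back online, it sends an ip-request message asking the backup to release the resources the backup acquired while the primary was down, and the backup stops those resources and services. Once they are released, the backup replies with ip-request-resp, which tells the primary to take the resources and services back.
Retransmission requests: rexmit-request asks for a heartbeat message to be retransmitted.
These heartbeat control messages are sent over UDP to whichever port is specified in /etc/ha.d/ha.cf.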
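The ip-request / ip-request-resp exchange is easier to follow as a sequence. The sketch below only replays the order of events described above with in-memory objects. Node and failback are hypothetical names, no real network traffic is involved, and it is not Heartbeat's wire format.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    resources: set = field(default_factory=set)


def failback(primary: Node, backup: Node) -> list:
    """Replay the message exchange when the primary comes back online."""
    messages = []
    # 1. The recovered primary asks the backup to give the resources back.
    messages.append((primary.name, backup.name, "ip-request"))
    # 2. The backup stops and releases everything it took over.
    released = set(backup.resources)
    backup.resources.clear()
    # 3. The backup confirms; only now does the primary start the resources.
    messages.append((backup.name, primary.name, "ip-request-resp"))
    primary.resources |= released
    return messages


data1 = Node("data1")                      # primary, just back online
data2 = Node("data2", {"vip", "httpd"})    # backup currently holding resources
for sender, receiver, msg in failback(data1, data2):
    print(f"{sender} -> {receiver}: {msg}")
print(sorted(data1.resources), sorted(data2.resources))
```

The point of the ordering is that the backup releases the resources before the primary starts them, so the VIP is never active on both nodes at once.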
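Heartbeat IP address takeover and failover
Heartbeat moves the service address by IP address takeover combined with ARP broadcasts.
ARP broadcast (gratuitous ARP): when the primary fails and the backup node has taken over the resources, the backup immediately broadcasts gratuitous ARP so that all clients update their local ARP tables.
Note that "clients" here means the hosts on the same LAN as the heartbeat high-availability servers, not the end users on the Internet. "Client" is relative to the heartbeat servers.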
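To make the gratuitous ARP step concrete, the sketch below builds the bytes of such a frame using the standard Ethernet and ARP layout: an ARP request whose sender IP and target IP are both the VIP. It only constructs the frame and does not send it, since sending needs a raw socket and root. The function name build_gratuitous_arp is made up for the example, and Heartbeat's own helper may format its packets differently.

```python
import socket
import struct


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Return an Ethernet frame carrying a gratuitous ARP request for `ip`."""
    hw = bytes.fromhex(mac.replace(":", ""))   # 6-byte MAC of the new owner
    addr = socket.inet_aton(ip)                # 4-byte IPv4 address (the VIP)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    eth_header = broadcast + hw + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), lengths 6 and 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    arp += hw + addr               # sender MAC and sender IP
    arp += b"\x00" * 6 + addr      # target MAC unset, target IP equals sender IP
    return eth_header + arp


frame = build_gratuitous_arp("00:0C:29:1B:C7:C9", "192.168.1.2")
print(len(frame), frame.hex())
```

Because sender IP and target IP are identical, every host on the LAN that sees the broadcast can refresh its cache entry for the VIP with the new MAC address.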
A VIP is a virtual IP: an alias IP bound to a physical NIC, such as eth0:x. You can bind several aliases to one NIC. In production, the site's domain name is resolved in DNS to the VIP, and the VIP is what serves users.
Configuring a VIP:
heartbeat: ifconfig eth0:1 192.168.1.2 netmask 255.255.255.0 up
keepalived: ip addr add 192.168.1.2/24 broadcast 192.168.1.255 dev eth1 (secondary IP)
ifconfig shows a VIP configured the heartbeat way (as an alias); a VIP added as a secondary IP shows up only in ip addr.
Removing a VIP:
heartbeat: ifconfig eth0:1 192.168.1.2 netmask 255.255.255.0 down, or ifconfig eth0:1 down
keepalived: ip addr del 192.168.1.2/24 broadcast 192.168.1.255 dev eth1
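The broadcast address in the ip addr commands does not have to be worked out by hand. As a small sketch, the helper below (vip_commands is a name invented for this example) derives it from the VIP and prefix length with Python's standard ipaddress module and prints the matching add and del command lines. It only builds the strings and does not run them.

```python
import ipaddress


def vip_commands(vip_cidr: str, dev: str) -> dict:
    """Build the ip(8) command lines that add and remove a secondary-IP VIP."""
    iface = ipaddress.ip_interface(vip_cidr)
    # Highest address of the subnet, e.g. 192.168.1.255 for 192.168.1.2/24.
    bcast = iface.network.broadcast_address
    spec = f"{iface.with_prefixlen} broadcast {bcast} dev {dev}"
    return {"add": f"ip addr add {spec}", "del": f"ip addr del {spec}"}


cmds = vip_commands("192.168.1.2/24", "eth1")
print(cmds["add"])  # ip addr add 192.168.1.2/24 broadcast 192.168.1.255 dev eth1
print(cmds["del"])  # ip addr del 192.168.1.2/24 broadcast 192.168.1.255 dev eth1
```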
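Heartbeat configuration files
ha.cf        heartbeat parameter configuration file
authkeys     heartbeat authentication file
haresources  heartbeat resource configuration file (IP resources, service scripts, and so on)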
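Heartbeat deployment requirements
Topology: (diagram not included)
Resource plan for the Heartbeat server hosts: (table not included)
Configure the IP address and hostname of each virtual machine.
IP configuration:
DEVICE=eth0
BOOTPROTO=static
BROADCAST=192.168.23.255
HWADDR=00:0C:29:1B:C7:C9
IPADDR=192.168.23.128
IPV6INIT=yes
IPV6_AUTOCONF=yes
NETMASK=255.255.255.0
NETWORK=192.168.23.0
ONBOOT=yes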
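Set the hostnames: in /etc/sysconfig/network set HOSTNAME=data1 on one machine and HOSTNAME=data2 on the other.
Then run hostname data1 and hostname data2 on the respective machines so the change takes effect immediately.
Configure the hosts file so the names resolve on the internal network, because the heartbeat configuration files refer to the machines by name.
echo "172.16.1.129 data1" >>/etc/hosts
echo "172.16.1.132 data2" >>/etc/hosts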
If the system then hangs at "Starting sendmail" during boot, stop sendmail and disable its automatic start:
killall sendmail
chkconfig --list | grep sendmail
chkconfig --level 2345 sendmail off
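Add a host route for the heartbeat traffic. Both machines need one, each pointing at the other's heartbeat address.
[root@data1 /]# route add -host 10.0.1.132 dev eth2
[root@data1 /]# echo 'route add -host 10.0.1.132 dev eth2' >>/etc/rc.d/rc.local
The second command saves the route in the boot script so that it is added again automatically on the next boot.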
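Installing the heartbeat software
Keep the downloaded rpm packages in the local cache after a yum install:
sed -i 's/keepcache=0/keepcache=1/g' /etc/yum.conf
Here we install with yum: yum -y install heartbeat
#Note: the command has to be run twice; the first run reports: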
Failed:
heartbeat.x86_64 0:2.1.3-3.el5.centos
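-yum resolves dependencies automatically. It is the rpm way of installing: convenient and simple, but not flexible. Building from source lets you customize the installation: flexible, but more complex. Large companies often build rpm packages tailored to their own needs and publish them in their own yum repository.
After installation, go to heartbeat's default doc directory: cd /usr/share/doc/heartbeat-2.1.3/
The files we care about: ll ha.cf authkeys haresources
cat ha.cf
The logging-related settings come first: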
# File to write debug messages to
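debugfile /var/log/ha-debug
#
#       File to write other messages to
logfile /var/log/ha-log
#
#       Facility to use for syslog()/logger
logfacility local0

The following are the basic timer parameters, which usually do not need changing:
keepalive 2
deadtime 30
warntime 10     # how long to wait before issuing a late-heartbeat warning
initdead 120    # after the heartbeat daemon first starts, wait 120 s before starting the resources on the primary

The following lines select multicast. The part to change is the device (eth2 here), which must be the NIC that carries the heartbeat link:
#serial /dev/ttyS0    # configure this line instead when using a serial cable
mcast [dev] [mcast group] [port] [ttl] [loop]
The mcast group must differ between clusters on the same network; the two nodes of one cluster use the same group.
Change the line to:
mcast eth2 225.0.0.129 694 1 0

The following line controls what happens when the master comes back after a failure, that is, whether preemption is enabled:
auto_failback on    # enable preemption
node data1          # the hostnames of the two storage servers; this is why the hostnames were bound to IPs in hosts. They must match the output of uname -n
node data2
crm no              # crm stands for cluster resource manager; it is set to no here
ping 10.10.10.254   # ping a reference gateway to check whether the HA server is working normally; this can be used to detect split brain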
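The timer values depend on each other, so a quick sanity check can catch mistakes before heartbeat is started. The sketch below parses the directives used above and checks two rules of thumb: keepalive < warntime < deadtime, and initdead at least twice deadtime. parse_ha_cf and timing_problems are hypothetical helpers, the parser assumes plain integer seconds (no "ms" suffixes), and it is not a full ha.cf parser.

```python
SAMPLE = """\
keepalive 2
deadtime 30
warntime 10
initdead 120
mcast eth2 225.0.0.129 694 1 0
auto_failback on
node data1
node data2
"""


def parse_ha_cf(text: str) -> dict:
    """Collect 'directive value' pairs; repeated node lines are merged into a list."""
    conf = {"node": []}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "node":
            conf["node"].extend(value.split())
        else:
            conf[key] = value
    return conf


def timing_problems(conf: dict) -> list:
    """Return human-readable warnings about inconsistent timer values."""
    keepalive, warntime, deadtime, initdead = (
        int(conf[name]) for name in ("keepalive", "warntime", "deadtime", "initdead")
    )
    problems = []
    if not keepalive < warntime < deadtime:
        problems.append("expected keepalive < warntime < deadtime")
    if initdead < 2 * deadtime:
        problems.append("initdead should be at least twice deadtime")
    return problems


conf = parse_ha_cf(SAMPLE)
print(conf["node"], timing_problems(conf))
```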
Configuring the authkeys file
[root@data1 heartbeat-2.1.3]# cat authkeys
#
#       Authentication file.  Must be mode 600
#       (Note: the permissions of this file must be 600.)
#
#
#       Must have exactly one auth directive at the front.
#       auth    send authentication using this method-id
#
#       Then, list the method and key that go with that method-id
#
#       Available methods: crc sha1, md5.  Crc doesn't need/want a key.
#       (Three authentication methods; crc does not need a key.)
#
#       You normally only have one authentication method-id listed in this file
#
#       Put more than one to make a smooth transition when changing auth
#       methods and/or keys.
#
#
#       sha1 is believed to be the "best", md5 next best.
#
#       crc adds no security, except from packet corruption.
#               Use only on physically secure networks.
#
auth 1          # selects which of the entries below is in effect
1 crc           # (default)
#2 sha1 HI!
#3 md5 Hello!
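Since the file must be mode 600, it can help to create it with the right permissions from the start instead of fixing them afterwards. The sketch below writes an authkeys file with a random key and verifies the mode. write_authkeys and mode_is_600 are hypothetical helpers, the choice of sha1 with a 32-character hex key is an assumption for illustration, and the demo writes to a temporary directory rather than /etc/ha.d. It assumes a Linux or other POSIX system.

```python
import os
import secrets
import stat
import tempfile


def write_authkeys(path: str, method: str = "sha1") -> str:
    """Write 'auth 1' plus one method line with a random key; return the key."""
    key = secrets.token_hex(16)
    content = f"auth 1\n1 {method} {key}\n"
    # Create the file readable and writable by the owner only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    os.chmod(path, 0o600)   # also covers the case where the file already existed
    return key


def mode_is_600(path: str) -> bool:
    """Check that the permission bits are exactly rw-------."""
    return stat.S_IMODE(os.stat(path).st_mode) == 0o600


with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "authkeys")
    write_authkeys(demo)
    print(mode_is_600(demo))
```

The same file, with the same key, has to be present on both nodes.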
Configuring haresources
Edit the heartbeat resource file:
[root@data1 heartbeat-2.1.3]# cat haresources
#
#       This is a list of resources that move from machine to machine as
#       nodes go down and come up in the cluster.  Do not include
#       "administrative" or fixed IP addresses in this file.
#
# <VERY IMPORTANT NOTE>
#       The haresources files MUST BE IDENTICAL on all nodes of the cluster.
#
#       The node names listed in front of the resource group information
#       is the name of the preferred node to run the service.  It is
#       not necessarily the name of the current machine.  If you are running
#       auto_failback ON (or legacy), then these services will be started
#       up on the preferred nodes - any time they're up.
#
#       If you are running with auto_failback OFF, then the node information
#       will be used in the case of a simultaneous start-up, or when using
#       the hb_standby {foreign,local} command.
#
#       BUT FOR ALL OF THESE CASES, the haresources files MUST BE IDENTICAL.
#       If your files are different then almost certainly something
#       won't work right.
# </VERY IMPORTANT NOTE>
#
#
#       We refer to this file when we're coming up, and when a machine is being
#       taken over after going down.
#
#       You need to make this right for your installation, then install it in
#       /etc/ha.d
#
#       Each logical line in the file constitutes a "resource group".
#       A resource group is a list of resources which move together from
#       one node to another - in the order listed.  It is assumed that there
#       is no relationship between different resource groups.  These
#       resource in a resource group are started left-to-right, and stopped
#       right-to-left.  Long lists of resources can be continued from line
#       to line by ending the lines with backslashes ("\").
#
#       These resources in this file are either IP addresses, or the name
#       of scripts to run to "start" or "stop" the given resource.
#
#       The format is like this:
#
#node-name resource1 resource2 ... resourceN
# Bind the resources to a server.
# IPaddr is heartbeat's default script for configuring an IP address; the IP and
# the other fields after it are arguments passed to that script.
# The line below defines the cluster's external VIP. It starts on data1 and is
# bound to eth0, the interface through which heartbeat offers the service.
# It is equivalent to running: /etc/ha.d/resource.d/IPaddr 10.1.1.129/24/eth0 start
data1 IPaddr::10.1.1.129/24/eth0
data2 IPaddr::10.1.1.129/24/eth0
#
#
#       If the resource name contains an :: in the middle of it, the
#       part after the :: is passed to the resource script as an argument.
#       Multiple arguments are separated by the :: delimeter
#
#       In the case of IP addresses, the resource script name IPaddr is
#       implied.
#
#       For example, the IP address 135.9.8.7 could also be represented
#       as IPaddr::135.9.8.7
#
#       THIS IS IMPORTANT!!     vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
#
#       The given IP address is directed to an interface which has a route
#       to the given address.  This means you have to have a net route
#       set up outside of the High-Availability structure.  We don't set it
#       up here -- we key off of it.
#
#       The broadcast address for the IP alias that is created to support
#       an IP address defaults to the highest address on the subnet.
#
#       The netmask for the IP alias that is created defaults to the same
#       netmask as the route that it selected in in the step above.
#
#       The base interface for the IPalias that is created defaults to the
#       same netmask as the route that it selected in in the step above.
#
#       If you want to specify that this IP address is to be brought up
#       on a subnet with a netmask of 255.255.255.0, you would specify
#       this as IPaddr::135.9.8.7/24 .
#
#       If you wished to tell it that the broadcast address for this subnet
#       was 135.9.8.210, then you would specify that this way:
#               IPaddr::135.9.8.7/24/135.9.8.210
#
#       If you wished to tell it that the interface to add the address to
#       is eth0, then you would need to specify it this way:
#               IPaddr::135.9.8.7/24/eth0
#
#       And this way to specify both the broadcast address and the
#       interface:
#               IPaddr::135.9.8.7/24/eth0/135.9.8.210
#
#       The IP addresses you list in this file are called "service" addresses,
#       since they're they're the publicly advertised addresses that clients
#       use to get at highly available services.
#
#       For a hot/standby (non load-sharing) 2-node system with only
#       a single service address,
#       you will probably only put one system name and one IP address in here.
#       The name you give the address to is the name of the default "hot"
#       system.
#
#       Where the nodename is the name of the node which "normally" owns the
#       resource.  If this machine is up, it will always have the resource
#       it is shown as owning.
#
#       The string you put in for nodename must match the uname -n name
#       of your machine.  Depending on how you have it administered, it could
#       be a short name or a FQDN.
#
#-------------------------------------------------------------------
#
#       Simple case: One service address, default subnet and netmask
#               No servers that go up and down with the IP address
#
#just.linux-ha.org      135.9.216.110
#
#-------------------------------------------------------------------
#
#       Assuming the adminstrative addresses are on the same subnet...
#       A little more complex case: One service address, default subnet
#       and netmask, and you want to start and stop http when you get
#       the IP address...
#
#just.linux-ha.org      135.9.216.110 http
#-------------------------------------------------------------------
#
#       A little more complex case: Three service addresses, default subnet
#       and netmask, and you want to start and stop http when you get
#       the IP address...
#
#just.linux-ha.org      135.9.216.110 135.9.215.111 135.9.216.112 httpd
#-------------------------------------------------------------------
#
#       One service address, with the subnet, interface and bcast addr
#       explicitly defined.
#
#just.linux-ha.org      135.9.216.3/28/eth0/135.9.216.12 httpd
#
#-------------------------------------------------------------------
#
#       An example where a shared filesystem is to be used.
#       Note that multiple aguments are passed to this script using
#       the delimiter '::' to separate each argument.
#
#node1  10.0.0.170 Filesystem::/dev/sda1::/data1::ext2
#
#       Regarding the node-names in this file:
#
#       They must match the names of the nodes listed in ha.cf, which in turn
#       must match the `uname -n` of some node in the cluster.  So they aren't
#       virtual in any sense of the word.
Note that the hostname in front of the VIP is the master's hostname, and the copy of the file on the backup uses the master's hostname as well.
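The line format described in the file comments (node name first, then resources, with :: separating a script name from its arguments and a bare IP address implying IPaddr) can be illustrated with a small parser. This is a sketch for understanding the format only: parse_haresources_line is a hypothetical helper, it does not handle backslash line continuations, and it treats any resource starting with a digit as a bare IP address, which is a simplification.

```python
def parse_haresources_line(line: str):
    """Split one resource-group line into (node, [(script, [args]), ...])."""
    line = line.split("#", 1)[0].strip()   # ignore comments and blank lines
    if not line:
        return None
    node, *items = line.split()
    resources = []
    for item in items:
        script, *args = item.split("::")
        if script[0].isdigit():
            # A bare IP address implies the IPaddr script.
            script, args = "IPaddr", [script]
        resources.append((script, args))
    return node, resources


print(parse_haresources_line("data1 IPaddr::10.1.1.129/24/eth0"))
print(parse_haresources_line("node1 10.0.0.170 Filesystem::/dev/sda1::/data1::ext2"))
```

Resources in a group are started left to right and stopped right to left, so the order of the returned list matters.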
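Once the three files are configured, copy them into the /etc/ha.d/ directory.
Then start heartbeat on both virtual machines: /etc/init.d/heartbeat start
The firewall must be stopped here; otherwise both hosts bring up the resources and split brain occurs: /etc/init.d/iptables stop
The active node is now data1, and the VIP is on data1.
eth0:0 is not up on data2. After we stop the heartbeat service on data1, the eth0:0 alias is bound on data2.
That completes the switchover between primary and backup. Note that the hostname bound in DNS must be exactly what uname -n prints, and the node entries in ha.cf must be changed to match as well.
The configuration above uses the same VIP on both sides. Another option is to configure two VIPs on one machine.
Running the hb_standby script on the primary heartbeat server puts the local node into standby, which simulates a failure of that server (the effect is the same as stopping heartbeat); we can then watch the backup take over. The script is located at /usr/lib64/heartbeat/hb_standby
Command: /usr/lib64/heartbeat/hb_standby
There is another script, hb_takeover, which makes the local host take the resources over and become the active node.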