
Spark HiveServer2 high availability with HAProxy

2016-08-12 23:14

1. Installing hiveserver

If the hiveserver is backed by Hive, copy hive-site.xml into the spark/conf directory.

Start commands for the Thrift server (hs), one per host:

/home/dc/datacenter/soft/spark/spark-1.6.1-bin-2.6.0/sbin/start-thriftserver.sh --deploy-mode client --hiveconf hive.server2.thrift.port=10002 --hiveconf hive.server2.thrift.bind.host=dc-dev004.dx --driver-class-path /home/dc/datacenter/soft/spark/spark-1.6.1-bin-2.6.0/lib/mysql-connector-java.jar --executor-memory 8g --driver-memory 3g --total-executor-cores 15 --executor-cores 5 --name hiveserver1 --conf spark.scheduler.mode=FAIR

/home/dc/datacenter/soft/spark/spark-1.6.1-bin-2.6.0/sbin/start-thriftserver.sh --deploy-mode client --hiveconf hive.server2.thrift.port=10002  --hiveconf hive.server2.thrift.bind.host=dc-dev005.dx --driver-class-path /home/dc/datacenter/soft/spark/spark-1.6.1-bin-2.6.0/lib/mysql-connector-java.jar --executor-memory 8g --driver-memory 3g --total-executor-cores 15 --executor-cores 5 --name hiveserver2 --conf spark.scheduler.mode=FAIR
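
The two commands above differ only in the bind host and the application name. A minimal sketch of building that argument list in Python, so the shared values live in one place, might look like this. The function name `build_thriftserver_cmd` and its parameters are hypothetical; the flags and paths are the ones used above.

```python
SPARK_HOME = "/home/dc/datacenter/soft/spark/spark-1.6.1-bin-2.6.0"


def build_thriftserver_cmd(bind_host, app_name, port=10002, spark_home=SPARK_HOME):
    """Return the start-thriftserver.sh argument list used in this article."""
    return [
        spark_home + "/sbin/start-thriftserver.sh",
        "--deploy-mode", "client",
        "--hiveconf", "hive.server2.thrift.port=%d" % port,
        "--hiveconf", "hive.server2.thrift.bind.host=%s" % bind_host,
        "--driver-class-path", spark_home + "/lib/mysql-connector-java.jar",
        "--executor-memory", "8g",
        "--driver-memory", "3g",
        "--total-executor-cores", "15",
        "--executor-cores", "5",
        "--name", app_name,
        "--conf", "spark.scheduler.mode=FAIR",
    ]


if __name__ == "__main__":
    # Print the command for each host instead of running it.
    for host, name in [("dc-dev004.dx", "hiveserver1"), ("dc-dev005.dx", "hiveserver2")]:
        print(" ".join(build_thriftserver_cmd(host, name)))
```

The list form can be passed directly to `subprocess.call` without shell quoting concerns.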

2. Installing and configuring HAProxy

2.1 Overview

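HAProxy is proxy software that provides high availability, load balancing, and proxying for TCP (layer 4) and HTTP (layer 7) applications. It is completely free and offers a fast, reliable proxy solution for TCP- and HTTP-based applications.

It is free and open source, and it is very stable. In the small projects I have worked on, a single HAProxy instance ran well, with stability comparable to a hardware F5 load balancer. According to the official documentation, HAProxy can saturate a 10 Gbps link (see "New benchmark of HAProxy at 10 Gbps using Myricom's 10GbE NICs (Myri-10G PCI-Express)"), which is a remarkable figure for a software load balancer.

HAProxy supports connection rejection: because keeping a connection open is cheap, it is sometimes useful to limit attack bots, that is, to cap the connections they can open and so limit the damage they do. This feature was developed for a site under a small DDoS attack and has since saved many sites. It is an advantage that other load balancers do not have.

HAProxy supports fully transparent proxying (a typical feature of hardware firewalls): it can connect to the backend servers using the client's IP address or any other address. This feature is available only on Linux 2.4/2.6 kernels with the cttproxy patch applied. It also makes it possible to handle part of the traffic for a particular server without changing that server's address.

HAProxy is widely used in front of production MySQL clusters; we often use it as a load balancer for MySQL reads.

It ships with a powerful page for monitoring server status. In production we combine it with Nagios for email or SMS alerts, which is one of the reasons I like it so much.

HAProxy supports virtual hosts. Many people say it does not, but that is wrong: our tests show that it does.

HAProxy is particularly suited to very high-traffic web sites that also need session persistence or layer 7 processing. On current hardware it can support tens of thousands of concurrent connections. Its operating model makes it simple and safe to integrate into an existing architecture, while keeping your web servers from being exposed to the network.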

2.2 Installation

wget -O - http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.24.tar.gz | tar zxvf -
mv haproxy-1.4.24 /opt/haproxy-1.4.24
cd /opt/haproxy-1.4.24
make TARGET=linux26

Configuration file

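global
    daemon
    nbproc 1
    pidfile /opt/haproxy-1.4.24/haproxy.pid
    ulimit-n 65535

defaults
    mode tcp                        # mode { tcp|http|health }: tcp is layer 4, http is layer 7, health only answers health checks
    retries 2                       # retry a failed connection to a server 2 times
    option redispatch               # after a failed connection, allow the session to be sent to another server
    option abortonclose             # drop queued requests whose client has already closed the connection
    maxconn 1024                    # maximum number of connections
    timeout connect 1d              # maximum time to establish the connection to a server
    timeout client 1d               # client-side inactivity timeout; important, it lets long hive queries return their results
    timeout server 1d               # server-side inactivity timeout; important for the same reason
    timeout check 2000              # health check timeout, in milliseconds
    log 127.0.0.1 local0 err        # level is one of [err warning info debug]

listen admin_stats                  # admin page
    bind 0.0.0.0:8040               # IP and port of the admin page
    mode http                       # protocol used by the admin page
    maxconn 10                      # maximum number of connections
    stats refresh 30s               # auto-refresh every 30 seconds
    stats uri /                     # URL of the stats page
    stats realm Hive\ Haproxy       # prompt shown in the login dialog
    stats auth dc:dc                # username:password for the HTTP 401 authentication

listen hive                         # hive backend definition
    bind 0.0.0.0:10000              # IP and port HAProxy listens on as the proxy
    mode tcp                        # proxy at layer 4; important
    balance leastconn               # scheduling algorithm: 'leastconn' (fewest connections) or 'roundrobin'
    maxconn 1024                    # maximum number of connections
    server hive_1 dc-dev004.dx.momo.com:10002 check inter 180000 rise 1 fall 2
    server hive_2 dc-dev005.dx.momo.com:10002 check inter 180000 rise 1 fall 2
    # Meaning: server <a name of your choice> <host>:<port>, health-checked every 180000 ms, i.e. every three minutes.
    # hive creates a log file for every request on port 10000; if the interval is too short, /tmp fills up with more log files than you can delete.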
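
The `balance leastconn` and `rise 1 fall 2` settings can be illustrated with a small simulation. This is only a sketch of the idea, not HAProxy's implementation; the `Backend` class and the `pick_leastconn` function are hypothetical names introduced for the example.

```python
class Backend:
    """Toy model of one `server` line: tracks health and open connections."""

    def __init__(self, name, rise=1, fall=2):
        self.name = name
        self.rise = rise      # consecutive successful checks needed to mark it up
        self.fall = fall      # consecutive failed checks needed to mark it down
        self.up = True
        self.streak = 0       # consecutive results that contradict the current state
        self.active = 0       # currently open connections

    def record_check(self, ok):
        """Feed one health-check result and update the up/down state."""
        if ok == self.up:
            self.streak = 0
            return
        self.streak += 1
        threshold = self.rise if ok else self.fall
        if self.streak >= threshold:
            self.up = ok
            self.streak = 0


def pick_leastconn(backends):
    """Return the healthy backend with the fewest open connections, or None."""
    healthy = [b for b in backends if b.up]
    if not healthy:
        return None
    return min(healthy, key=lambda b: b.active)
```

With `fall 2` and a 180000 ms interval, a dead server can keep receiving new connections for several minutes before it is marked down, which is the price paid for the long check interval chosen above.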

2.3 Starting HAProxy

./haproxy -f config.cfg
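
Once HAProxy is running, a quick way to confirm that its listeners accept connections is a plain TCP connect. The sketch below assumes it runs on the HAProxy host itself (hence `127.0.0.1`); `port_is_open` is a hypothetical helper. It only shows that the proxy port is open, not that a backend hiveserver is healthy.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 10000 is the hive listener and 8040 the stats page in the config above.
    for port in (10000, 8040):
        print(port, port_is_open("127.0.0.1", port))
```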

3. Running hiveserver as a Linux service

3.1 Code

#!/usr/bin/env python
# Watchdog daemon: restarts the Spark Thrift server when its process disappears.
# Written to run under both Python 2.6+ and Python 3.

import atexit
import errno
import os
import socket
import sys
import time
from signal import SIGTERM

SPARK_HOME = '/home/dc/datacenter/soft/spark/spark-1.6.1-bin-2.6.0'
# Same command as in section 1; %s is filled with the local hostname.
START_CMD = (SPARK_HOME + '/sbin/start-thriftserver.sh --deploy-mode client'
             ' --hiveconf hive.server2.thrift.port=10002'
             ' --hiveconf hive.server2.thrift.bind.host=%s'
             ' --driver-class-path ' + SPARK_HOME + '/lib/mysql-connector-java.jar'
             ' --executor-memory 8g --driver-memory 3g'
             ' --total-executor-cores 15 --executor-cores 5'
             ' --name hiveserver1 --conf spark.scheduler.mode=FAIR')


class Daemon(object):
    def __init__(self, pidfile, stdin='/dev/null', stdout='/dev/null', stderr='/dev/null'):
        self.stdin = stdin
        self.stdout = stdout
        self.stderr = stderr
        self.pidfile = pidfile

    def _daemonize(self):
        # Double fork: detach from the parent process and the controlling terminal.
        try:
            pid = os.fork()
            if pid > 0:
                sys.exit(0)
        except OSError as e:
            sys.stderr.write('fork #1 failed: %d (%s)\n' % (e.errno, e.strerror))
            sys.exit(1)

        os.chdir("/")
        os.setsid()
        os.umask(0)

        try:
            pid = os.fork()
            if pid > 0:
                sys.exit(0)
        except OSError as e:
            sys.stderr.write('fork #2 failed: %d (%s)\n' % (e.errno, e.strerror))
            sys.exit(1)

        # Redirect the standard streams to the configured files.
        sys.stdout.flush()
        sys.stderr.flush()
        si = open(self.stdin, 'r')
        so = open(self.stdout, 'a+')
        se = open(self.stderr, 'ab+', 0)  # unbuffered, which requires binary mode on Python 3
        os.dup2(si.fileno(), sys.stdin.fileno())
        os.dup2(so.fileno(), sys.stdout.fileno())
        os.dup2(se.fileno(), sys.stderr.fileno())

        atexit.register(self.delpid)
        with open(self.pidfile, 'w') as pf:
            pf.write('%d\n' % os.getpid())

    def delpid(self):
        os.remove(self.pidfile)

    def _read_pid(self):
        # Return the pid stored in the pidfile, or None if the file is missing.
        try:
            with open(self.pidfile, 'r') as pf:
                return int(pf.read().strip())
        except IOError:
            return None

    def start(self):
        if self._read_pid():
            message = 'pidfile %s already exist. Daemon already running?\n'
            sys.stderr.write(message % self.pidfile)
            sys.exit(1)

        self._daemonize()
        self._run()

    def stop(self):
        pid = self._read_pid()
        if not pid:
            message = 'pidfile %s does not exist. Daemon not running?\n'
            sys.stderr.write(message % self.pidfile)
            return

        try:
            # Keep signalling until os.kill reports that the daemon is gone.
            while True:
                os.kill(pid, SIGTERM)
                time.sleep(1)
                # Also kill the Thrift server JVM itself.
                os.system("kill -9 `ps -ef| grep java| grep hive |grep thrift|grep hiveserver| grep -v 'grep' | awk '{print $2}'`")
        except OSError as err:
            if err.errno == errno.ESRCH:  # "No such process"
                if os.path.exists(self.pidfile):
                    os.remove(self.pidfile)
            else:
                print(str(err))
                sys.exit(1)

    def restart(self):
        self.stop()
        self.start()

    def _run(self):
        while True:
            # Count the running Thrift server processes.
            process = os.popen('ps aux|grep java|grep hive|grep thrift|grep hiveserver|grep -v "grep"|wc -l').read().strip()
            sys.stdout.write('process %s\n' % process)
            sys.stdout.flush()
            if process == '0':
                os.system(START_CMD % socket.gethostname())
            # A port-based check was disabled in the original script:
            # netstat -na | grep tcp | grep 10002 | awk '{print $4}' | grep 10002 | wc -l
            time.sleep(30)


if __name__ == '__main__':
    daemon = Daemon('/home/dc/datacenter/soft/hsdaemon/watch_process.pid',
                    stdout='/home/dc/datacenter/soft/hsdaemon/stdout.log',
                    stderr='/home/dc/datacenter/soft/hsdaemon/stderr.log')
    if len(sys.argv) == 2:
        if 'start' == sys.argv[1]:
            daemon.start()
        elif 'stop' == sys.argv[1]:
            daemon.stop()
        elif 'restart' == sys.argv[1]:
            daemon.restart()
        else:
            print('Unknown command')
            sys.exit(2)
        sys.exit(0)
    else:
        print('usage: %s start|stop|restart' % sys.argv[0])
        sys.exit(2)
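
The script's start() treats any existing pidfile as a running daemon, so a stale pidfile left behind by a crash blocks the next start. One possible refinement, sketched here under the assumption of a POSIX system, is to probe the recorded pid with signal 0 before refusing to start. `read_pid` and `pid_is_running` are hypothetical helpers, not part of the original script.

```python
import errno
import os


def read_pid(pidfile):
    """Return the pid stored in pidfile, or None if it is missing or malformed."""
    try:
        with open(pidfile) as pf:
            return int(pf.read().strip())
    except (IOError, ValueError):
        return None


def pid_is_running(pid):
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True
```

A start routine could then remove the pidfile when `read_pid` returns a pid for which `pid_is_running` is false, and continue starting normally.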

3.2 Starting the daemon

Start it on each of the two hiveserver hosts:

python hsdaemon.py start
