
A Storm exception (occurs when submitting a topology; startup shows no errors): java.net.UnknownHostException

2018-02-12 10:55

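Preface

While setting up a production Storm cluster I ran into something odd. The primary nimbus node started normally with no errors, and both supervisors started normally with no errors. To make monitoring easier, I also started the UI on the nimbus node. The problem: whenever I submitted a topology, the submission itself reported no error, but about 2 s later supervisor.log on both machines showed an error, after which the supervisor process shut itself down. I searched all the major community sites and found no solution. The problem plagued me for a week and was finally solved today, so I am recording it here. If anyone else hits the same problem, try the approach in this post; it may help.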

Storm environment

- 192.168.101.25   zookeeper1 + kafka + storm (supervisor)
- 192.168.101.26   zookeeper2 + kafka + storm (supervisor)
- 192.168.101.36   zookeeper3 + kafka
- 192.168.56.147   storm(nimbus) + storm(ui)
As shown above, there are four servers in total: the ZooKeeper ensemble runs on 25, 26 and 36, while 25, 26 and 147 form the Storm cluster.


Exception details

2018-02-11 11:19:27.481 o.a.s.u.NimbusClient Async Localizer [WARN] Ignoring exception while trying to get leader nimbus info from 192.168.56.147. will retry with a different seed host.
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: Phq147
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:108) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.ThriftClient.<init>(ThriftClient.java:69) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.NimbusClient.<init>(NimbusClient.java:127) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:94) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:57) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.Utils.getClientBlobStoreForSupervisor(Utils.java:538) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:121) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.1.jar:1.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
Caused by: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: Phq147
at org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.1.jar:1.1.1]
... 13 more
Caused by: org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: Phq147
at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:226) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.1.jar:1.1.1]
... 13 more
Caused by: java.net.UnknownHostException: Phq147
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) ~[?:1.7.0_80]
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.7.0_80]
at java.net.Socket.connect(Socket.java:579) ~[?:1.7.0_80]
at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:221) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.1.jar:1.1.1]
... 13 more
2018-02-11 11:19:27.481 o.a.s.l.AsyncLocalizer Async Localizer [WARN] Failed to download basic resources for topology-id servicebus-1-1518319150
2018-02-11 11:19:27.481 o.a.s.d.s.AdvancedFSOps Async Localizer [INFO] Deleting path /karfka/elk/storm/apache-storm-1.1.1/../stormdata/supervisor/tmp/a19cf9f4-b0de-41bc-a06e-299c1046514c
2018-02-11 11:19:27.491 o.a.s.d.s.AdvancedFSOps Async Localizer [INFO] Deleting path /karfka/elk/storm/apache-storm-1.1.1/../stormdata/supervisor/stormdist/servicebus-1-1518319150
2018-02-11 11:19:27.492 o.a.s.l.AsyncLocalizer Async Localizer [WARN] Caught Exception While Downloading (rethrowing)...
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [192.168.56.147]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:111) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:57) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.Utils.getClientBlobStoreForSupervisor(Utils.java:538) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:121) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.1.jar:1.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
2018-02-11 11:19:27.495 o.a.s.d.s.Slot SLOT_6700 [ERROR] Error when processing event
java.util.concurrent.ExecutionException: org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [192.168.56.147]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.7.0_80]
at java.util.concurrent.FutureTask.get(FutureTask.java:202) ~[?:1.7.0_80]
at org.apache.storm.localizer.LocalDownloadedResource$NoCancelFuture.get(LocalDownloadedResource.java:63) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.daemon.supervisor.Slot.handleWaitingForBasicLocalization(Slot.java:413) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:273) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:741) ~[storm-core-1.1.1.jar:1.1.1]
Caused by: org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [192.168.56.147]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:111) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:57) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.utils.Utils.getClientBlobStoreForSupervisor(Utils.java:538) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:121) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.1.jar:1.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
2018-02-11 11:19:27.495 o.a.s.u.Utils SLOT_6700 [ERROR] Halting process: Error when processing an event
java.lang.RuntimeException: Halting process: Error when processing an event
at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) ~[storm-core-1.1.1.jar:1.1.1]
at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:774) ~[storm-core-1.1.1.jar:1.1.1]
2018-02-11 11:19:27.498 o.a.s.d.s.Supervisor Thread-5 [INFO] Shutting down supervisor a6f91d50-eceb-4f1e-bb99-28799a97d58e
2018-02-11 11:19:27.502 o.a.s.e.EventManagerImp Thread-4 [INFO] Event manager interrupted


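Troubleshooting process

Initial approach:

For a problem like this the usual cause is the network, so the first thing I suspected was a port issue. Looking through the supervisor log, I found these lines: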

2018-02-11 11:13:26.150 o.a.s.d.s.Supervisor main [INFO] Starting Supervisor with conf {drpc.worker.threads=64, topology.state.synchronization.timeout.secs=60, topology.executor...(remaining configuration omitted)...}
2018-02-11 11:13:26.227 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6700 Starting in state EMPTY - assignment null
2018-02-11 11:13:26.228 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6701 Starting in state EMPTY - assignment null
2018-02-11 11:13:26.229 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6702 Starting in state EMPTY - assignment null
2018-02-11 11:13:26.229 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6703 Starting in state EMPTY - assignment null


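After seeing the log above, my first conclusion was that the ports Storm assigns to workers were not open, or were being blocked by the firewall on the supervisor machine. After checking with the colleague responsible, however, it turned out that the firewall on that machine was disabled, and running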

netstat -an|grep 6700


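also returned nothing. Further reading showed that the worker ports 6700-6703 are only opened once work is assigned to them, which is why nothing was listening.

Follow-up:

Then I noticed SLOT service101.25:6703. Why does this show a hostname rather than my IP address? I checked the hosts file on server 25: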
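The same port check can be scripted. The following is a minimal sketch and not part of the original troubleshooting: `PortProbe` and `isOpen` are hypothetical names, and the host and port range are the ones from the supervisor log above. It only attempts a TCP connection, so a `false` result means nothing accepted the connection, which is also what you would see for a slot that has no worker assigned yet.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortProbe {
    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: nothing reachable on this port.
            return false;
        }
    }

    public static void main(String[] args) {
        // Supervisor address and worker slot ports as they appear in the log above.
        String host = "192.168.101.25";
        for (int port = 6700; port <= 6703; port++) {
            System.out.println(host + ":" + port + " open=" + isOpen(host, port, 1000));
        }
    }
}
```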

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.25  service101.25


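So Storm had resolved the address to the hostname automatically, and the exception at the top also complains about an unknown host: java.net.UnknownHostException: Phq147. I wondered whether the supervisor's Storm configuration file was at fault. It originally read:

# Initial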
nimbus.seeds: ["192.168.56.147"]


What if I change it to the hostname from the error?

# After the change
nimbus.seeds: ["Phq147"]


It still did not work after the change! Next I looked at the hosts file on server 147 and found:

127.0.0.1       localhost.localdomain localhost
::1     localhost6.localdomain6 localhost6
10.10.56.147    Phq147


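So a hostname had been configured on 147 as well. But why was it still unreachable after the change?

Final solution

The reason this pitfall took so long to fix is really my weak grasp of networking. The error above occurs because supervisor 25 fails to resolve a hostname when it tries to talk to nimbus on 147. Even with an IP address in nimbus.seeds, the log shows the supervisor ending up connecting to Phq147, presumably because the leader nimbus reports itself under its hostname. When one server sends a request to another by name, it first checks whether its own hosts file maps that name; if it does, that entry is used, and only otherwise is DNS queried. Phq147 was mapped in 147's own hosts file, but 25 had no entry for it, so the lookup failed. In the end the fix was simple.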
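To make the "hosts file first" step concrete, here is a minimal sketch of a hosts-file lookup. It is an illustration only: `HostsLookup` and `lookup` are hypothetical names, and the real resolver in the operating system does more than this (resolution order is configurable, for example). The input is the hosts file of server 25 as quoted earlier.

```java
import java.util.Optional;

public class HostsLookup {
    /** Returns the first address mapped to hostname in hosts-file text, if any. */
    public static Optional<String> lookup(String hostsText, String hostname) {
        for (String line : hostsText.split("\\R")) {
            // Drop comments, then split "address name [alias...]" on whitespace.
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (content.isEmpty()) {
                continue;
            }
            String[] fields = content.split("\\s+");
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostname)) {
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hosts file of supervisor 25 as quoted earlier in this post.
        String hosts = String.join("\n",
            "127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4",
            "::1         localhost localhost.localdomain localhost6 localhost6.localdomain6",
            "192.168.101.25  service101.25");
        // No entry for Phq147, so the resolver would have to fall back to DNS.
        System.out.println("Phq147 -> " + lookup(hosts, "Phq147"));
        System.out.println("service101.25 -> " + lookup(hosts, "service101.25"));
    }
}
```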

Add a mapping for 147 to the hosts file on both supervisors, 25 and 26.

Taking 25 as an example:

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.25  service101.25
192.168.56.147 Phq147
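One way to confirm the change on each supervisor is to ask the JVM itself to resolve the name, since that is where the `UnknownHostException` was raised. This is a small sketch under that assumption: `ResolveCheck` and `resolve` are hypothetical names, and `Phq147` is the nimbus hostname from the exception above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveCheck {
    /** Returns the resolved address, or null if the JVM cannot resolve the name. */
    public static String resolve(String hostname) {
        try {
            return InetAddress.getByName(hostname).getHostAddress();
        } catch (UnknownHostException e) {
            // Same exception type the supervisor logged for Phq147.
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass a hostname as the first argument, or default to the nimbus name.
        String host = args.length > 0 ? args[0] : "Phq147";
        String address = resolve(host);
        System.out.println(address == null
            ? host + " does NOT resolve on this machine"
            : host + " -> " + address);
    }
}
```

Run it on 25 and 26 after editing the hosts file; if the name still does not resolve, the supervisor is likely to fail in the same way.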


That is all it took. Clearly I need to learn more about networking.