Notes on a Storm exception (raised when submitting a topology; all daemons start cleanly): java.net.UnknownHostException
2018-02-12 10:55
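Preface
While setting up a production Storm cluster I ran into something odd. The master nimbus node started normally with no errors, and both supervisors started normally with no errors. To make checking easier, I also started the UI on the nimbus node. [Here is the problem] Whenever I submitted a topology, the submission itself reported no error, but about 2 seconds later supervisor.log on both machines showed errors, and right after the errors the supervisor shut itself down and killed its process. I went through pretty much every major community looking for an answer and found none. The problem gave me a headache for a week and was finally solved today, so I am writing it down. If someone hits the same problem later, try the approach in this post; it may well work.
Storm environment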
192.168.101.25 zookeeper1 + kafka + storm (supervisor)
192.168.101.26 zookeeper2 + kafka + storm (supervisor)
192.168.101.36 zookeeper3 + kafka
192.168.56.147 storm(nimbus) + storm(ui)
As shown above, there are four servers in total. 25, 26 and 36 host the ZooKeeper cluster, while 25, 26 and 147 form the Storm cluster.
Exception output
2018-02-11 11:19:27.481 o.a.s.u.NimbusClient Async Localizer [WARN] Ignoring exception while trying to get leader nimbus info from 192.168.56.147. will retry with a different seed host.
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: Phq147
    at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:108) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.ThriftClient.<init>(ThriftClient.java:69) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.NimbusClient.<init>(NimbusClient.java:127) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:94) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:57) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.Utils.getClientBlobStoreForSupervisor(Utils.java:538) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:121) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.1.jar:1.1.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
Caused by: java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: Phq147
    at org.apache.storm.security.auth.TBackoffConnect.retryNext(TBackoffConnect.java:64) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:56) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.1.jar:1.1.1]
    ... 13 more
Caused by: org.apache.storm.thrift.transport.TTransportException: java.net.UnknownHostException: PhqApPrdGquery147
    at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:226) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.1.jar:1.1.1]
    ... 13 more
Caused by: java.net.UnknownHostException: Phq147
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178) ~[?:1.7.0_80]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) ~[?:1.7.0_80]
    at java.net.Socket.connect(Socket.java:579) ~[?:1.7.0_80]
    at org.apache.storm.thrift.transport.TSocket.open(TSocket.java:221) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.thrift.transport.TFramedTransport.open(TFramedTransport.java:81) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.SimpleTransportPlugin.connect(SimpleTransportPlugin.java:105) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.TBackoffConnect.doConnectWithRetry(TBackoffConnect.java:53) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.security.auth.ThriftClient.reconnect(ThriftClient.java:100) ~[storm-core-1.1.1.jar:1.1.1]
    ... 13 more
2018-02-11 11:19:27.481 o.a.s.l.AsyncLocalizer Async Localizer [WARN] Failed to download basic resources for topology-id servicebus-1-1518319150
2018-02-11 11:19:27.481 o.a.s.d.s.AdvancedFSOps Async Localizer [INFO] Deleting path /karfka/elk/storm/apache-storm-1.1.1/../stormdata/supervisor/tmp/a19cf9f4-b0de-41bc-a06e-299c1046514c
2018-02-11 11:19:27.491 o.a.s.d.s.AdvancedFSOps Async Localizer [INFO] Deleting path /karfka/elk/storm/apache-storm-1.1.1/../stormdata/supervisor/stormdist/servicebus-1-1518319150
2018-02-11 11:19:27.492 o.a.s.l.AsyncLocalizer Async Localizer [WARN] Caught Exception While Downloading (rethrowing)...
org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [192.168.56.147]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
    at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:111) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:57) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.Utils.getClientBlobStoreForSupervisor(Utils.java:538) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:121) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.1.jar:1.1.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
2018-02-11 11:19:27.495 o.a.s.d.s.Slot SLOT_6700 [ERROR] Error when processing event
java.util.concurrent.ExecutionException: org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [192.168.56.147]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
    at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.7.0_80]
    at java.util.concurrent.FutureTask.get(FutureTask.java:202) ~[?:1.7.0_80]
    at org.apache.storm.localizer.LocalDownloadedResource$NoCancelFuture.get(LocalDownloadedResource.java:63) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.supervisor.Slot.handleWaitingForBasicLocalization(Slot.java:413) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.supervisor.Slot.stateMachineStep(Slot.java:273) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:741) ~[storm-core-1.1.1.jar:1.1.1]
Caused by: org.apache.storm.utils.NimbusLeaderNotFoundException: Could not find leader nimbus from seed hosts [192.168.56.147]. Did you specify a valid list of nimbus hosts for config nimbus.seeds?
    at org.apache.storm.utils.NimbusClient.getConfiguredClientAs(NimbusClient.java:111) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.NimbusClient.getConfiguredClient(NimbusClient.java:57) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.blobstore.NimbusBlobStore.prepare(NimbusBlobStore.java:268) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.utils.Utils.getClientBlobStoreForSupervisor(Utils.java:538) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.downloadBaseBlobs(AsyncLocalizer.java:121) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:148) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.localizer.AsyncLocalizer$DownloadBaseBlobsDistributed.call(AsyncLocalizer.java:101) ~[storm-core-1.1.1.jar:1.1.1]
    at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [?:1.7.0_80]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [?:1.7.0_80]
    at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
2018-02-11 11:19:27.495 o.a.s.u.Utils SLOT_6700 [ERROR] Halting process: Error when processing an event
java.lang.RuntimeException: Halting process: Error when processing an event
    at org.apache.storm.utils.Utils.exitProcess(Utils.java:1773) ~[storm-core-1.1.1.jar:1.1.1]
    at org.apache.storm.daemon.supervisor.Slot.run(Slot.java:774) ~[storm-core-1.1.1.jar:1.1.1]
2018-02-11 11:19:27.498 o.a.s.d.s.Supervisor Thread-5 [INFO] Shutting down supervisor a6f91d50-eceb-4f1e-bb99-28799a97d58e
2018-02-11 11:19:27.502 o.a.s.e.EventManagerImp Thread-4 [INFO] Event manager interrupted
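With a trace this long, the one detail that matters (which host name failed to resolve) is easy to miss. As a rough aid, the sketch below pulls those names out of log text with a regular expression. The function name unresolved_hosts and the sample string are made up for illustration; in practice the text would come from supervisor.log.

```python
import re

# Matches "java.net.UnknownHostException: <name>" and captures the host name.
UNKNOWN_HOST = re.compile(r"java\.net\.UnknownHostException: ([\w.-]+)")


def unresolved_hosts(log_text):
    """Return the distinct host names reported as unresolvable, in first-seen order."""
    seen = []
    for name in UNKNOWN_HOST.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


# Two lines shortened from the trace above (illustrative sample only).
sample = (
    "java.lang.RuntimeException: org.apache.storm.thrift.transport.TTransportException: "
    "java.net.UnknownHostException: Phq147\n"
    "Caused by: java.net.UnknownHostException: Phq147\n"
)
print(unresolved_hosts(sample))
```

Every name it reports is one the supervisor machine was unable to resolve at that moment.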
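Troubleshooting process
Initial idea: problems like this are usually caused by the network, so the first thing I suspected was the ports. Looking through the supervisor's log, I found the following: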
2018-02-11 11:13:26.150 o.a.s.d.s.Supervisor main [INFO] Starting Supervisor with conf {drpc.worker.threads=64, topology.state.synchronization.timeout.secs=60, topology.executor...(the rest is just more configuration, omitted)...}
2018-02-11 11:13:26.227 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6700 Starting in state EMPTY - assignment null
2018-02-11 11:13:26.228 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6701 Starting in state EMPTY - assignment null
2018-02-11 11:13:26.229 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6702 Starting in state EMPTY - assignment null
2018-02-11 11:13:26.229 o.a.s.d.s.Slot main [WARN] SLOT service101.25:6703 Starting in state EMPTY - assignment null
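When I saw the log above, my first conclusion was that the ports Storm assigns to workers were not open, or were being blocked by the firewall on the supervisor machine. After checking with the colleague responsible, though, it turned out the firewall on that machine was off. Checking the port with
netstat -an|grep 6700
did indeed return nothing. Further reading showed that the worker ports 6700-6703 are only opened once work is assigned to them, which is why nothing showed up.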
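The same check can be scripted. The sketch below is only a rough equivalent of the netstat command: it attempts a TCP connection and reports whether anything is listening. The helper name is_port_open and the one-second timeout are illustrative choices, not part of Storm.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error number otherwise.
        return sock.connect_ex((host, port)) == 0
```

Calling is_port_open("127.0.0.1", 6700) on a supervisor that has no worker assigned is expected to return False, which matches the empty netstat output and does not by itself indicate a firewall problem.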
Follow-up idea:
I noticed SLOT service101.25:6703. Why does this show a hostname instead of my IP address? I then looked at the hosts file on server 25, shown below:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.25 service101.25
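So Storm had resolved the address to the hostname on its own. The exception at the top also complains that a host cannot be found, java.net.UnknownHostException: Phq147, so I wondered whether the supervisor's Storm configuration file was at fault. The original configuration was: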
# Initial
nimbus.seeds: ["192.168.56.147"]
What if I change it to the hostname from the error?
# After the change
nimbus.seeds: ["Phq147"]
Still no luck after the change! Next I looked at the hosts file on server 147 and found this:
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.10.56.147 Phq147
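So a hostname mapping had been configured on 147 as well. But why could it still not be reached after the change?
Final solution
The reason this pitfall took so long to resolve is really that my networking knowledge was lacking. The error above occurs because name resolution fails when the supervisor on 25 tries to talk to the nimbus on 147. Although nimbus.seeds held an IP address, the hostname in the exception shows that the supervisor was handed the name Phq147 for the nimbus and then had to resolve it itself. When one server sends a request to another by name, it first checks (by default on most Linux systems) whether its own hosts file maps that name; if it does, the hosts entry wins, and only if it does not is DNS consulted. The hosts entry on 147 only helps 147 itself, and on the supervisors neither the hosts file nor DNS knew Phq147. So in the end the fix was easy: add a mapping for 147 to the hosts file on both supervisors, 25 and 26.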
Taking 25 as an example:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.25 service101.25
192.168.56.147 Phq147
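To make the lookup order concrete, here is a simplified model of "hosts file first, DNS second", run against the hosts content of 25 before and after the fix. It is a sketch, not how the operating system resolver is actually implemented: parse_hosts, resolve and no_dns are hypothetical helpers, and no_dns stands in for a DNS server that does not know the name.

```python
import socket


def parse_hosts(text):
    """Map each host name in hosts-file text to its first listed IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)
    return table


def resolve(name, hosts_text, dns_lookup=socket.gethostbyname):
    """Look the name up in the hosts entries first; fall back to DNS only on a miss."""
    table = parse_hosts(hosts_text)
    if name in table:
        return table[name]
    return dns_lookup(name)


# hosts file on 25 before the fix, and after adding the mapping for 147
HOSTS_BEFORE = """\
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.101.25 service101.25
"""
HOSTS_AFTER = HOSTS_BEFORE + "192.168.56.147 Phq147\n"


def no_dns(name):
    """Stand-in for a DNS server that does not know the name (assumption for the demo)."""
    raise socket.gaierror("unknown host: " + name)


print(resolve("Phq147", HOSTS_AFTER, no_dns))
```

With HOSTS_BEFORE the same call falls through to no_dns and raises, which mirrors the UnknownHostException in the supervisor log. On a real supervisor, calling socket.gethostbyname("Phq147") after editing the hosts file is a quick way to confirm that the name now resolves.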
That's it. Looks like I really do need to learn more about networking.