Vertica: recovering a three-node cluster after one machine failed and lost all its files
2016-07-12 12:40
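The data directories on the failed node were lost entirely. After reinstalling the software on that node, the database still recognized the failed node, but the node could be neither started nor removed, because spread.conf, the key file that ties the node to the cluster, was missing.
Restarting the node reported this error: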
"{0}. Error was: {1}.".format(repr(msg), e))
ATReceiveFailure_Init: Problem json decoding message '{"status": null, "content": {"special_environment": null}, "error_type": null, "error_message": null, "exec_stack": null}'. Error was: None is not a valid status.
On the failed node (n1), recreate the database directory and a catalog directory inside it. Judging by the scp target path used later, the database directory lives under /opt.
[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo mkdir EmBigData_dev
[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo chown -R dbadmin:verticadba EmBigData_dev/
[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ cd EmBigData_dev/
[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo mkdir v_embigdata_dev_node0003_catalog
[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ sudo chown -R dbadmin:verticadba v_embigdata_dev_node0003_catalog/
[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ cd v_embigdata_dev_node0003_catalog/
A healthy node then copies its spread.conf file to the failed node:
[dbadmin@n3 v_embigdata_dev_node0003_catalog]$ scp spread.conf 172.16.57.26:/opt/EmBigData_dev/v_embigdata_dev_node0003_catalog/
spread.conf 100% 403 0.4KB/s 00:00
On the failed node, check that spread.conf arrived:
[dbadmin@n1 v_embigdata_dev_node0003_catalog]$ ll
total 4
-rw-r--r-- 1 dbadmin verticadba 403 Jul 12 09:24 spread.conf
Then try restarting the failed node. The restart succeeded after repeating the step below several times:
Nodes UP: v_embigdata_dev_node0002, v_embigdata_dev_node0003
Nodes DOWN: v_embigdata_dev_node0001 (may be still initializing).
It is suggested that you continue waiting.
Do you want to continue waiting? (yes/no) [yes] yes
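Since the restart only succeeded after several attempts, the repetition can be wrapped in a small retry helper. This is a minimal sketch, not the procedure used in the post: the function name `retry`, the attempt count and the delay are all hypothetical, and the `admintools` invocation in the trailing comment is an assumption about how the restart was issued (the post does not show the command).

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            echo "succeeded on attempt $attempt"
            return 0
        fi
        echo "attempt $attempt of $max failed" >&2
        # Only wait if another attempt is still to come.
        if [ "$attempt" -lt "$max" ]; then sleep "$delay"; fi
        attempt=$((attempt + 1))
    done
    return 1
}

# Hypothetical usage (assumed command, not taken from the post):
# retry 5 30 admintools -t restart_node -s 172.16.57.26 -d EmBigData_dev
```

A bounded number of attempts keeps the loop from hiding a node that will never come up; if all attempts fail, the log on the failed node is the next place to look.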
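Summary:
Note that on node 1 we created node 3's directory name, v_embigdata_dev_node0003_catalog, and node 1 still started successfully!
This suggests that Vertica relies on the spread.conf file to identify the nodes, and that this file is the same on every node.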
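The observation that spread.conf is the same on every node can be checked with a byte-for-byte comparison. The sketch below assumes a copy from a healthy node has already been fetched to a local file; the helper name `same_config`, the host alias `n3`, and the paths in the usage comment are assumptions for illustration.

```shell
# same_config FILE_A FILE_B: succeed only if both files exist
# and are byte-identical.
same_config() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Hypothetical usage: fetch a healthy node's copy, then compare it with
# the local one (host alias and paths are assumptions).
# scp n3:/opt/EmBigData_dev/v_embigdata_dev_node0003_catalog/spread.conf /tmp/spread.conf.n3
# if same_config /tmp/spread.conf.n3 /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/spread.conf; then
#     echo "spread.conf matches"
# else
#     echo "spread.conf differs or is missing"
# fi
```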
Here is the file layout on the failed node after the successful restart:
[dbadmin@n1 EmBigData_dev]$ ll
total 36
-rw-r--r-- 1 dbadmin verticadba 7164 Jul 12 09:30 dbLog
drwx------ 9 dbadmin verticadba 4096 Jul 12 09:40 v_embigdata_dev_node0001_catalog
drwxrwx--- 502 dbadmin verticadba 20480 Jul 12 09:31 v_embigdata_dev_node0001_data
drwxr-xr-x 2 dbadmin verticadba 4096 Jul 12 09:24 v_embigdata_dev_node0003_catalog
Both v_embigdata_dev_node0001_catalog and v_embigdata_dev_node0001_data were regenerated from scratch.
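To confirm that the expected directories exist after a restart, the names can be derived and checked. This is a sketch under an assumption: the naming pattern (lowercased database name, four-digit node index, then `catalog` or `data`) is inferred only from the listing above, and the helper names `node_dir_name` and `check_node_dirs` are hypothetical.

```shell
# node_dir_name DB INDEX KIND: print v_<db lowercased>_node<4-digit index>_<kind>.
# The pattern is inferred from the directory listing, not from documentation.
node_dir_name() {
    db=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'v_%s_node%04d_%s\n' "$db" "$2" "$3"
}

# check_node_dirs BASE DB INDEX: report whether the catalog and data
# directories for the given node exist under BASE.
check_node_dirs() {
    for kind in catalog data; do
        dir="$1/$(node_dir_name "$2" "$3" "$kind")"
        if [ -d "$dir" ]; then
            echo "present: $dir"
        else
            echo "missing: $dir"
        fi
    done
}

# Hypothetical usage on the recovered node:
# check_node_dirs /opt/EmBigData_dev EmBigData_dev 1
```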
The log shows the steps Vertica takes during recovery:
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Txn] <INFO> Looking for catalog at: /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/Catalog
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Init] <INFO> Catalog not loaded from /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog, probably being bootstrapped or empty
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Comms] <INFO> Spread domain socket will be opened in directory /opt/vertica/spread/tmp
2016-07-12 09:24:48.426 unknown:0x7f7352307720 [Comms] <INFO> About to launch spread with '/opt/vertica/spread/sbin/spread -c /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/spread.conf'
2016-07-12 09:24:48.427 unknown:0x7f7352307720 [Comms] <INFO> forked spread pid=33233, wrote pidfile /opt/EmBigData_dev/v_embigdata_dev_node0001_catalog/spread.pid
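From this it appears that spread.conf is passed between the nodes of the cluster: spread is launched with the copy in the node0001 catalog directory, not the one in the manually created node0003 directory.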
...............
...............
...............
2016-07-12 09:25:00.021 Spread Client:0x7c4eb30 [VMPI] <INFO> We are invited to join the cluster
2016-07-12 09:25:00.021 Spread Client:0x7c4eb30 [Init] <INFO> Startup [Waiting for Cluster Invite] Invited
2016-07-12 09:25:00.021 Spread Client:0x7c4eb30-fff0000000000bca [Txn] <INFO> Begin Txn: fff0000000000bca 'Installing New Catalog'
2016-07-12 09:25:00.021 Spread Client:0x7c4eb30-fff0000000000bca [Catalog] <INFO> installNewCatalog: Received new catalog, replacing current TRANSACTION-fff0000000000bca catalog (old version=0, new version 0x35f2d)
...............
...............
...............
2016-07-12 09:25:24.026 Spread Client:0x7c4eb30 [Txn] <INFO> Commit Complete: Txn: a0000000000001 at epoch 0xa93c
2016-07-12 09:25:24.026 Spread Client:0x7c4eb30 [Txn] <INFO> Joining DB group with catalog version 220973
2016-07-12 09:25:24.026 Spread Client:0x7c4eb30 [Comms] <INFO> joinDBGroup: Node v_embigdata_dev_node0001 (#r0-10#N172016057026) joining cluster for DB EmBigData_dev
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> nodeToState map:
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0001 : INITIALIZING
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0002 : UP
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0003 : UP
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Recover] <INFO> State change for node v_embigdata_dev_node0001: INITIALIZING; catalog 220973
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> Saw membership message 4352 (0x1100) on V:EmBigData_dev
2016-07-12 09:25:24.037 Spread Client:0x7c4eb30 [Comms] <INFO> DB Group changed
...............
...............
...............
2016-07-12 09:25:24.688 DistCall Dispatch:0x7f72a07b6bb0 <LOG> @v_embigdata_dev_node0001: 00000/3298: Event Posted: Event Code:6 Event Id:1 Event Severity: Informational [6] PostedTimestamp: 2016-07-12 09:25:24.688094 ExpirationTimestamp: 2084-07-30 12:39:31.688094 EventCodeDescription: Node State Change ProblemDescription: Changing node v_embigdata_dev_node0001 startup state to RECOVERING DatabaseName: EmBigData_dev Hostname: n1
2016-07-12 09:25:24.688 DistCall Dispatch:0x7f72a07b6bb0 [Recover] <INFO> Changing node v_embigdata_dev_node0001 startup state from INITIALIZING to RECOVERING
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> nodeToState map:
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0001 : RECOVERING
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0002 : UP
2016-07-12 09:25:24.688 Spread Client:0x7c4eb30 [Comms] <INFO> v_embigdata_dev_node0003 : UP
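The state changes in the log excerpt (INITIALIZING, then RECOVERING) can be pulled out of a long log with a short filter. The sketch below assumes the line layout shown in the excerpt, where each nodeToState entry ends in `<node name> : <STATE>`; the function name `node_states` is hypothetical, and the log file path is left to the reader because the post does not give it.

```shell
# node_states: read log lines on stdin and print
# "<date> <time> <node> <state>" for every nodeToState map entry,
# i.e. every line whose last three fields are "<v_..._nodeNNNN> : <STATE>".
node_states() {
    awk 'NF >= 3 && $(NF-1) == ":" && $(NF-2) ~ /^v_.*_node[0-9]+$/ {
        print $1, $2, $(NF-2), $NF
    }'
}

# Hypothetical usage (the log path is an assumption):
# node_states < /path/to/vertica.log
```

Matching on the trailing fields rather than on fixed columns keeps the filter working even though the thread name ("Spread Client:0x...") contains a space.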